Monitoring Grafana 12 with Prometheus Metrics

In this guide, we explore the extensive set of Prometheus metrics exposed by Grafana 12.0.2. These metrics provide deep insights into the performance, health, and behavior of your Grafana instance, enabling effective monitoring and troubleshooting.

By leveraging these metrics, you can track critical aspects of your Grafana environment, including alerting, API usage, database performance, and user interactions. Whether you're a system administrator, DevOps engineer, or data enthusiast, understanding and utilizing these metrics will help you optimize your Grafana setup and ensure its reliability.

This article provides a comprehensive list of Prometheus metrics available in Grafana 12, along with their types and detailed descriptions. We also offer guidance on integrating these metrics into your monitoring strategy to build powerful dashboards and alerts.

Why Monitor Grafana with Prometheus?

Monitoring Grafana using Prometheus metrics allows you to:

  • Detect Issues Early: Identify performance bottlenecks or system failures before they impact users.
  • Optimize Resources: Understand resource usage patterns to allocate CPU, memory, and database capacity effectively.
  • Improve User Experience: Monitor API response times and user interactions to ensure a seamless experience.
  • Enhance Alerting: Track alerting system health to ensure timely notifications for critical events.

Prometheus Metrics in Grafana 12

Below is a detailed list of Prometheus metrics exposed by Grafana 12.0.2.

Metric Name | Type | Description
grafana_access_evaluation_count | Counter | Number of evaluation calls.
grafana_access_evaluation_duration | Histogram | Histogram for the runtime of evaluation function.
grafana_access_permissions_cache_usage | Counter | Access control permissions cache hit/miss.
grafana_access_permissions_duration | Histogram | Histogram for the runtime of permissions check function.
grafana_access_search_permissions_duration | Histogram | Histogram for the runtime of permissions search function.
grafana_access_search_user_permissions_cache_usage | Counter | Access control search user permissions cache hit/miss.
grafana_aggregator_discovery_aggregation_count_total | Counter | Counter of number of times discovery was aggregated.
grafana_alerting_active_alerts | Gauge | Amount of active alerts.
grafana_alerting_active_configurations | Gauge | The number of active Alertmanager configurations.
grafana_alerting_alertmanager_alerts | Gauge | How many alerts by state are in Grafana's Alertmanager.
grafana_alerting_alertmanager_config_hash | Gauge | The hash of the Alertmanager configuration.
grafana_alerting_alertmanager_config_match | Gauge | The total number of match.
grafana_alerting_alertmanager_config_match_re | Gauge | The total number of matchRE.
grafana_alerting_alertmanager_config_matchers | Gauge | The total number of matchers.
grafana_alerting_alertmanager_config_object_matchers | Gauge | The total number of object_matchers.
grafana_alerting_alertmanager_config_size_bytes | Gauge | The size of the Grafana Alertmanager configuration in bytes.
grafana_alerting_alertmanager_inhibition_rules | Gauge | Number of configured inhibition rules.
grafana_alerting_alertmanager_integrations | Gauge | Number of configured receivers.
grafana_alerting_alertmanager_receivers | Gauge | Number of configured receivers by state. It is considered active if used within a route.
grafana_alerting_alerts | Gauge | How many alerts by state are in the scheduler.
grafana_alerting_alerts_invalid_total | Counter | The total number of received alerts that were invalid.
grafana_alerting_alerts_received_total | Counter | The total number of received alerts.
grafana_alerting_discovered_configurations | Gauge | The number of organizations we've discovered that require an Alertmanager configuration.
grafana_alerting_dispatcher_aggregation_groups | Gauge | Number of active aggregation groups.
grafana_alerting_dispatcher_alert_processing_duration_seconds | Summary | Summary of latencies for the processing of alerts.
grafana_alerting_execution_time_milliseconds | Summary | Summary of alert execution duration.
grafana_alerting_nflog_gc_duration_seconds | Summary | Duration of the last notification log garbage collection cycle.
grafana_alerting_nflog_gossip_messages_propagated_total | Counter | Number of received gossip messages that have been further gossiped.
grafana_alerting_nflog_queries_total | Counter | Number of notification log queries were received.
grafana_alerting_nflog_query_duration_seconds | Histogram | Duration of notification log query evaluation.
grafana_alerting_nflog_query_errors_total | Counter | Number of notification log received queries that failed.
grafana_alerting_nflog_snapshot_duration_seconds | Summary | Duration of the last notification log snapshot.
grafana_alerting_nflog_snapshot_size_bytes | Gauge | Size of the last notification log snapshot in bytes.
grafana_alerting_notification_latency_seconds | Histogram | The latency of notifications in seconds.
grafana_alerting_remote_alertmanager_configuration_sync_failures_total | Counter | Total number of failed attempts to sync configurations between Alertmanagers.
grafana_alerting_remote_alertmanager_configuration_syncs_total | Counter | Total number of configuration syncs to the remote Alertmanager.
grafana_alerting_remote_alertmanager_last_configuration_sync_timestamp_seconds | Gauge | Timestamp of the last successful configuration sync to the remote Alertmanager in seconds.
grafana_alerting_remote_alertmanager_last_readiness_check_timestamp_seconds | Gauge | Timestamp of the last successful readiness check to the remote Alertmanager in seconds.
grafana_alerting_remote_alertmanager_last_state_sync_timestamp_seconds | Gauge | Timestamp of the last successful state sync to the remote Alertmanager in seconds.
grafana_alerting_remote_alertmanager_state_sync_failures_total | Counter | Total number of failed attempts to sync state between Alertmanagers.
grafana_alerting_remote_alertmanager_state_syncs_total | Counter | Total number of state syncs to the remote Alertmanager.
grafana_alerting_request_duration_seconds | Histogram | Histogram of requests to the Alerting API.
grafana_alerting_schedule_alert_rules | Gauge | The number of alert rules that could be considered for evaluation at the next tick.
grafana_alerting_schedule_alert_rules_hash | Gauge | A hash of the alert rules that could be considered for evaluation at the next tick.
grafana_alerting_schedule_periodic_duration_seconds | Histogram | The time taken to run the scheduler.
grafana_alerting_schedule_query_alert_rules_duration_seconds | Histogram | The time taken to fetch alert rules from the database.
grafana_alerting_scheduler_behind_seconds | Gauge | The total number of seconds the scheduler is behind.
grafana_alerting_silences | Gauge | How many silences by state.
grafana_alerting_silences_gc_duration_seconds | Summary | Duration of the last silence garbage collection cycle.
grafana_alerting_silences_gossip_messages_propagated_total | Counter | Number of received gossip messages that have been further gossiped.
grafana_alerting_silences_queries_total | Counter | How many silence queries were received.
grafana_alerting_silences_query_duration_seconds | Histogram | Duration of silence query evaluation.
grafana_alerting_silences_query_errors_total | Counter | How many silence received queries did not succeed.
grafana_alerting_silences_snapshot_duration_seconds | Summary | Duration of the last silence snapshot.
grafana_alerting_silences_snapshot_size_bytes | Gauge | Size of the last silence snapshot in bytes.
grafana_alerting_state_calculation_duration_seconds | Histogram | The duration of calculation of a single state.
grafana_alerting_state_full_sync_duration_seconds | Histogram | The duration of fully synchronizing the state with the database.
grafana_alerting_state_history_info | Gauge | Information about the state history store.
grafana_alerting_state_history_writes_bytes_total | Counter | The total number of bytes sent within a batch to the state history store.
grafana_alerting_ticker_interval_seconds | Gauge | Interval at which the ticker is meant to tick.
grafana_alerting_ticker_last_consumed_tick_timestamp_seconds | Gauge | Timestamp of the last consumed tick in seconds.
grafana_alerting_ticker_next_tick_timestamp_seconds | Gauge | Timestamp of the next tick in seconds before it is consumed.
grafana_api_admin_user_created_total | Counter | API admin user created counter.
grafana_api_dashboard_get_milliseconds | Summary | Summary for dashboard get duration.
grafana_api_dashboard_save_milliseconds | Summary | Summary for dashboard save duration.
grafana_api_dashboard_search_milliseconds | Summary | Summary for dashboard search duration.
grafana_api_dashboard_snapshot_create_total | Counter | Dashboard snapshots created.
grafana_api_dashboard_snapshot_external_total | Counter | External dashboard snapshots created.
grafana_api_dashboard_snapshot_get_total | Counter | Loaded dashboards.
grafana_api_dataproxy_request_all_milliseconds | Summary | Summary for dataproxy request duration.
grafana_api_login_oauth_total | Counter | API login OAuth counter.
grafana_api_login_post_total | Counter | API login post counter.
grafana_api_login_saml_total | Counter | API login SAML counter.
grafana_api_models_dashboard_insert_total | Counter | Dashboards inserted.
grafana_api_org_create_total | Counter | API org created counter.
grafana_api_response_status_total | Counter | API HTTP response status.
grafana_api_user_signup_completed_total | Counter | Amount of users who completed the signup flow.
grafana_api_user_signup_invite_total | Counter | Amount of users who have been invited.
grafana_api_user_signup_started_total | Counter | Amount of users who started the signup flow.
grafana_apiserver_audit_event_total | Counter | Counter of audit events generated and sent to the audit backend.
grafana_apiserver_audit_requests_rejected_total | Counter | Counter of apiserver requests rejected due to an error in audit logging backend.
grafana_apiserver_client_certificate_expiration_seconds | Histogram | Distribution of the remaining lifetime on the certificate used to authenticate a request.
grafana_apiserver_current_inflight_requests | Gauge | Maximal number of currently used inflight request limit of this apiserver per request kind in last second.
grafana_apiserver_envelope_encryption_dek_cache_fill_percent | Gauge | Percent of the cache slots currently occupied by cached DEKs.
grafana_apiserver_flowcontrol_read_vs_write_current_requests | Histogram | Observations of the number of requests waiting or in regular stage of execution.
grafana_apiserver_flowcontrol_seat_fair_frac | Gauge | Fair fraction of server's concurrency to allocate to each priority level that can use it.
grafana_apiserver_kube_aggregator_x509_insecure_sha1_total | Counter | Counts the number of requests to servers with insecure SHA1 signatures in their serving certificate.
grafana_apiserver_kube_aggregator_x509_missing_san_total | Counter | Counts the number of requests to servers missing SAN extension in their serving certificate.
grafana_apiserver_request_body_size_bytes | Histogram | Apiserver request body size in bytes broken out by resource and verb.
grafana_apiserver_request_duration_seconds | Histogram | Response latency distribution in seconds for each verb, dry run value, group, version, resource, etc.
grafana_apiserver_request_filter_duration_seconds | Histogram | Request filter latency distribution in seconds, for each filter type.
grafana_apiserver_request_sli_duration_seconds | Histogram | Response latency distribution (not counting webhook duration and priority & fairness queue wait times).
grafana_apiserver_request_slo_duration_seconds | Histogram | Response latency distribution (not counting webhook duration and priority & fairness queue wait times).
grafana_apiserver_request_timestamp_comparison_time | Histogram | Time taken for comparison of old vs new objects in UPDATE or PATCH requests.
grafana_apiserver_request_total | Counter | Counter of apiserver requests broken out for each verb, dry run value, group, version, resource, etc.
grafana_apiserver_response_sizes | Histogram | Response size distribution in bytes for each group, version, verb, resource, subresource, scope, etc.
grafana_apiserver_selfrequest_total | Counter | Counter of apiserver self-requests broken out for each verb, API resource, and subresource.
grafana_apiserver_storage_data_key_generation_duration_seconds | Histogram | Latencies in seconds of data encryption key (DEK) generation operations.
grafana_apiserver_storage_data_key_generation_failures_total | Counter | Total number of failed data encryption key (DEK) generation operations.
grafana_apiserver_storage_envelope_transformation_cache_misses_total | Counter | Total number of cache misses while accessing key decryption key (KEK).
grafana_apiserver_storage_objects | Gauge | Number of stored objects at the time of last check split by kind.
grafana_apiserver_tls_handshake_errors_total | Counter | Number of requests dropped with 'TLS handshake error from' error.
grafana_apiserver_webhooks_x509_insecure_sha1_total | Counter | Counts the number of requests to servers with insecure SHA1 signatures in their serving certificate.
grafana_apiserver_webhooks_x509_missing_san_total | Counter | Counts the number of requests to servers missing SAN extension in their serving certificate.
grafana_authenticated_user_requests | Counter | Counter of authenticated requests broken out by username.
grafana_authentication_attempts | Counter | Counter of authenticated attempts.
grafana_authentication_duration_seconds | Histogram | Authentication duration in seconds broken out by result.
grafana_authn_authn_failed_authentication_total | Counter | Number of failed authentications.
grafana_authn_authn_successful_authentication_total | Counter | Number of successful authentications.
grafana_authn_authn_successful_login_total | Counter | Number of successful logins.
grafana_authorization_attempts_total | Counter | Counter of authorization attempts broken down by result.
grafana_authorization_duration_seconds | Histogram | Authorization duration in seconds broken out by result.
grafana_build_info | Gauge | A metric with a constant '1' value labeled by version, revision, branch, and goversion from which Grafana was built.
grafana_build_timestamp | Gauge | A metric exposing when the binary was built in epoch.
grafana_cardinality_enforcement_unexpected_categorizations_total | Counter | The count of unexpected categorizations during cardinality enforcement.
grafana_database_all_migrations_duration_seconds | Histogram | Duration of the entire SQL migration process in seconds.
grafana_database_conn_idle | Gauge | The number of idle connections.
grafana_database_conn_in_use | Gauge | The number of connections currently in use.
grafana_database_conn_max_idle_closed_seconds | Counter | The total number of connections closed due to SetConnMaxIdleTime.
grafana_database_conn_max_idle_closed_total | Counter | The total number of connections closed due to SetMaxIdleConns.
grafana_database_conn_max_lifetime_closed_total | Counter | The total number of connections closed due to SetConnMaxLifetime.
grafana_database_conn_max_open | Gauge | Maximum number of open connections to the database.
grafana_database_conn_open | Gauge | The number of established connections both in use and idle.
grafana_database_conn_wait_count_total | Counter | The total number of connections waited for.
grafana_database_conn_wait_duration_seconds | Counter | The total time blocked waiting for a new connection.
grafana_datasource_request_duration_seconds | Histogram | Histogram of durations of outgoing data source requests sent from Grafana.
grafana_datasource_request_in_flight | Gauge | A gauge of outgoing data source requests currently being sent by Grafana.
grafana_datasource_request_total | Counter | A counter for outgoing requests for a data source.
grafana_datasource_response_size_bytes | Histogram | Histogram of data source response sizes returned to Grafana.
grafana_db_datasource_query_by_id_total | Counter | Counter for getting datasource by id.
grafana_disabled_metrics_total | Counter | The count of disabled metrics.
grafana_emails_sent_failed | Counter | Number of emails Grafana failed to send.
grafana_emails_sent_total | Counter | Number of emails sent by Grafana.
grafana_encryption_cache_reads_total | Counter | A counter for encryption cache reads.
grafana_encryption_ops_total | Counter | A counter for encryption operations.
grafana_environment_info | Gauge | A metric with a constant '1' value labeled by environment information about the running instance.
grafana_feature_toggles_info | Gauge | Info metric that exposes what feature toggles are enabled or not.
grafana_field_validation_request_duration_seconds | Histogram | Response latency distribution in seconds for each field validation value.
grafana_folder_id_api_count | Counter | Counter for folder id usage in API package.
grafana_folder_id_service_count | Counter | Counter for folder id usage in service package.
grafana_folders_get_children_duration_seconds | Histogram | Duration of listing subfolders in specific folder.
grafana_frontend_boot_css_time_seconds | Histogram | Frontend boot initial CSS load.
grafana_frontend_boot_first_contentful_paint_time_seconds | Histogram | Frontend boot first contentful paint.
grafana_frontend_boot_first_paint_time_seconds | Histogram | Frontend boot first paint.
grafana_frontend_boot_js_done_time_seconds | Histogram | Frontend boot initial JS load.
grafana_frontend_boot_load_time_seconds | Histogram | Frontend boot time measurement.
grafana_frontend_plugins_preload_ms | Histogram | Frontend preload plugin time measurement.
grafana_hidden_metrics_total | Counter | The count of hidden metrics.
grafana_http_request_duration_seconds | Histogram | Histogram of latencies for HTTP requests.
grafana_http_request_in_flight | Gauge | A gauge of requests currently being served by Grafana.
grafana_iam_authz_direct_db_service_invalid_request_count | Counter | AuthZ service invalid request count.
grafana_iam_authz_direct_db_service_permission_cache_usage | Counter | AuthZ service permission cache usage.
grafana_idforwarding_idforwarding_failed_token_signing_total | Counter | Number of failed token signings.
grafana_idforwarding_idforwarding_token_signing_duration_seconds | Histogram | Histogram of token signing duration.
grafana_idforwarding_idforwarding_token_signing_from_cache_total | Counter | Number of signed tokens retrieved from cache.
grafana_idforwarding_idforwarding_token_signing_total | Counter | Number of token signings.
grafana_index_server_index_size | Gauge | Size of the index in bytes - only for file-based indices.
grafana_instance_start_total | Counter | Counter for started instances.
grafana_ldap_users_sync_execution_time | Summary | Summary for LDAP users sync execution duration.
grafana_live_broker_redis_pub_sub_dropped_messages | Counter | Number of dropped messages on application level in Redis PUB/SUB.
grafana_live_client_command_duration_seconds | Summary | Client command duration summary.
grafana_live_client_connections_inflight | Gauge | Number of inflight client connections.
grafana_live_client_ping_pong_duration_seconds | Histogram | Ping/Pong duration in seconds.
grafana_live_client_subscriptions_inflight | Gauge | Number of inflight client subscriptions.
grafana_live_node_action_count | Counter | Number of various actions called.
grafana_live_node_broadcast_duration_seconds | Histogram | Broadcast duration in seconds.
grafana_live_node_build | Gauge | Node build info.
grafana_live_node_messages_received_count | Counter | Number of messages received from broker.
grafana_live_node_messages_sent_count | Counter | Number of messages sent by node to broker.
grafana_live_node_num_channels | Gauge | Number of channels with one or more subscribers.
grafana_live_node_num_clients | Gauge | Number of clients connected.
grafana_live_node_num_nodes | Gauge | Number of nodes in the cluster.
grafana_live_node_num_subscriptions | Gauge | Number of subscriptions.
grafana_live_node_num_users | Gauge | Number of unique users connected.
grafana_live_node_pub_sub_lag_seconds | Histogram | Pub sub lag in seconds.
grafana_live_transport_messages_received | Counter | Number of messages received from client connections over specific transport.
grafana_live_transport_messages_received_size | Counter | Size in bytes of messages received from client connections over specific transport.
grafana_live_transport_messages_sent | Counter | Number of messages sent to client connections over specific transport.
grafana_live_transport_messages_sent_size | Counter | Size in bytes of messages sent to client connections over specific transport.
grafana_page_response_status_total | Counter | Page HTTP response status.
grafana_plugin_build_info | Gauge | A metric with a constant '1' value labeled by pluginId, pluginType, and version from which Grafana plugin was built.
grafana_plugin_request_duration_milliseconds | Histogram | Plugin request duration.
grafana_plugin_request_duration_seconds | Histogram | Plugin request duration in seconds.
grafana_plugin_request_size_bytes | Histogram | Histogram of plugin request sizes returned.
grafana_plugin_request_total | Counter | The total amount of plugin requests.
grafana_plugin_target_info | Gauge | A metric with a constant '1' value labeled by pluginId and target.
grafana_plugins_preinstall_duration_seconds | Histogram | Plugin preinstallation duration.
grafana_plugins_preinstall_total | Counter | The total amount of plugin preinstallations.
grafana_process_cpu_seconds_total | Counter | Total user and system CPU time spent in seconds.
grafana_process_max_fds | Gauge | Maximum number of open file descriptors.
grafana_process_network_receive_bytes_total | Counter | Number of bytes received by the process over the network.
grafana_process_network_transmit_bytes_total | Counter | Number of bytes sent by the process over the network.
grafana_process_open_fds | Gauge | Number of open file descriptors.
grafana_process_resident_memory_bytes | Gauge | Resident memory size in bytes.
grafana_process_start_time_seconds | Gauge | Start time of the process since unix epoch in seconds.
grafana_process_virtual_memory_bytes | Gauge | Virtual memory size in bytes.
grafana_process_virtual_memory_max_bytes | Gauge | Maximum amount of virtual memory available in bytes.
grafana_prometheus_plugin_backend_request_count | Counter | The total amount of Prometheus backend plugin requests.
grafana_proxy_response_status_total | Counter | Proxy HTTP response status.
grafana_public_dashboard_request_count | Counter | Counter for public dashboards requests.
grafana_registered_metrics_total | Counter | The count of registered metrics broken by stability level and deprecation version.
grafana_rendering_queue_size | Gauge | Size of rendering queue.
grafana_search_dashboard_search_failures_duration_seconds | Histogram | Duration of dashboard search failures.
grafana_search_dashboard_search_successes_duration_seconds | Histogram | Duration of dashboard search successes.
grafana_stat_active_users | Gauge | Number of active users.
grafana_stat_failed_migrated_api_keys_to_sa_tokens | Gauge | Total number of failed migrations of API keys to service account tokens.
grafana_stat_successfully_migrated_api_keys_to_sa_tokens | Gauge | Total number of successful migrations of API keys to service account tokens.
grafana_stat_total_migrated_api_keys_to_sa_tokens | Gauge | Total number of API keys to be migrated to service account tokens.
grafana_stat_total_orgs | Gauge | Total amount of orgs.
grafana_stat_total_playlists | Gauge | Total amount of playlists.
grafana_stat_total_service_account_tokens | Gauge | Total amount of service account tokens.
grafana_stat_total_service_accounts | Gauge | Total amount of service accounts.
grafana_stat_total_service_accounts_role_none | Gauge | Total amount of service accounts with no role.
grafana_stat_total_teams | Gauge | Total amount of teams.
grafana_stat_total_users | Gauge | Total amount of users.
grafana_stat_totals_active_admins | Gauge | Total amount of active admins.
grafana_stat_totals_active_editors | Gauge | Total amount of active editors.
grafana_stat_totals_active_viewers | Gauge | Total amount of active viewers.
grafana_stat_totals_admins | Gauge | Total amount of admins.
grafana_stat_totals_alert_rules | Gauge | Total amount of alert rules in the database.
grafana_stat_totals_annotations | Gauge | Total amount of annotations in the database.
grafana_stat_totals_correlations | Gauge | Total amount of correlations.
grafana_stat_totals_dashboard | Gauge | Total amount of dashboards.
grafana_stat_totals_dashboard_versions | Gauge | Total amount of dashboard versions in the database.
grafana_stat_totals_data_keys | Gauge | Total amount of data keys in the database.
grafana_stat_totals_datasource | Gauge | Total number of defined datasources, labeled by pluginId.
grafana_stat_totals_editors | Gauge | Total amount of editors.
grafana_stat_totals_folder | Gauge | Total amount of folders.
grafana_stat_totals_library_panels | Gauge | Total amount of library panels in the database.
grafana_stat_totals_library_variables | Gauge | Total amount of library variables in the database.
grafana_stat_totals_public_dashboard | Gauge | Total amount of public dashboards.
grafana_stat_totals_rule_groups | Gauge | Total amount of alert rule groups in the database.
grafana_stat_totals_viewers | Gauge | Total amount of viewers.
grafana_storage_server_poller_query_latency_seconds | Histogram | Poller query latency.
loki_experimental_features_in_use_total | Counter | The number of experimental features in use.
net_conntrack_dialer_conn_attempted_total | Counter | Total number of connections attempted by the given dialer a given name.
net_conntrack_dialer_conn_closed_total | Counter | Total number of connections closed which originated from the dialer of a given name.
net_conntrack_dialer_conn_established_total | Counter | Total number of connections successfully established by the given dialer a given name.
net_conntrack_dialer_conn_failed_total | Counter | Total number of connections failed to dial by the dialer a given name.
openfga_cachecontroller_cache_hit_count | Counter | The total number of cache hits from cachecontroller requests.
openfga_cachecontroller_cache_invalidation_count | Counter | The total number of invalidations performed by the cache controller.
openfga_cachecontroller_cache_total_count | Counter | The total number of cachecontroller requests.
openfga_check_cache_hit_count | Counter | The total number of cache hits for ResolveCheck.
openfga_check_cache_invalid_hit_count | Counter | The total number of cache hits for ResolveCheck that were discarded because they were invalidated.
openfga_check_cache_total_count | Counter | The total number of calls to ResolveCheck.
openfga_condition_compilation_duration_ms | Histogram | A histogram measuring the compilation time (in milliseconds) of a Condition.
openfga_condition_evaluation_cost | Histogram | A histogram of the CEL evaluation cost of a Condition in a Relationship Tuple.
openfga_condition_evaluation_duration_ms | Histogram | A histogram measuring the evaluation time (in milliseconds) of a Condition.
openfga_list_objects_further_eval_required_count | Counter | Number of objects in a ListObjects call that needed to issue a Check call to determine a final result.
openfga_list_objects_no_further_eval_required_count | Counter | Number of objects in a ListObjects call that did not need to issue a Check call to determine a final result.
openfga_shared_iterator_count | Gauge | The current number of items of shared iterator.
openfga_shared_iterator_watchdog_timer_triggered | Counter | Total number of times watchdog timer is triggered.
plugins_active_instances | Gauge | The number of active plugin instances.
plugins_datasource_instances_total | Counter | The total number of data source instances created.
process_cpu_seconds_total | Counter | Total user and system CPU time spent in seconds.
process_max_fds | Gauge | Maximum number of open file descriptors.
process_network_receive_bytes_total | Counter | Number of bytes received by the process over the network.
process_network_transmit_bytes_total | Counter | Number of bytes sent by the process over the network.
process_open_fds | Gauge | Number of open file descriptors.
process_resident_memory_bytes | Gauge | Resident memory size in bytes.
process_start_time_seconds | Gauge | Start time of the process since unix epoch in seconds.
process_virtual_memory_bytes | Gauge | Virtual memory size in bytes.
process_virtual_memory_max_bytes | Gauge | Maximum amount of virtual memory available in bytes.
prometheus_template_text_expansion_failures_total | Counter | The total number of template text expansion failures.
prometheus_template_text_expansions_total | Counter | The total number of template text expansions.
promhttp_metric_handler_requests_in_flight | Gauge | Current number of scrapes being served.
promhttp_metric_handler_requests_total | Counter | Total number of scrapes by HTTP status code.

Setting Up Monitoring with Prometheus

To effectively monitor Grafana using these metrics, follow these steps:

  1. Enable Metrics in Grafana: Ensure that the Prometheus metrics endpoint is enabled in your Grafana configuration. By default, metrics are available at /metrics on the Grafana server.

    • Edit the grafana.ini file and set enabled = true under the [metrics] section.
    • Restart Grafana to apply the changes.
  2. Configure Prometheus: Add a scrape target in your Prometheus configuration to collect metrics from Grafana.

    scrape_configs:
      - job_name: "grafana"
        # metrics_path defaults to /metrics, which matches Grafana's endpoint
        static_configs:
          - targets: ["<grafana-host>:<grafana-port>"]
  3. Create Dashboards: Use Grafana's built-in Prometheus data source to create dashboards visualizing key metrics such as grafana_http_request_duration_seconds, grafana_alerting_active_alerts, and grafana_process_resident_memory_bytes.

  4. Set Up Alerts: Define alerting rules in Prometheus to notify you of critical conditions, such as high API latency or a spike in invalid alerts.
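
As a rough illustration of step 4, the two conditions mentioned there (high API latency and a spike in invalid alerts) could be written as a Prometheus rule file along the following lines. This is a sketch, not a recommended baseline: the group name, alert names, thresholds (1 second, 10 invalid alerts), `for` durations, and severity labels are all assumptions to adapt, and the `job="grafana"` selector assumes the job name from the scrape configuration in step 2.

```yaml
# grafana_alerts.yml (hypothetical file name)
groups:
  - name: grafana-self-monitoring   # assumed group name
    rules:
      # p95 latency across all Grafana HTTP requests, computed from the
      # histogram buckets. The 1s threshold is an assumption.
      - alert: GrafanaHighHttpLatency
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(grafana_http_request_duration_seconds_bucket{job="grafana"}[5m]))
          ) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Grafana p95 HTTP request latency has been above 1s for 10 minutes"

      # Growth of the invalid-alerts counter over a 10-minute window.
      # The threshold of 10 is an assumption.
      - alert: GrafanaInvalidAlertsSpike
        expr: sum(increase(grafana_alerting_alerts_invalid_total{job="grafana"}[10m])) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Grafana received an unusual number of invalid alerts"
```

To load a file like this, list it under rule_files in the Prometheus configuration. Running promtool check rules against it before reloading Prometheus catches syntax errors early.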

Practical Use Cases

  • Performance Monitoring: Use metrics like grafana_http_request_duration_seconds and grafana_database_conn_wait_count_total to identify slow API responses or database bottlenecks.
  • Alerting Health: Track grafana_alerting_notification_latency_seconds to ensure timely alert notifications.
  • Resource Utilization: Monitor grafana_process_cpu_seconds_total and grafana_process_resident_memory_bytes to optimize server resources.
  • User Activity: Analyze grafana_stat_active_users and grafana_authentication_attempts to understand user engagement and detect potential security issues.
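
The use cases above mostly come down to a handful of PromQL expressions. One way to make them cheap to reuse in dashboards is to precompute them as recording rules, sketched below. The group name and the recorded series names are hypothetical (they loosely follow the level:metric:operations naming convention), the 5-minute windows are an assumption, and `job="grafana"` again assumes the scrape job name used earlier in this guide.

```yaml
# grafana_recording.yml (hypothetical file name)
groups:
  - name: grafana-use-cases   # assumed group name
    rules:
      # Performance: average HTTP request duration, from the histogram's
      # _sum and _count series.
      - record: grafana:http_request_duration_seconds:avg_5m
        expr: |
          sum(rate(grafana_http_request_duration_seconds_sum{job="grafana"}[5m]))
          /
          sum(rate(grafana_http_request_duration_seconds_count{job="grafana"}[5m]))

      # Performance: how often requests had to wait for a database connection.
      - record: grafana:database_conn_wait_count:rate5m
        expr: rate(grafana_database_conn_wait_count_total{job="grafana"}[5m])

      # Alerting health: p95 notification latency.
      - record: grafana:alerting_notification_latency_seconds:p95_5m
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(grafana_alerting_notification_latency_seconds_bucket{job="grafana"}[5m]))
          )

      # Resource utilization: CPU cores consumed (CPU seconds per second).
      - record: grafana:process_cpu_seconds:rate5m
        expr: rate(grafana_process_cpu_seconds_total{job="grafana"}[5m])

      # User activity: authentication attempts per second.
      - record: grafana:authentication_attempts:rate5m
        expr: sum(rate(grafana_authentication_attempts{job="grafana"}[5m]))
```

Gauges such as grafana_process_resident_memory_bytes and grafana_stat_active_users can be graphed directly and do not need a rate() wrapper.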

Conclusion

Integrating Grafana's Prometheus metrics into your monitoring setup enables you to maintain the health, performance, and reliability of your Grafana instance. By creating custom dashboards and alerts tailored to your environment, you can proactively address issues and optimize your system.

We encourage you to explore these metrics, experiment with custom PromQL queries, and adapt your monitoring strategy to meet your organization's specific needs.