Monitoring Grafana 12 with Prometheus Metrics
In this guide, we explore the extensive set of Prometheus metrics exposed by Grafana 12.0.2. These metrics provide deep insights into the performance, health, and behavior of your Grafana instance, enabling effective monitoring and troubleshooting.
By leveraging these metrics, you can track critical aspects of your Grafana environment, including alerting, API usage, database performance, and user interactions. Whether you're a system administrator, DevOps engineer, or data enthusiast, understanding and utilizing these metrics will help you optimize your Grafana setup and ensure its reliability.
This article provides a comprehensive list of Prometheus metrics available in Grafana 12, along with their types and detailed descriptions. We also offer guidance on integrating these metrics into your monitoring strategy to build powerful dashboards and alerts.
Why Monitor Grafana with Prometheus?
Monitoring Grafana using Prometheus metrics allows you to:
- Detect Issues Early: Identify performance bottlenecks or system failures before they impact users.
- Optimize Resources: Understand resource usage patterns to allocate CPU, memory, and database capacity effectively.
- Improve User Experience: Monitor API response times and user interactions to ensure a seamless experience.
- Enhance Alerting: Track alerting system health to ensure timely notifications for critical events.
Prometheus Metrics in Grafana 12
Below is a detailed list of Prometheus metrics exposed by Grafana 12.0.2.
Metric Name | Type | Description |
---|---|---|
grafana_access_evaluation_count | Counter | Number of evaluation calls. |
grafana_access_evaluation_duration | Histogram | Histogram for the runtime of evaluation function. |
grafana_access_permissions_cache_usage | Counter | Access control permissions cache hit/miss. |
grafana_access_permissions_duration | Histogram | Histogram for the runtime of permissions check function. |
grafana_access_search_permissions_duration | Histogram | Histogram for the runtime of permissions search function. |
grafana_access_search_user_permissions_cache_usage | Counter | Access control search user permissions cache hit/miss. |
grafana_aggregator_discovery_aggregation_count_total | Counter | Counter of number of times discovery was aggregated. |
grafana_alerting_active_alerts | Gauge | Amount of active alerts. |
grafana_alerting_active_configurations | Gauge | The number of active Alertmanager configurations. |
grafana_alerting_alertmanager_alerts | Gauge | How many alerts by state are in Grafana's Alertmanager. |
grafana_alerting_alertmanager_config_hash | Gauge | The hash of the Alertmanager configuration. |
grafana_alerting_alertmanager_config_match | Gauge | The total number of match. |
grafana_alerting_alertmanager_config_match_re | Gauge | The total number of matchRE. |
grafana_alerting_alertmanager_config_matchers | Gauge | The total number of matchers. |
grafana_alerting_alertmanager_config_object_matchers | Gauge | The total number of object_matchers. |
grafana_alerting_alertmanager_config_size_bytes | Gauge | The size of the Grafana Alertmanager configuration in bytes. |
grafana_alerting_alertmanager_inhibition_rules | Gauge | Number of configured inhibition rules. |
grafana_alerting_alertmanager_integrations | Gauge | Number of configured receivers. |
grafana_alerting_alertmanager_receivers | Gauge | Number of configured receivers by state. It is considered active if used within a route. |
grafana_alerting_alerts | Gauge | How many alerts by state are in the scheduler. |
grafana_alerting_alerts_invalid_total | Counter | The total number of received alerts that were invalid. |
grafana_alerting_alerts_received_total | Counter | The total number of received alerts. |
grafana_alerting_discovered_configurations | Gauge | The number of organizations we've discovered that require an Alertmanager configuration. |
grafana_alerting_dispatcher_aggregation_groups | Gauge | Number of active aggregation groups. |
grafana_alerting_dispatcher_alert_processing_duration_seconds | Summary | Summary of latencies for the processing of alerts. |
grafana_alerting_execution_time_milliseconds | Summary | Summary of alert execution duration. |
grafana_alerting_nflog_gc_duration_seconds | Summary | Duration of the last notification log garbage collection cycle. |
grafana_alerting_nflog_gossip_messages_propagated_total | Counter | Number of received gossip messages that have been further gossiped. |
grafana_alerting_nflog_queries_total | Counter | Number of notification log queries received. |
grafana_alerting_nflog_query_duration_seconds | Histogram | Duration of notification log query evaluation. |
grafana_alerting_nflog_query_errors_total | Counter | Number of notification log received queries that failed. |
grafana_alerting_nflog_snapshot_duration_seconds | Summary | Duration of the last notification log snapshot. |
grafana_alerting_nflog_snapshot_size_bytes | Gauge | Size of the last notification log snapshot in bytes. |
grafana_alerting_notification_latency_seconds | Histogram | The latency of notifications in seconds. |
grafana_alerting_remote_alertmanager_configuration_sync_failures_total | Counter | Total number of failed attempts to sync configurations between Alertmanagers. |
grafana_alerting_remote_alertmanager_configuration_syncs_total | Counter | Total number of configuration syncs to the remote Alertmanager. |
grafana_alerting_remote_alertmanager_last_configuration_sync_timestamp_seconds | Gauge | Timestamp of the last successful configuration sync to the remote Alertmanager in seconds. |
grafana_alerting_remote_alertmanager_last_readiness_check_timestamp_seconds | Gauge | Timestamp of the last successful readiness check to the remote Alertmanager in seconds. |
grafana_alerting_remote_alertmanager_last_state_sync_timestamp_seconds | Gauge | Timestamp of the last successful state sync to the remote Alertmanager in seconds. |
grafana_alerting_remote_alertmanager_state_sync_failures_total | Counter | Total number of failed attempts to sync state between Alertmanagers. |
grafana_alerting_remote_alertmanager_state_syncs_total | Counter | Total number of state syncs to the remote Alertmanager. |
grafana_alerting_request_duration_seconds | Histogram | Histogram of requests to the Alerting API. |
grafana_alerting_schedule_alert_rules | Gauge | The number of alert rules that could be considered for evaluation at the next tick. |
grafana_alerting_schedule_alert_rules_hash | Gauge | A hash of the alert rules that could be considered for evaluation at the next tick. |
grafana_alerting_schedule_periodic_duration_seconds | Histogram | The time taken to run the scheduler. |
grafana_alerting_schedule_query_alert_rules_duration_seconds | Histogram | The time taken to fetch alert rules from the database. |
grafana_alerting_scheduler_behind_seconds | Gauge | The total number of seconds the scheduler is behind. |
grafana_alerting_silences | Gauge | How many silences by state. |
grafana_alerting_silences_gc_duration_seconds | Summary | Duration of the last silence garbage collection cycle. |
grafana_alerting_silences_gossip_messages_propagated_total | Counter | Number of received gossip messages that have been further gossiped. |
grafana_alerting_silences_queries_total | Counter | How many silence queries were received. |
grafana_alerting_silences_query_duration_seconds | Histogram | Duration of silence query evaluation. |
grafana_alerting_silences_query_errors_total | Counter | How many silence received queries did not succeed. |
grafana_alerting_silences_snapshot_duration_seconds | Summary | Duration of the last silence snapshot. |
grafana_alerting_silences_snapshot_size_bytes | Gauge | Size of the last silence snapshot in bytes. |
grafana_alerting_state_calculation_duration_seconds | Histogram | The duration of calculation of a single state. |
grafana_alerting_state_full_sync_duration_seconds | Histogram | The duration of fully synchronizing the state with the database. |
grafana_alerting_state_history_info | Gauge | Information about the state history store. |
grafana_alerting_state_history_writes_bytes_total | Counter | The total number of bytes sent within a batch to the state history store. |
grafana_alerting_ticker_interval_seconds | Gauge | Interval at which the ticker is meant to tick. |
grafana_alerting_ticker_last_consumed_tick_timestamp_seconds | Gauge | Timestamp of the last consumed tick in seconds. |
grafana_alerting_ticker_next_tick_timestamp_seconds | Gauge | Timestamp of the next tick in seconds before it is consumed. |
grafana_api_admin_user_created_total | Counter | API admin user created counter. |
grafana_api_dashboard_get_milliseconds | Summary | Summary for dashboard get duration. |
grafana_api_dashboard_save_milliseconds | Summary | Summary for dashboard save duration. |
grafana_api_dashboard_search_milliseconds | Summary | Summary for dashboard search duration. |
grafana_api_dashboard_snapshot_create_total | Counter | Dashboard snapshots created. |
grafana_api_dashboard_snapshot_external_total | Counter | External dashboard snapshots created. |
grafana_api_dashboard_snapshot_get_total | Counter | Loaded dashboards. |
grafana_api_dataproxy_request_all_milliseconds | Summary | Summary for dataproxy request duration. |
grafana_api_login_oauth_total | Counter | API login OAuth counter. |
grafana_api_login_post_total | Counter | API login post counter. |
grafana_api_login_saml_total | Counter | API login SAML counter. |
grafana_api_models_dashboard_insert_total | Counter | Dashboards inserted. |
grafana_api_org_create_total | Counter | API org created counter. |
grafana_api_response_status_total | Counter | API HTTP response status. |
grafana_api_user_signup_completed_total | Counter | Amount of users who completed the signup flow. |
grafana_api_user_signup_invite_total | Counter | Amount of users who have been invited. |
grafana_api_user_signup_started_total | Counter | Amount of users who started the signup flow. |
grafana_apiserver_audit_event_total | Counter | Counter of audit events generated and sent to the audit backend. |
grafana_apiserver_audit_requests_rejected_total | Counter | Counter of apiserver requests rejected due to an error in audit logging backend. |
grafana_apiserver_client_certificate_expiration_seconds | Histogram | Distribution of the remaining lifetime on the certificate used to authenticate a request. |
grafana_apiserver_current_inflight_requests | Gauge | Maximal number of currently used inflight request limit of this apiserver per request kind in last second. |
grafana_apiserver_envelope_encryption_dek_cache_fill_percent | Gauge | Percent of the cache slots currently occupied by cached DEKs. |
grafana_apiserver_flowcontrol_read_vs_write_current_requests | Histogram | Observations of the number of requests waiting or in regular stage of execution. |
grafana_apiserver_flowcontrol_seat_fair_frac | Gauge | Fair fraction of server's concurrency to allocate to each priority level that can use it. |
grafana_apiserver_kube_aggregator_x509_insecure_sha1_total | Counter | Counts the number of requests to servers with insecure SHA1 signatures in their serving certificate. |
grafana_apiserver_kube_aggregator_x509_missing_san_total | Counter | Counts the number of requests to servers missing SAN extension in their serving certificate. |
grafana_apiserver_request_body_size_bytes | Histogram | Apiserver request body size in bytes broken out by resource and verb. |
grafana_apiserver_request_duration_seconds | Histogram | Response latency distribution in seconds for each verb, dry run value, group, version, resource, etc. |
grafana_apiserver_request_filter_duration_seconds | Histogram | Request filter latency distribution in seconds, for each filter type. |
grafana_apiserver_request_sli_duration_seconds | Histogram | Response latency distribution (not counting webhook duration and priority & fairness queue wait times). |
grafana_apiserver_request_slo_duration_seconds | Histogram | Response latency distribution (not counting webhook duration and priority & fairness queue wait times). |
grafana_apiserver_request_timestamp_comparison_time | Histogram | Time taken for comparison of old vs new objects in UPDATE or PATCH requests. |
grafana_apiserver_request_total | Counter | Counter of apiserver requests broken out for each verb, dry run value, group, version, resource, etc. |
grafana_apiserver_response_sizes | Histogram | Response size distribution in bytes for each group, version, verb, resource, subresource, scope, etc. |
grafana_apiserver_selfrequest_total | Counter | Counter of apiserver self-requests broken out for each verb, API resource, and subresource. |
grafana_apiserver_storage_data_key_generation_duration_seconds | Histogram | Latencies in seconds of data encryption key (DEK) generation operations. |
grafana_apiserver_storage_data_key_generation_failures_total | Counter | Total number of failed data encryption key (DEK) generation operations. |
grafana_apiserver_storage_envelope_transformation_cache_misses_total | Counter | Total number of cache misses while accessing key decryption key (KEK). |
grafana_apiserver_storage_objects | Gauge | Number of stored objects at the time of last check split by kind. |
grafana_apiserver_tls_handshake_errors_total | Counter | Number of requests dropped with 'TLS handshake error from' error. |
grafana_apiserver_webhooks_x509_insecure_sha1_total | Counter | Counts the number of requests to servers with insecure SHA1 signatures in their serving certificate. |
grafana_apiserver_webhooks_x509_missing_san_total | Counter | Counts the number of requests to servers missing SAN extension in their serving certificate. |
grafana_authenticated_user_requests | Counter | Counter of authenticated requests broken out by username. |
grafana_authentication_attempts | Counter | Counter of authenticated attempts. |
grafana_authentication_duration_seconds | Histogram | Authentication duration in seconds broken out by result. |
grafana_authn_authn_failed_authentication_total | Counter | Number of failed authentications. |
grafana_authn_authn_successful_authentication_total | Counter | Number of successful authentications. |
grafana_authn_authn_successful_login_total | Counter | Number of successful logins. |
grafana_authorization_attempts_total | Counter | Counter of authorization attempts broken down by result. |
grafana_authorization_duration_seconds | Histogram | Authorization duration in seconds broken out by result. |
grafana_build_info | Gauge | A metric with a constant '1' value labeled by version, revision, branch, and goversion from which Grafana was built. |
grafana_build_timestamp | Gauge | A metric exposing when the binary was built in epoch. |
grafana_cardinality_enforcement_unexpected_categorizations_total | Counter | The count of unexpected categorizations during cardinality enforcement. |
grafana_database_all_migrations_duration_seconds | Histogram | Duration of the entire SQL migration process in seconds. |
grafana_database_conn_idle | Gauge | The number of idle connections. |
grafana_database_conn_in_use | Gauge | The number of connections currently in use. |
grafana_database_conn_max_idle_closed_seconds | Counter | The total number of connections closed due to SetConnMaxIdleTime. |
grafana_database_conn_max_idle_closed_total | Counter | The total number of connections closed due to SetMaxIdleConns. |
grafana_database_conn_max_lifetime_closed_total | Counter | The total number of connections closed due to SetConnMaxLifetime. |
grafana_database_conn_max_open | Gauge | Maximum number of open connections to the database. |
grafana_database_conn_open | Gauge | The number of established connections both in use and idle. |
grafana_database_conn_wait_count_total | Counter | The total number of connections waited for. |
grafana_database_conn_wait_duration_seconds | Counter | The total time blocked waiting for a new connection. |
grafana_datasource_request_duration_seconds | Histogram | Histogram of durations of outgoing data source requests sent from Grafana. |
grafana_datasource_request_in_flight | Gauge | A gauge of outgoing data source requests currently being sent by Grafana. |
grafana_datasource_request_total | Counter | A counter for outgoing requests for a data source. |
grafana_datasource_response_size_bytes | Histogram | Histogram of data source response sizes returned to Grafana. |
grafana_db_datasource_query_by_id_total | Counter | Counter for getting datasource by id. |
grafana_disabled_metrics_total | Counter | The count of disabled metrics. |
grafana_emails_sent_failed | Counter | Number of emails Grafana failed to send. |
grafana_emails_sent_total | Counter | Number of emails sent by Grafana. |
grafana_encryption_cache_reads_total | Counter | A counter for encryption cache reads. |
grafana_encryption_ops_total | Counter | A counter for encryption operations. |
grafana_environment_info | Gauge | A metric with a constant '1' value labeled by environment information about the running instance. |
grafana_feature_toggles_info | Gauge | Info metric that exposes what feature toggles are enabled or not. |
grafana_field_validation_request_duration_seconds | Histogram | Response latency distribution in seconds for each field validation value. |
grafana_folder_id_api_count | Counter | Counter for folder id usage in API package. |
grafana_folder_id_service_count | Counter | Counter for folder id usage in service package. |
grafana_folders_get_children_duration_seconds | Histogram | Duration of listing subfolders in specific folder. |
grafana_frontend_boot_css_time_seconds | Histogram | Frontend boot initial CSS load. |
grafana_frontend_boot_first_contentful_paint_time_seconds | Histogram | Frontend boot first contentful paint. |
grafana_frontend_boot_first_paint_time_seconds | Histogram | Frontend boot first paint. |
grafana_frontend_boot_js_done_time_seconds | Histogram | Frontend boot initial JS load. |
grafana_frontend_boot_load_time_seconds | Histogram | Frontend boot time measurement. |
grafana_frontend_plugins_preload_ms | Histogram | Frontend preload plugin time measurement. |
grafana_hidden_metrics_total | Counter | The count of hidden metrics. |
grafana_http_request_duration_seconds | Histogram | Histogram of latencies for HTTP requests. |
grafana_http_request_in_flight | Gauge | A gauge of requests currently being served by Grafana. |
grafana_iam_authz_direct_db_service_invalid_request_count | Counter | AuthZ service invalid request count. |
grafana_iam_authz_direct_db_service_permission_cache_usage | Counter | AuthZ service permission cache usage. |
grafana_idforwarding_idforwarding_failed_token_signing_total | Counter | Number of failed token signings. |
grafana_idforwarding_idforwarding_token_signing_duration_seconds | Histogram | Histogram of token signing duration. |
grafana_idforwarding_idforwarding_token_signing_from_cache_total | Counter | Number of signed tokens retrieved from cache. |
grafana_idforwarding_idforwarding_token_signing_total | Counter | Number of token signings. |
grafana_index_server_index_size | Gauge | Size of the index in bytes - only for file-based indices. |
grafana_instance_start_total | Counter | Counter for started instances. |
grafana_ldap_users_sync_execution_time | Summary | Summary for LDAP users sync execution duration. |
grafana_live_broker_redis_pub_sub_dropped_messages | Counter | Number of dropped messages on application level in Redis PUB/SUB. |
grafana_live_client_command_duration_seconds | Summary | Client command duration summary. |
grafana_live_client_connections_inflight | Gauge | Number of inflight client connections. |
grafana_live_client_ping_pong_duration_seconds | Histogram | Ping/Pong duration in seconds. |
grafana_live_client_subscriptions_inflight | Gauge | Number of inflight client subscriptions. |
grafana_live_node_action_count | Counter | Number of various actions called. |
grafana_live_node_broadcast_duration_seconds | Histogram | Broadcast duration in seconds. |
grafana_live_node_build | Gauge | Node build info. |
grafana_live_node_messages_received_count | Counter | Number of messages received from broker. |
grafana_live_node_messages_sent_count | Counter | Number of messages sent by node to broker. |
grafana_live_node_num_channels | Gauge | Number of channels with one or more subscribers. |
grafana_live_node_num_clients | Gauge | Number of clients connected. |
grafana_live_node_num_nodes | Gauge | Number of nodes in the cluster. |
grafana_live_node_num_subscriptions | Gauge | Number of subscriptions. |
grafana_live_node_num_users | Gauge | Number of unique users connected. |
grafana_live_node_pub_sub_lag_seconds | Histogram | Pub sub lag in seconds. |
grafana_live_transport_messages_received | Counter | Number of messages received from client connections over specific transport. |
grafana_live_transport_messages_received_size | Counter | Size in bytes of messages received from client connections over specific transport. |
grafana_live_transport_messages_sent | Counter | Number of messages sent to client connections over specific transport. |
grafana_live_transport_messages_sent_size | Counter | Size in bytes of messages sent to client connections over specific transport. |
grafana_page_response_status_total | Counter | Page HTTP response status. |
grafana_plugin_build_info | Gauge | A metric with a constant '1' value labeled by pluginId, pluginType, and version from which Grafana plugin was built. |
grafana_plugin_request_duration_milliseconds | Histogram | Plugin request duration. |
grafana_plugin_request_duration_seconds | Histogram | Plugin request duration in seconds. |
grafana_plugin_request_size_bytes | Histogram | Histogram of plugin request sizes returned. |
grafana_plugin_request_total | Counter | The total amount of plugin requests. |
grafana_plugin_target_info | Gauge | A metric with a constant '1' value labeled by pluginId and target. |
grafana_plugins_preinstall_duration_seconds | Histogram | Plugin preinstallation duration. |
grafana_plugins_preinstall_total | Counter | The total amount of plugin preinstallations. |
grafana_process_cpu_seconds_total | Counter | Total user and system CPU time spent in seconds. |
grafana_process_max_fds | Gauge | Maximum number of open file descriptors. |
grafana_process_network_receive_bytes_total | Counter | Number of bytes received by the process over the network. |
grafana_process_network_transmit_bytes_total | Counter | Number of bytes sent by the process over the network. |
grafana_process_open_fds | Gauge | Number of open file descriptors. |
grafana_process_resident_memory_bytes | Gauge | Resident memory size in bytes. |
grafana_process_start_time_seconds | Gauge | Start time of the process since unix epoch in seconds. |
grafana_process_virtual_memory_bytes | Gauge | Virtual memory size in bytes. |
grafana_process_virtual_memory_max_bytes | Gauge | Maximum amount of virtual memory available in bytes. |
grafana_prometheus_plugin_backend_request_count | Counter | The total amount of Prometheus backend plugin requests. |
grafana_proxy_response_status_total | Counter | Proxy HTTP response status. |
grafana_public_dashboard_request_count | Counter | Counter for public dashboards requests. |
grafana_registered_metrics_total | Counter | The count of registered metrics broken by stability level and deprecation version. |
grafana_rendering_queue_size | Gauge | Size of rendering queue. |
grafana_search_dashboard_search_failures_duration_seconds | Histogram | Duration of dashboard search failures. |
grafana_search_dashboard_search_successes_duration_seconds | Histogram | Duration of dashboard search successes. |
grafana_stat_active_users | Gauge | Number of active users. |
grafana_stat_failed_migrated_api_keys_to_sa_tokens | Gauge | Total number of failed migrations of API keys to service account tokens. |
grafana_stat_successfully_migrated_api_keys_to_sa_tokens | Gauge | Total number of successful migrations of API keys to service account tokens. |
grafana_stat_total_migrated_api_keys_to_sa_tokens | Gauge | Total number of API keys to be migrated to service account tokens. |
grafana_stat_total_orgs | Gauge | Total amount of orgs. |
grafana_stat_total_playlists | Gauge | Total amount of playlists. |
grafana_stat_total_service_account_tokens | Gauge | Total amount of service account tokens. |
grafana_stat_total_service_accounts | Gauge | Total amount of service accounts. |
grafana_stat_total_service_accounts_role_none | Gauge | Total amount of service accounts with no role. |
grafana_stat_total_teams | Gauge | Total amount of teams. |
grafana_stat_total_users | Gauge | Total amount of users. |
grafana_stat_totals_active_admins | Gauge | Total amount of active admins. |
grafana_stat_totals_active_editors | Gauge | Total amount of active editors. |
grafana_stat_totals_active_viewers | Gauge | Total amount of active viewers. |
grafana_stat_totals_admins | Gauge | Total amount of admins. |
grafana_stat_totals_alert_rules | Gauge | Total amount of alert rules in the database. |
grafana_stat_totals_annotations | Gauge | Total amount of annotations in the database. |
grafana_stat_totals_correlations | Gauge | Total amount of correlations. |
grafana_stat_totals_dashboard | Gauge | Total amount of dashboards. |
grafana_stat_totals_dashboard_versions | Gauge | Total amount of dashboard versions in the database. |
grafana_stat_totals_data_keys | Gauge | Total amount of data keys in the database. |
grafana_stat_totals_datasource | Gauge | Total number of defined datasources, labeled by pluginId. |
grafana_stat_totals_editors | Gauge | Total amount of editors. |
grafana_stat_totals_folder | Gauge | Total amount of folders. |
grafana_stat_totals_library_panels | Gauge | Total amount of library panels in the database. |
grafana_stat_totals_library_variables | Gauge | Total amount of library variables in the database. |
grafana_stat_totals_public_dashboard | Gauge | Total amount of public dashboards. |
grafana_stat_totals_rule_groups | Gauge | Total amount of alert rule groups in the database. |
grafana_stat_totals_viewers | Gauge | Total amount of viewers. |
grafana_storage_server_poller_query_latency_seconds | Histogram | Poller query latency. |
loki_experimental_features_in_use_total | Counter | The number of experimental features in use. |
net_conntrack_dialer_conn_attempted_total | Counter | Total number of connections attempted by the given dialer a given name. |
net_conntrack_dialer_conn_closed_total | Counter | Total number of connections closed which originated from the dialer of a given name. |
net_conntrack_dialer_conn_established_total | Counter | Total number of connections successfully established by the given dialer a given name. |
net_conntrack_dialer_conn_failed_total | Counter | Total number of connections failed to dial by the dialer a given name. |
openfga_cachecontroller_cache_hit_count | Counter | The total number of cache hits from cachecontroller requests. |
openfga_cachecontroller_cache_invalidation_count | Counter | The total number of invalidations performed by the cache controller. |
openfga_cachecontroller_cache_total_count | Counter | The total number of cachecontroller requests. |
openfga_check_cache_hit_count | Counter | The total number of cache hits for ResolveCheck. |
openfga_check_cache_invalid_hit_count | Counter | The total number of cache hits for ResolveCheck that were discarded because they were invalidated. |
openfga_check_cache_total_count | Counter | The total number of calls to ResolveCheck. |
openfga_condition_compilation_duration_ms | Histogram | A histogram measuring the compilation time (in milliseconds) of a Condition. |
openfga_condition_evaluation_cost | Histogram | A histogram of the CEL evaluation cost of a Condition in a Relationship Tuple. |
openfga_condition_evaluation_duration_ms | Histogram | A histogram measuring the evaluation time (in milliseconds) of a Condition. |
openfga_list_objects_further_eval_required_count | Counter | Number of objects in a ListObjects call that needed to issue a Check call to determine a final result. |
openfga_list_objects_no_further_eval_required_count | Counter | Number of objects in a ListObjects call that did not need to issue a Check call to determine a final result. |
openfga_shared_iterator_count | Gauge | The current number of items of shared iterator. |
openfga_shared_iterator_watchdog_timer_triggered | Counter | Total number of times watchdog timer is triggered. |
plugins_active_instances | Gauge | The number of active plugin instances. |
plugins_datasource_instances_total | Counter | The total number of data source instances created. |
process_cpu_seconds_total | Counter | Total user and system CPU time spent in seconds. |
process_max_fds | Gauge | Maximum number of open file descriptors. |
process_network_receive_bytes_total | Counter | Number of bytes received by the process over the network. |
process_network_transmit_bytes_total | Counter | Number of bytes sent by the process over the network. |
process_open_fds | Gauge | Number of open file descriptors. |
process_resident_memory_bytes | Gauge | Resident memory size in bytes. |
process_start_time_seconds | Gauge | Start time of the process since unix epoch in seconds. |
process_virtual_memory_bytes | Gauge | Virtual memory size in bytes. |
process_virtual_memory_max_bytes | Gauge | Maximum amount of virtual memory available in bytes. |
prometheus_template_text_expansion_failures_total | Counter | The total number of template text expansion failures. |
prometheus_template_text_expansions_total | Counter | The total number of template text expansions. |
promhttp_metric_handler_requests_in_flight | Gauge | Current number of scrapes being served. |
promhttp_metric_handler_requests_total | Counter | Total number of scrapes by HTTP status code. |
Setting Up Monitoring with Prometheus
To effectively monitor Grafana using these metrics, follow these steps:
- Enable Metrics in Grafana: Ensure that the Prometheus metrics endpoint is enabled in your Grafana configuration. By default, metrics are available at /metrics on the Grafana server.
  - Edit the grafana.ini file and set enabled = true under the [metrics] section.
  - Restart Grafana to apply the changes.
- Configure Prometheus: Add a scrape target in your Prometheus configuration to collect metrics from Grafana.

      scrape_configs:
        - job_name: "grafana"
          static_configs:
            - targets: ["<grafana-host>:<grafana-port>"]

- Create Dashboards: Use Grafana's built-in Prometheus data source to create dashboards visualizing key metrics such as grafana_http_request_duration_seconds, grafana_alerting_active_alerts, and grafana_process_resident_memory_bytes.
- Set Up Alerts: Define alerting rules in Prometheus to notify you of critical conditions, such as high API latency or a spike in invalid alerts.
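
As a rough illustration of the alerting step, a Prometheus rule file along the following lines could cover the two conditions mentioned above. This is a minimal sketch rather than a recommended configuration: the group name, alert names, severity labels, thresholds (1 second at the 95th percentile, more than 10 invalid alerts in 10 minutes), and `for` durations are all assumptions to tune for your environment, and the `job="grafana"` selector assumes the scrape job name used in the scrape configuration above.

```yaml
# grafana-alerts.yml (hypothetical file name)
groups:
  - name: grafana-self-monitoring        # assumed group name
    rules:
      # High API latency: p95 of HTTP request duration over the last 5 minutes.
      - alert: GrafanaHighHttpLatency    # assumed alert name
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (
              rate(grafana_http_request_duration_seconds_bucket{job="grafana"}[5m])
            )
          ) > 1
        for: 10m                         # assumed threshold and duration
        labels:
          severity: warning
        annotations:
          summary: "Grafana p95 HTTP request latency is above 1s"

      # Spike in invalid alerts received by Grafana's Alertmanager.
      - alert: GrafanaInvalidAlertsSpike # assumed alert name
        expr: |
          sum(increase(grafana_alerting_alerts_invalid_total{job="grafana"}[10m])) > 10
        for: 5m                          # assumed threshold and duration
        labels:
          severity: warning
        annotations:
          summary: "Grafana received an unusual number of invalid alerts"
```

A file like this is loaded by listing it under `rule_files` in the Prometheus configuration. Histogram metrics are queried through their `_bucket` series, which is why the latency expression does not reference the base metric name directly.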
Practical Use Cases
- Performance Monitoring: Use metrics like grafana_http_request_duration_seconds and grafana_database_conn_wait_count_total to identify slow API responses or database bottlenecks.
- Alerting Health: Track grafana_alerting_notification_latency_seconds to ensure timely alert notifications.
- Resource Utilization: Monitor grafana_process_cpu_seconds_total and grafana_process_resident_memory_bytes to optimize server resources.
- User Activity: Analyze grafana_stat_active_users and grafana_authentication_attempts to understand user engagement and detect potential security issues.
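
One possible way to make these use cases easier to chart is to precompute the underlying queries as Prometheus recording rules. The sketch below is an assumption-laden example, not a prescribed setup: the group name and the recorded series names are hypothetical (they only follow the common `level:metric:operation` naming pattern), and the 5-minute rate windows and chosen quantiles are arbitrary starting points.

```yaml
# grafana-recording-rules.yml (hypothetical file name)
groups:
  - name: grafana-use-cases              # assumed group name
    rules:
      # Performance: p95 HTTP request latency, from the histogram buckets.
      - record: grafana:http_request_duration_seconds:p95
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(grafana_http_request_duration_seconds_bucket[5m]))
          )

      # Performance: how often requests had to wait for a database connection.
      - record: grafana:database_conn_wait_count:rate5m
        expr: rate(grafana_database_conn_wait_count_total[5m])

      # Alerting health: p99 notification latency.
      - record: grafana:alerting_notification_latency_seconds:p99
        expr: |
          histogram_quantile(
            0.99,
            sum by (le) (rate(grafana_alerting_notification_latency_seconds_bucket[5m]))
          )

      # Resource utilization: CPU cores consumed by the Grafana process.
      - record: grafana:process_cpu_seconds:rate5m
        expr: rate(grafana_process_cpu_seconds_total[5m])
```

Counters such as grafana_process_cpu_seconds_total only become meaningful once wrapped in `rate()` or `increase()`, whereas gauges such as grafana_process_resident_memory_bytes and grafana_stat_active_users can be graphed as they are.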
Conclusion
Integrating Grafana's Prometheus metrics into your monitoring setup enables you to maintain the health, performance, and reliability of your Grafana instance. By creating custom dashboards and alerts tailored to your environment, you can proactively address issues and optimize your system.
We encourage you to explore these metrics, experiment with custom PromQL queries, and adapt your monitoring strategy to meet your organization's specific needs.