Business Alerting in Grafana

Daria Volkova
Co-Founder at Volkov Labs, Grafana Champion

While working closely with the Community and helping to solve production use cases, we have accumulated a long list of requests for reimagined alerting functionality.

Most requests were about simplifying user interaction, ideally by having all controls in one place in the UI. Many users also wanted anomaly detection paired with reporting for self-hosted Grafana.

This is how the idea of Business Alerting was born, with the general goal of making alerting accessible to business users.

In the video below, I go over alerting basics, the components of the existing alerting system, and our vision of Business Alerting. This article reiterates many of the points from the video and adds more detail on some topics.

Business Alerting Announcement.

Alerting basics

Alerting is a system to observe how your data changes and act when a change occurs.

Alerting has three main components:

  1. An alert rule. It is an instruction for evaluating the observed data. Most alert rules have parameters such as the time frame to check, how often to run, the query to run (SQL, PromQL, etc.), and thresholds.

  2. An alert record. An alert record is created when the observed data crosses a threshold.

  3. An alert action. It is an action triggered by an alert record, such as a notification.

Alerting basics.

In other words, you describe WHAT to observe and specify exactly HOW. Every time the rule is broken, a detailed record is created, and alert actions are then triggered from those records.
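To make the flow from rule to record to action concrete, here is a minimal Python sketch. The class names (AlertRule, AlertRecord, AlertPipeline) are hypothetical and belong to neither Grafana nor Business Alerting. A plain callable stands in for a real SQL or PromQL query.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AlertRule:
    """WHAT to observe and HOW: a query plus a threshold (hypothetical shape)."""
    title: str
    query: Callable[[], float]  # stands in for a SQL or PromQL query
    threshold: float


@dataclass
class AlertRecord:
    """Created each time the observed value goes beyond the rule's threshold."""
    rule_title: str
    value: float
    threshold: float


@dataclass
class AlertPipeline:
    rules: List[AlertRule]
    actions: List[Callable[[AlertRecord], None]]
    records: List[AlertRecord] = field(default_factory=list)

    def evaluate(self) -> List[AlertRecord]:
        """Run every rule once, store a record per breach, and trigger all actions."""
        new_records = []
        for rule in self.rules:
            value = rule.query()
            if value > rule.threshold:
                record = AlertRecord(rule.title, value, rule.threshold)
                self.records.append(record)
                new_records.append(record)
                for action in self.actions:
                    action(record)
        return new_records


# Usage: one rule breaches its threshold, the other does not.
sent: List[str] = []
pipeline = AlertPipeline(
    rules=[
        AlertRule("CPU usage", lambda: 92.0, 80.0),
        AlertRule("Disk usage", lambda: 40.0, 90.0),
    ],
    actions=[lambda r: sent.append(f"{r.rule_title}: {r.value} > {r.threshold}")],
)
pipeline.evaluate()
# sent == ["CPU usage: 92.0 > 80.0"]
```

A real system would also run each rule on its own schedule and over a time frame. The sketch evaluates everything once to keep the three roles visible.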

Native versus Business Alerting

The diagram below shows the existing Grafana Alerting side by side with Business Alerting, so you can see the similarities and differences. Each of the main alerting components (rule, record, and action) has a corresponding software module.

Alerting versus Business Alerting.

Users create alert rules using the Alerting UI.

Alert records are created by the Alert Manager. Every time a rule is broken, it creates a record in the annotation table, which is how Grafana knows to draw a vertical line on the corresponding Time Series panel. The Alert Manager works only with backend data sources.
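Annotations can also be written through Grafana's HTTP API (POST /api/annotations). An external service holding a service account token could use this to leave the same kind of vertical marker on a panel. The Python sketch below only builds the request and does not send it. The Grafana URL, token, dashboard UID, panel ID, and tags are placeholder assumptions. The built-in Alert Manager does not store its records this way.

```python
import json
import time
import urllib.request
from typing import List, Optional


def annotation_request(grafana_url: str, token: str, dashboard_uid: str,
                       panel_id: int, text: str, tags: List[str],
                       time_ms: Optional[int] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /api/annotations request for one alert record."""
    if time_ms is None:
        time_ms = int(time.time() * 1000)  # Grafana expects epoch milliseconds
    body = {
        "dashboardUID": dashboard_uid,
        "panelId": panel_id,
        "time": time_ms,
        "tags": tags,
        "text": text,
    }
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/annotations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # service account token (placeholder)
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen against a real Grafana instance would create the annotation. Grafana then displays it on the panel's time series.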

For alert actions, Grafana has an extensive system of notification channels. It lets you configure channels for sending text messages, Slack messages, emails, and OnCall notifications. Judging by the number of questions we have received and come across, it has a steep learning curve.

Webhooks, which call third-party APIs, can also be triggered by an alert record. However, even though the option exists, many users find the implementation unclear.

In Business Alerting, we reimagined all three modules.

Business Alerting panel

The Business Alerting panel is designed to simplify alert rule creation. We made it intuitive and friendly for business-oriented users: you specify all parameters in a single-screen form.

When you create or edit an alert rule, you specify the following configuration elements:

  • Title is the alert name.
  • Schedule is how often the rule should run. With CRON expressions, your schedule can be as complex as needed.
  • Target Dashboard and Target Panel are drop-downs for selecting an existing dashboard and panel. The alert rule takes its queries and thresholds from there automatically.
  • Time Range can either be taken from the dashboard or specified as a custom range.
  • For the alert action, select one of the existing, pre-configured webhook APIs from the drop-down list.
  • The Disabled Annotation option prevents a record from being sent to the annotation table.
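A CRON expression has five fields: minute, hour, day of month, month, and day of week. The Python sketch below shows one way such an expression can be checked against a point in time. It makes several assumptions. It supports only *, single values, ranges, steps, and comma lists. It does not handle names like MON or JAN or special strings like @daily. It is not the parser used by the Business Alerting panel or the Business Engine.

```python
from datetime import datetime

# (low, high) bounds for the five standard cron fields:
# minute, hour, day of month, month, day of week (0 and 7 both mean Sunday).
FIELD_BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def parse_field(field: str, low: int, high: int) -> set:
    """Expand one cron field ("*", "5", "1-5", "*/15", "1,15,30") into a set of ints."""
    values = set()
    for part in field.split(","):
        step, has_step = 1, "/" in part
        if has_step:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = int(part)
            # "N/step" is read as "N-max/step", as many cron implementations do.
            end = high if has_step else start
        if step < 1 or start < low or end > high or start > end:
            raise ValueError(f"invalid cron field: {field!r}")
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if the five-field cron expression fires at this minute."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five fields: minute hour day month weekday")
    minute, hour, dom, month, dow = (
        parse_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_BOUNDS)
    )
    if 7 in dow:
        dow.add(0)  # treat 7 as Sunday too
    cron_weekday = (moment.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    dom_ok = moment.day in dom
    dow_ok = cron_weekday in dow
    # Classic cron rule: if both day fields are restricted (not starting with "*"),
    # the schedule fires when EITHER of them matches.
    if not fields[2].startswith("*") and not fields[4].startswith("*"):
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok
    return (moment.minute in minute and moment.hour in hour
            and moment.month in month and day_ok)
```

For example, "*/15 9-17 * * 1-5" fires every 15 minutes during working hours on weekdays. The either-day rule in the last branch mirrors classic cron behavior. It often surprises people who expect both day fields to apply together.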

Our goal is for the Business Alerting panel to handle hundreds of alerts, with grouping and filtering for easy navigation and control.

Business Alerting panel.

Business Engine

We reimagined the Alert Manager and came up with the Business Engine:

  • It uses dashboards as configuration: it retrieves dashboard queries and thresholds and uses them as alert rule parameters. This eliminates duplicate work, since users no longer have to enter the same details twice.
  • It runs in a separate container, which makes the system architecture flexible.
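Grafana stores each dashboard as JSON. A panel's queries live under targets, and its thresholds live under fieldConfig.defaults.thresholds.steps. The Python sketch below shows how an engine could reuse those fields as alert rule parameters. The function names are hypothetical, and the Business Engine's actual retrieval logic may differ.

```python
from typing import Any, Dict, Iterator, List


def iter_panels(panels: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield panels, including those nested inside collapsed rows."""
    for panel in panels:
        yield panel
        yield from iter_panels(panel.get("panels", []))


def panel_rule_parameters(dashboard: Dict[str, Any], panel_id: int) -> Dict[str, Any]:
    """Return the queries (targets) and threshold steps of one dashboard panel."""
    for panel in iter_panels(dashboard.get("panels", [])):
        if panel.get("id") == panel_id:
            steps = (panel.get("fieldConfig", {}).get("defaults", {})
                     .get("thresholds", {}).get("steps", []))
            return {"queries": panel.get("targets", []), "thresholds": steps}
    raise KeyError(f"panel {panel_id} not found")


def active_step(value: float, steps: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Pick the highest threshold step that the value has reached.

    Assumes steps are sorted ascending and start with a base step whose
    value is null (None), as in Grafana's absolute threshold mode.
    """
    current = steps[0]
    for step in steps[1:]:
        if step.get("value") is not None and value >= step["value"]:
            current = step
    return current
```

With thresholds like green (base) and red at 80, a query result of 85 lands on the red step. An engine could treat any step above the base as a breach.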

In the future, we will include anomaly detection with AI algorithms.

To connect to the Business Engine, you need the Business Engine data source installed and configured.

Business Engine data source.

Webhooks configuration

The Webhooks panel lists all configured webhooks. In edit mode, you specify a name, a type (HTTP or Test), the request URL, and the request method. For now, only POST is available.
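An HTTP webhook configured this way amounts to a POST request with a JSON body. The minimal Python sketch below models the form fields as a dataclass and builds the request with the standard library. The field names, the payload shape, and the Node-RED URL in the example are assumptions. The article does not describe the behavior of the Test type, so the sketch rejects it.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Webhook:
    """Mirrors the fields of the webhook edit form (names are hypothetical)."""
    name: str
    kind: str  # "HTTP" or "Test"
    url: str
    method: str = "POST"  # only POST is available for now

    def build_request(self, payload: Dict[str, Any]) -> urllib.request.Request:
        """Build (but do not send) the HTTP request carrying the alert payload."""
        if self.kind != "HTTP":
            raise ValueError("this sketch only builds requests for HTTP webhooks")
        if self.method != "POST":
            raise ValueError("only POST is supported")
        return urllib.request.Request(
            self.url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method=self.method,
        )


# Example: a hypothetical Node-RED endpoint (1880 is Node-RED's default port).
hook = Webhook("Slack via Node-RED", "HTTP", "http://node-red:1880/alert")
request = hook.build_request({"rule": "CPU usage", "status": "alerting"})
```

Sending the request with urllib.request.urlopen is left out, so the sketch runs without a live endpoint.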

Webhook configuration panel.

Getting started

You can download the latest release from our GitHub repository and follow this hands-on tutorial.

Metrics, Logs, CPU Usage with Business Alerting in Grafana.

The docker-compose file consists of the following containers:

  • Grafana includes the provisioned Business Engine data source, an Alerting panel, and an example dashboard.
  • Timescale is required to store configuration, events, rules, etc.
  • Business Engine has a service account key to access Grafana HTTP APIs. It evaluates alert rules and calls webhooks when alert statuses change.
  • JSON webhook is an example webhook based on Node.js. It accepts the alert payload and saves it to files for testing purposes.
  • Node-RED provides an HTTP POST endpoint and sends a Slack notification with alert details.
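To call webhooks only when alert statuses change, rather than on every evaluation, an engine has to remember the previous status of each rule. A minimal Python sketch of that bookkeeping follows. The status names ("ok", "alerting") and the assumption that every rule starts as "ok" are illustrative. Neither is taken from the Business Engine.

```python
from typing import Dict, List, Tuple


class StatusTracker:
    """Remembers the last status of each rule and reports only changes."""

    def __init__(self) -> None:
        self._last: Dict[str, str] = {}

    def update(self, statuses: Dict[str, str]) -> List[Tuple[str, str, str]]:
        """Return (rule, old_status, new_status) for every rule whose status changed."""
        changes = []
        for rule, new in statuses.items():
            old = self._last.get(rule, "ok")  # assumption: rules start in "ok"
            if new != old:
                changes.append((rule, old, new))
            self._last[rule] = new
        return changes


tracker = StatusTracker()
first = tracker.update({"CPU usage": "alerting", "Disk usage": "ok"})
repeat = tracker.update({"CPU usage": "alerting", "Disk usage": "ok"})
# first == [("CPU usage", "ok", "alerting")]; repeat == [] because nothing changed
```

Each returned tuple could become one webhook payload. A rule that stays in "alerting" across evaluations then produces a single notification instead of one per run.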

When you run the docker-compose.yml file, it launches the Grafana, Timescale, Business Engine, and webhook containers.

Always happy to hear from you
