Prometheus with High Availability
The following example demonstrates configuring a Grafana, Business Studio, and Prometheus trio to work with alerts.
As you know, Prometheus is a free and open-source core technology for monitoring and observability of systems. That means we need a high-availability use case (HA) to show it in bright and shiny armor.
System setup​
Before diving into the step-by-step configuration, let's review the system setup we will monitor and create alerts for. The system consists of two environments - Alpha and Charlie.
Alpha cluster​
In the Alpha environment, there are two mirrored business engines: engine1.alpha
and engine2.alpha
. They are connected with the Grafana cluster, which has two mirrored Grafana instances Grafana1
and Grafana2
.

To store the business engine and Grafana configuration, we employ PostgreSQL (can be configured as master-slave or HA). Prometheus is here to collect metrics from the business engine and Grafana.
Charlie cluster​
In the Charlie environment, there are three business engines:engine1.charlie
, engine2.charlie
, and engine3.charlie
. They serve one Grafana cloud instance.

PostgreSQL stores the business engine configuration. Prometheus collects performance metrics from all three business engines.
Business Engine metrics endpoints​
Every business engine provides two metrics endpoints to collect performance data:
- Port 3001 for API server
- Port 3002 for Scheduler
Use case​
We want to monitor the CPU usage by all five business engines distinctively for each provided service (meaning 10 instances to monitor). In the event of either of them exceeding 2%:
- Create a Grafana anotation.
- Write logs with alert payload.
- Create a file on the designated JSON server with all the details of the CPU exceeding event.
Grafana​
In Grafana, we want to have a time series visualization where exceeding 2% CPU usage would be visually noticeable immediately.

Dashboard variables​
It is important to note that our use case requires firing an alert only for a particular business engine/service. In all alert messages (in the log records and JSON file notes), we want to know which specific business engine/service causes the problem.
To make it possible, all the observable metrics need to contain this details - business engine/service. To make the distinctive firing possible, we use Grafana dashboard variables.
Dashboard variable configuration​

Create a dashboard variable:
-
Type Query.
-
Name instance.
-
Data source prometheus:
- Query Label values,
- Label instance,
- Metric nodejs_version_info.
-
In the Preview of values, all 10 services that are needed to be monitored are displayed.
Time Series panel configuration​

- Select the configured Prometheus data source.
- Specify a query to extract the user's CPU usage.
- Specify a query to extract the system CPU usage.
- Dashboard variable with all 10 services to monitor.
- Select the time series panel.
- Set up the threshold where values above 2 are out of the allowable range.
Business Studio​
Business Studio can be connected to one or all business engines within a cluster.

Alert history​
The section is coming soon...
Alert rule configuration​
The section is coming soon...
Data preview​
The section is coming soon...