Metrics Alarms with Prometheus and Alertmanager

By | April 20, 2021

In this article I will use Prometheus and Alertmanager to send alarms when disk-space is low for a Spring Boot application. The technique is not limited to Spring Boot services and applications: any service or application that exposes metrics which Prometheus can scrape can have alarms.

A complete example is available on GitHub.

Prerequisites

Since Prometheus will be used to scrape the metrics, a service or an application that exposes metrics in a form Prometheus can scrape is a prerequisite. This article builds on my previous article Spring Boot Prometheus Disk-Space Metrics, in which custom disk-space metrics were exposed by a Spring Boot application.

In addition, for Prometheus to be able to scrape metrics, the IP address of the computer on which the application to be scraped is running is needed. Because Prometheus will run in a Docker container, an address like 127.0.0.1 or localhost will not work: inside the container it refers to the container itself, not to the host. The IP address can be obtained using, for instance, the ifconfig command. The IP address I will use in this example is 192.168.1.104.
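
As an alternative to reading the ifconfig output, the address can be looked up programmatically. The following is a minimal sketch that is not part of the original example; the function name and the probe address are my own assumptions. It relies on the fact that connecting a UDP socket sends no packets but makes the operating system pick the outgoing network interface.

```python
import socket


def local_ip_address(probe_host="192.0.2.1", probe_port=80):
    """Return the IPv4 address of the interface used for outgoing traffic.

    probe_host is an arbitrary address chosen for illustration (192.0.2.1 is
    reserved for documentation); nothing is sent to it. Returns None when the
    operating system has no route to the probe address.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.connect((probe_host, probe_port))
            return sock.getsockname()[0]
    except OSError:
        return None
```

Calling local_ip_address() on a machine with several network interfaces returns only one of the addresses, so the result should be checked against the ifconfig output before it is entered into the Prometheus configuration.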

Prometheus Configuration

Prometheus and Alertmanager will be run in Docker containers with external configuration files. First, create the Prometheus configuration file and the directory structure in which the files will be stored:

  • Create a directory named “prometheus”.
  • In the “prometheus” directory, create another directory named “config”.
  • In the “prometheus” directory, create another directory named “rules”.
  • In the “config” directory, create a file named “prometheus.yml” with the following contents:
# To reload this configuration and any rule configurations referenced by this configuration
# do a POST to http://localhost:9090/-/reload.
# In order for on-demand configuration reloading to work, the --web.enable-lifecycle
# option must be specified when launching Prometheus.
global:
  scrape_interval: 10s

rule_files:
  - "/etc/prometheus/rules/*.rules"

scrape_configs:
  - job_name: 'spring_boot_actuator_metrics'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 10s
    static_configs:
      # The IP address below must be replaced with the current IP address of the computer
      # on which the REST Example application is running.
      # Note that "localhost" or "127.0.0.1" does not work!
      - targets: ['192.168.1.104:8080']

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alert-manager:9093

Note that:

  • There is a comment in the configuration file on how to enable dynamic reloading of the configuration and associated rule configurations.
  • Rules-files to be loaded are specified using a wildcard (*.rules) at the rule_files key.
    This allows for adding new rules-files without having to change this configuration as long as the new rules-files have the “.rules” postfix.
  • There are two scrape_interval keys.
    The first one appears under the global key and the second one appears under the first job under the scrape_configs key.
    The global one specifies the default scraping interval.
    The one under scrape_configs specifies the scraping interval of the job in question, which overrides the default scraping interval.
  • The scrape_configs key contains a number of scraping configurations specifying from where to scrape data.
  • The metrics_path value of the job specifies the HTTP path at which data will be scraped.
  • The IP address and port number in the targets list specify the host and port from which data will be scraped.
    In the above configuration the IP address is 192.168.1.104 and the port number is 8080. Note that the IP address will need to be adapted to suit your environment.
  • The URL at which data will be scraped will, given the above configuration, be http://192.168.1.104:8080/actuator/prometheus.
    The URL is the combination of the scheme (http by default), the IP address, the port and the metrics path.
  • The alert-manager configuration is located under the alertmanagers key under the alerting key.
  • The static_configs contains one single target, alert-manager:9093.
    The host “alert-manager” is the name of the service that will be defined in a Docker Compose file later in this article.
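
How the scrape URL and the effective scraping interval follow from the configuration can be sketched in a few lines of Python. This only illustrates the rules described in the notes above and is not code from Prometheus itself; the function names are hypothetical, and the fallback values (/metrics, http and 1m) are the Prometheus defaults as far as I know.

```python
def scrape_url(target, metrics_path="/metrics", scheme="http"):
    """Combine scheme, target (host:port) and metrics path into the scrape URL."""
    return f"{scheme}://{target}{metrics_path}"


def effective_scrape_interval(global_config, job_config):
    """A scrape_interval on the job overrides the one under the global key."""
    default_interval = global_config.get("scrape_interval", "1m")
    return job_config.get("scrape_interval", default_interval)


# Values taken from the prometheus.yml file above.
print(scrape_url("192.168.1.104:8080", "/actuator/prometheus"))
print(effective_scrape_interval({"scrape_interval": "10s"}, {"scrape_interval": "10s"}))
```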

The Rule

In this article, there will be just one single alerting rule – a rule that will be triggered when free disk-space falls below a certain percentage for a minimum duration of 10 seconds.

  • In the “rules” directory in the “prometheus” directory, create a file named “diskspace.rules” with the following contents:
groups:
  - name: example-group
    rules:
      - alert: DiskspaceLow
        expr: remaining_disk_space_percent < 58.2
        for: 10s
        labels:
          severity: critical
        annotations:
          description: "Diskspace is low!"

Note that:

  • There is a rule-group named “example-group”.
    The rule-group contains one single rule – DiskspaceLow.
  • The expression that, when it evaluates to true, will trigger the alert is:
    remaining_disk_space_percent < 58.2
    Prometheus raises an alert for every time series for which the expression returns a result, so the comparison must describe the alarm condition itself. The remaining_disk_space_percent is the custom disk-space metric that was defined for the Spring Boot application in the previous article.
  • The for key specifies a time duration during which the expression must continuously evaluate to true before an alert will be fired.
    In this example, the remaining disk space must be less than 58.2% for 10 consecutive seconds before an alert is fired. The number 58.2% will probably need to be adjusted to be close to the percentage of remaining disk space on your system.
  • The labels key can be used to add one or more labels to alerts fired from the rule.
    This will be seen below when a rule triggers an alarm.
  • The annotations key can be used to add a description to the rule.
    In this example the description is “Diskspace is low!”.
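
The effect of the for key can be illustrated with a small simulation. Prometheus keeps an alert in the pending state while its condition holds but the for duration has not yet elapsed, and moves it to firing after that. The sketch below is a simplified model written for this article, not Prometheus code; the function name and the sample values are made up, and the alarm condition is assumed to be "value below threshold".

```python
def alert_states(samples, threshold, for_seconds):
    """Return the alert state after each evaluation.

    samples is a list of (timestamp_in_seconds, value) pairs. The alert is
    active while value < threshold and fires once it has been active for at
    least for_seconds without interruption.
    """
    states = []
    active_since = None
    for timestamp, value in samples:
        if value < threshold:
            if active_since is None:
                active_since = timestamp
            if timestamp - active_since >= for_seconds:
                states.append("firing")
            else:
                states.append("pending")
        else:
            # The condition no longer holds, so the alert is reset.
            active_since = None
            states.append("inactive")
    return states


# Illustrative samples, one every 10 seconds.
samples = [(0, 60.0), (10, 58.0), (20, 57.9), (30, 59.0), (40, 58.1)]
print(alert_states(samples, threshold=58.2, for_seconds=10))
```

In this model a single sample above the threshold resets the alert, which is why the last sample is reported as pending rather than firing.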

Prometheus and Alertmanager Docker Compose

In order to conveniently run Prometheus and the Prometheus Alertmanager I have used Docker Compose.

  • In the “prometheus” directory, create a file named “docker-compose.yml” with the following contents:
version: "3"

services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - 9090:9090
    command: [
      "--config.file=/etc/prometheus/config/prometheus.yml",
      "--storage.tsdb.path=/prometheus",
      "--web.console.libraries=/usr/share/prometheus/console_libraries",
      "--web.console.templates=/usr/share/prometheus/consoles",
      "--web.enable-lifecycle"
    ]
    volumes:
      - "./config/:/etc/prometheus/config/"
      - "./rules/:/etc/prometheus/rules/"
    networks:
      - prometheusnet

  alert-manager:
    image: prom/alertmanager:latest
    depends_on:
      - prometheus
    ports:
      - 9093:9093
    networks:
      - prometheusnet

networks:
  prometheusnet:

Note that:

  • The first service is the Prometheus service.
  • The config and rules directories in the project are mapped into the container of the Prometheus service, as can be seen in the volumes section.
  • The Prometheus service is started with a number of command-line flags, as can be seen under the command section.
    --config.file specifies the location of the Prometheus configuration file.
    --storage.tsdb.path specifies the location of the Prometheus database.
    --web.console.libraries specifies the location of a directory containing console template libraries.
    --web.console.templates specifies the location of a directory containing console templates.
    --web.enable-lifecycle enables dynamic reloading of Prometheus configuration and rules by sending a POST request to http://localhost:9090/-/reload.
  • The second service is the Alertmanager service.
  • Prometheus and Alertmanager communicate over a Docker network named “prometheusnet”.

Start the Prometheus and Alertmanager containers by opening a terminal window, entering the prometheus directory of the project and issuing the following command:

docker-compose up

To verify that Prometheus is up and running, open the URL http://localhost:9090 in a browser. The Prometheus web GUI should appear.

View Remaining Disk-Space in Prometheus

Before any metrics can be viewed in Prometheus, the application that supplies the metrics to be scraped needs to be started. In this article I have used an example Spring Boot application that implements a simple REST server. The application can be started by running it in your development environment (Eclipse, IntelliJ etc) or by using Maven:

mvn spring-boot:run
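
Before turning to Prometheus, it can be worth checking that the application really exposes the metric. The sketch below fetches the text served at the metrics path and extracts the values of one metric. It is an illustration under assumptions: the helper names are hypothetical, the parsing only handles simple lines of the Prometheus text format, and the URL assumes the application listens on port 8080 on the local machine, as in the scrape configuration above.

```python
import urllib.request


def metric_values(exposition_text, metric_name):
    """Return the values of all samples of metric_name in Prometheus text format."""
    values = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and HELP/TYPE comments.
        if line.startswith(metric_name + "{"):
            rest = line[line.rfind("}") + 1:]  # Drop the name and the label set.
        elif line.startswith(metric_name + " "):
            rest = line[len(metric_name):]
        else:
            continue
        values.append(float(rest.split()[0]))
    return values


def fetch_metric(metric_name, url="http://localhost:8080/actuator/prometheus"):
    """Download the metrics page and return the values of metric_name."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return metric_values(response.read().decode("utf-8"), metric_name)
```

With the application running, fetch_metric("remaining_disk_space_percent") should return a list with at least one value; an empty list suggests that the custom metric is not registered.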

Let the application run for a minute or two and then open the URL http://localhost:9090/graph?g0.expr=remaining_disk_space_percent&g0.tab=0&g0.stacked=0&g0.range_input=5m to view a graph showing the remaining disk-space percentage. My graph looks like this:

Remaining disk-space percentage graph in Prometheus.

In my case the free disk-space is just above 37%. Note the percentage that applies to you.
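
The current value can also be read without the web GUI, through the Prometheus HTTP API. The following sketch builds an instant query against the /api/v1/query endpoint and pulls the values out of the JSON response. The helper names are my own, and the code assumes Prometheus is reachable at http://localhost:9090 as in this article.

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(expression, base_url="http://localhost:9090"):
    """Build the URL of an instant query for the given PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expression})


def sample_values(response_body):
    """Map the instance label of each result to its value as a float."""
    results = response_body["data"]["result"]
    return {r["metric"].get("instance", ""): float(r["value"][1]) for r in results}


def query_remaining_disk_space(base_url="http://localhost:9090"):
    """Ask Prometheus for the latest remaining_disk_space_percent samples."""
    url = instant_query_url("remaining_disk_space_percent", base_url)
    with urllib.request.urlopen(url, timeout=5) as response:
        return sample_values(json.load(response))
```

The value returned by query_remaining_disk_space() is the percentage to compare against the threshold in the alerting rule.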

Modify Disk-Space Rules

To examine the single rule that has been configured in this example, open the URL http://localhost:9090/alerts to list all the rules and their statuses. Click the arrow next to the DiskspaceLow rule to view rule details.


Disk-space low rule viewed in the Prometheus web GUI.

It can be seen that the rule information matches what was entered earlier in the rule file.

In my case, the percentage in the rule is not suitable if I want to be able to trigger the rule. The free disk-space percentage in the rule can be adjusted as follows:

  • Modify the rule in the diskspace.rules file.
    The expression should, given that my free disk-space is just above 37%, be the following, so that the alert fires once the free disk-space drops below 37%:
    expr: remaining_disk_space_percent < 37.0
  • Issue a POST request to the URL http://localhost:9090/-/reload
    Recall that Prometheus is started with the --web.enable-lifecycle flag which allows for dynamic reloading of Prometheus configuration and rules.
  • Reload the http://localhost:9090/alerts page and click the arrow next to the DiskspaceLow rule.
    The percentage should have been changed to the new value entered in the rule file.
Disk-space low rule after having adjusted the percentage in the expression.
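
The reload request can be sent with any HTTP client. As one possible way of doing it, here is a minimal Python sketch using only the standard library. The function names are hypothetical, and the base URL assumes that Prometheus is published on port 9090 of the local machine, as in the Docker Compose file above.

```python
import urllib.request


def build_reload_request(base_url="http://localhost:9090"):
    """Create the POST request that asks Prometheus to reload its configuration."""
    return urllib.request.Request(f"{base_url}/-/reload", data=b"", method="POST")


def reload_prometheus(base_url="http://localhost:9090"):
    """Send the reload request and return the HTTP status code."""
    with urllib.request.urlopen(build_reload_request(base_url), timeout=5) as response:
        return response.status
```

If the reload is rejected, for instance because a rule file does not parse or the lifecycle flag is missing, urlopen raises urllib.error.HTTPError instead of returning a status code.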

Trigger Alarm

With the rule modified and reloaded, I now copy a few large files, for instance an Ubuntu DVD image, to the drive on which the Spring Boot application is running. When the free disk-space percentage goes below the percentage configured in the rule, an alarm is triggered.
Open or refresh the URL http://localhost:9090/alerts. The list of rules should now look similar to what can be seen below.

Triggered disk-space low rule viewed in the Prometheus web GUI.

Note that:

  • The DiskspaceLow rule has been triggered and is firing an alarm.
  • A section below the rule showing information about the firing rule is now present.
  • There are a number of labels associated with the fired alert.
    The labels are:
    alertname=DiskspaceLow,
    instance=192.168.1.104:8080,
    job=spring_boot_actuator_metrics,
    severity=critical
  • The alertname=DiskspaceLow label is from the alert: DiskspaceLow in the rule.
  • The instance label is the IP address and port from which the metrics that triggered the alert were scraped.
  • The job=spring_boot_actuator_metrics label is the name of the scraping job that scraped the metrics that caused the alert to be fired.
  • The last label, severity=critical, is the label that was added in the rule.
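
How the label set of the fired alert comes together can be summarised in a short sketch. It models the behaviour described in the list above and is not taken from the Prometheus source; the function name is hypothetical. The alert inherits the labels of the time series that matched the expression, except for the metric name, and then receives the labels from the rule and the alertname label.

```python
def alert_labels(alert_name, series_labels, rule_labels):
    """Combine series labels, rule labels and the alert name into alert labels."""
    # The metric name (__name__) is not carried over to the alert.
    labels = {key: value for key, value in series_labels.items() if key != "__name__"}
    labels.update(rule_labels)  # Labels from the rule are added on top.
    labels["alertname"] = alert_name
    return labels


series = {
    "__name__": "remaining_disk_space_percent",
    "instance": "192.168.1.104:8080",
    "job": "spring_boot_actuator_metrics",
}
print(alert_labels("DiskspaceLow", series, {"severity": "critical"}))
```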

This article concludes with the above alert, but there is of course more to alerting with Prometheus. Please refer to the Prometheus documentation on alerting for further information.

Happy coding!
