Alerting with the ELK Stack and Elastalert

By Ivan Krizsan | December 6, 2015

Building on my article on JMX Monitoring with the ELK Stack and the article on creating a Docker image with Elastalert, I will now combine the two and add the missing part, alerting, to the monitoring and alerting stack I have been working my way towards.

Preparations

The configuration files used in this article’s example are available on GitHub. There are, however, a few adaptations you need to make before trying the example out on your own computer.

  • In the MuleShared/conf/wrapper.conf file modify the line that looks like this:
    wrapper.java.additional.20=-Djava.rmi.server.hostname=192.168.99.100
    Replace the IP address with the IP address obtained from the command:
    docker-machine ip [machine name]
    If you are using a Linux operating system that does not require a Docker VM, enter the IP address 127.0.0.1.
  • Give read and write permissions to everyone on the root folder and all underlying files and folders containing the files from GitHub.
    In a Unix/Linux environment, use the chmod command:
    chmod -R a+rw alerting-with-elk-and-elastalert
  • Windows and Mac users only:
    Make sure you place the root directory containing the configuration files in a location that is accessible to the Docker daemon. For OS X this is /Users/ and its subdirectories, for Windows it is C:/Users and its subdirectories. See the Docker documentation for details.
  • Examine the file mule-config.xml located in the MuleShared/apps/mule-perpetuum-mobile directory.
    On line 28 or thereabouts, the <quartz:inbound-endpoint> element has a repeatInterval attribute which should have the value 20. We will lower this value later to trigger the Mule flow more often, which will result in higher CPU usage and, hopefully, an alert going off.

Also make sure that you have Docker 1.9 or later and Docker Compose installed, as the official Docker images used in this article expect at least version 1.9 of Docker. If you are using Mac OS X or Windows, I suggest Docker Toolbox.
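You can quickly verify the installed versions from a terminal:

# Check the Docker and Docker Compose versions before starting.
docker --version
docker-compose --version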

Overview

The following graphic shows the Docker images used in this article:

Docker images used in the article and significant relationships to other Docker images.

I have created Docker images that wrap the official Docker images for the ELK stack, that is, Elasticsearch, Kibana and Logstash. The reasons differ slightly depending on the image:

  • Elasticsearch
    Install additional plug-ins.
  • Logstash
    At startup, wait for Elasticsearch to become available before starting Logstash.
    Also install the JMX plug-in.
  • Kibana
    At startup, wait for Elasticsearch to become available before starting Kibana.
    Install the Sense plug-in.

Logstash, Kibana and Elastalert all need to wait for Elasticsearch to become available before starting up, since I use Docker Compose to start all the Docker containers and thus cannot control the order in which the different containers start (a minimal sketch of such a wait loop is shown below).
The Mule ESB CE instance runs a Mule application whose only purpose is to generate events. For details, please refer to my earlier article on JMX Monitoring with Docker and the ELK Stack.
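As an illustration of the wait-for-Elasticsearch behaviour mentioned above, a startup wrapper script along the following lines would do. This is a minimal sketch, assuming curl is available in the container and that Elasticsearch is reachable under the host name elasticsearch; it is not the exact script used in my images.

#!/bin/sh
# Poll Elasticsearch until it answers on port 9200, then start the wrapped process.
until curl -s "http://elasticsearch:9200/" > /dev/null; do
    echo "Waiting for Elasticsearch to become available..."
    sleep 5
done
exec "$@"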

Elastalert Rule

The Elastalert rule that will be used in this example is located in ElastalertShared/rules/cpu-spike.yaml and looks like this:

# Example Elastalert rule that will alert on spikes in the CPU load of
# the monitored Mule CE ESB.
name: CPU spike
type: spike
index: logstash-*
threshold: 1
timeframe:
    minutes: 1
spike_height: 2
spike_type: "up"

filter:
- range:
    cpuLoad:
        from: 3.0
        to: 100.0

alert:
- "debug"

This is an Elastalert rule of the spike type that will “send” an alert to the debug log when the number of cpuLoad events in the range 3.0 to 100.0 during the last minute is at least twice the number of cpuLoad events in the same range during the minute before that.
For further explanations of Elastalert and Elastalert rules, please refer to their excellent documentation.
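If you want to sanity-check a rule before running the whole stack, Elastalert ships with a test utility, elastalert-test-rule, which runs a rule against existing data without sending any real alerts. A sketch, assuming Elastalert is installed on a machine that can reach Elasticsearch and that a config.yaml pointing at your Elasticsearch instance is present; the exact invocation may differ between Elastalert versions:

# Run the CPU spike rule against data already in Elasticsearch, without triggering any alerters.
elastalert-test-rule ElastalertShared/rules/cpu-spike.yaml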

Running the Example

Starting the Docker containers used in this example is accomplished using Docker Compose.

  • Non-Linux operating systems only: Start Docker.
    The easiest way is to use the Docker Quickstart Terminal if you have it installed.
    Otherwise use Docker Machine, in which case you also need to ssh into the Docker virtual machine. Commands to use with Docker Machine are:

    docker-machine start [name of machine here, possibly 'default']
    docker-machine ssh [name of machine here, possibly 'default']
  • Linux: Open a terminal window.
  • Go to the root directory containing the example files.
    This is the directory that contains the docker-compose.yml file along with the ElastalertShared, ElasticsearchShared etc directories.
  • Start the Docker containers.
    docker-compose up

There will be a lot of console output, perhaps even some errors, and after some time output like the following should appear, which tells us that Logstash is sending JMX data obtained from the Mule container to Elasticsearch.

logstash_1      | {
logstash_1      |                "@version" => "1",
logstash_1      |              "@timestamp" => "2015-12-06T14:29:32.013Z",
logstash_1      |                    "host" => "muleserver",
logstash_1      |                    "path" => "/config-dir/jmx",
logstash_1      |                    "type" => "jmx",
logstash_1      |             "metric_path" => "mule_jvm.OperatingSystem.SystemCpuLoad",
logstash_1      |     "metric_value_number" => 0.004539722572509458,
logstash_1      |                 "cpuLoad" => 0.45397225725094575
logstash_1      | }
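If you want to confirm that these events actually end up in Elasticsearch, you can also query it directly from the host. A couple of quick checks, assuming the Elasticsearch container publishes port 9200 and that 192.168.99.100 is your Docker machine’s IP address (use localhost on Linux):

# Cluster health - should report green or yellow status once Elasticsearch is up.
curl "http://192.168.99.100:9200/_cluster/health?pretty"

# List indices - a daily logstash-* index should appear once Logstash ships data.
curl "http://192.168.99.100:9200/_cat/indices?v"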

About once every minute, log output similar to this should also appear:

elastalert_1    | 2015-12-06 14:32:26,079 DEBG 'elastalert' stderr output:
elastalert_1    | INFO:elastalert:Queried rule CPU spike from 12-6 14:30 UTC to 12-6 14:32 UTC: 0 hits
elastalert_1    | 
elastalert_1    | 2015-12-06 14:32:26,079 DEBG 'elastalert' stdout output:
elastalert_1    | Queried rule CPU spike from 12-6 14:30 UTC to 12-6 14:32 UTC: 0 hits
elastalert_1    | 
elastalert_1    | 2015-12-06 14:32:26,250 DEBG 'elastalert' stderr output:
elastalert_1    | INFO:elastalert:Ran CPU spike from 12-6 14:30 UTC to 12-6 14:32 UTC: 0 query hits, 0 matches, 0 alerts sent
elastalert_1    | 
elastalert_1    | 2015-12-06 14:32:26,250 DEBG 'elastalert' stdout output:
elastalert_1    | Ran CPU spike from 12-6 14:30 UTC to 12-6 14:32 UTC: 0 query hits, 0 matches, 0 alerts sent
elastalert_1    | 
elastalert_1    | 2015-12-06 14:32:26,252 DEBG 'elastalert' stderr output:
elastalert_1    | INFO:elastalert:Sleeping for 59 seconds
elastalert_1    | 
elastalert_1    | 2015-12-06 14:32:26,252 DEBG 'elastalert' stdout output:
elastalert_1    | Sleeping for 59 seconds
elastalert_1    |

This is Elastalert querying Elasticsearch to determine whether our rule ‘CPU spike’ warrants an alert. Note that you will have to wait at least two minutes before a baseline rate has been established for our spike rule.
Having waited at least two minutes, we are now ready to cause a CPU spike by increasing the frequency of event generation in the Mule flow:

  • Open the file mule-config.xml in MuleShared/apps/mule-perpetuum-mobile.
    This is the Mule flow in that file that generates periodic events:

        <flow name="eventGeneratingFlow">
            <quartz:inbound-endpoint
                    jobName="eventGeneratingJob"
                    repeatInterval="20"
                    repeatCount="-1"
                    connector-ref="oneThreadQuartzConnector">
                <quartz:event-generator-job>
                    <quartz:payload>go</quartz:payload>
                </quartz:event-generator-job>
            </quartz:inbound-endpoint>
    
            <logger level="ERROR" message="Generated an event!"/>
    
            <vm:outbound-endpoint path="eventReceiverEndpoint" exchange-pattern="one-way"/>
        </flow>
  • Modify the value of the repeatInterval attribute in the <quartz:inbound-endpoint> element so that it reads 1 instead of 20.
  • Save the file.
  • In the mule.log file located in MuleShared/logs, notice how the Mule application is redeployed since we modified its configuration file:
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    + Redeploying artifact 'mule-perpetuum-mobile'             +
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  • Wait for a couple of minutes until an alert from Elastalert appears in the log:
    elastalert_1    | 2015-12-06 15:08:11,513 DEBG 'elastalert' stderr output:
    elastalert_1    | INFO:elastalert:Queried rule CPU spike from 12-6 15:06 UTC to 12-6 15:08 UTC: 10 hits
    elastalert_1    | 
    elastalert_1    | 2015-12-06 15:08:11,514 DEBG 'elastalert' stdout output:
    elastalert_1    | Queried rule CPU spike from 12-6 15:06 UTC to 12-6 15:08 UTC: 10 hits
    elastalert_1    | 
    elastalert_1    | 2015-12-06 15:08:11,533 DEBG 'elastalert' stderr output:
    elastalert_1    | INFO:elastalert:Alert for CPU spike at 2015-12-06T15:07:17.152Z:
    elastalert_1    | 
    elastalert_1    | 2015-12-06 15:08:11,534 DEBG 'elastalert' stdout output:
    elastalert_1    | Alert for CPU spike at 2015-12-06T15:07:17.152Z:
    elastalert_1    | CPU spike
    elastalert_1    | 
    elastalert_1    | An abnormal number (5) of events occurred around 12-6 15:07 UTC.
    elastalert_1    | Preceding that time, there were only 1 events within 0:01:00
    elastalert_1    | 
    elastalert_1    | @timestamp: 2015-12-06T15:07:17.152Z
    elastalert_1    | @version: 1
    elastalert_1    | cpuLoad: 10.7221573966
    elastalert_1    | host: muleserver
    elastalert_1    | metric_path: mule_jvm.OperatingSystem.SystemCpuLoad
    elastalert_1    | metric_value_number: 0.107221573966
    elastalert_1    | path: /config-dir/jmx
    elastalert_1    | reference_count: 1
    elastalert_1    | spike_count: 5
    elastalert_1    | type: jmx

We have successfully caused the CPU Spike rule to trigger and send an alert!
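If you want to double-check the spike outside of Elastalert, you can count the events that match the rule’s filter directly in Elasticsearch. A sketch, again assuming port 9200 is published and 192.168.99.100 is your Docker machine’s IP address:

# Count events with a cpuLoad between 3.0 and 100.0 - the same range the rule filters on.
curl -s "http://192.168.99.100:9200/logstash-*/_count?pretty" -d '
{
    "query": {
        "range": {
            "cpuLoad": { "gte": 3.0, "lte": 100.0 }
        }
    }
}'

When you are done experimenting, you may want to set repeatInterval back to 20 and stop the containers, either with Ctrl-C in the terminal running docker-compose or with docker-compose stop.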

My monitoring and alerting stack is now complete, and this also concludes this article. There is a lot to write about in this area, so I suspect that I may return with future articles in which I use this monitoring and alerting stack. Stay tuned and happy coding!

7 thoughts on “Alerting with the ELK Stack and Elastalert”

  1. Sunil Chaudhari

    Hi,
    I have Elasticsearch installed and running in my environment.
    I am installing Elastalert using the command
    $ python setup.py install >> install.log
    It gives me this error:

    Processing blist-1.3.6.tar.gz
    Running blist-1.3.6/setup.py -q bdist_egg --dist-dir /tmp/easy_install-SvKvZs/blist-1.3.6/egg-dist-tmp-R2pZmQ
    warning: no files found matching 'blist.rst'

  2. Sunil Chaudhari

    Hi,
    Thanks for the reply.
    I have referred to the Docker file and a few of the steps are done. However,
    when I execute “python setup.py install” it gives me this error:
    mock requires setuptools>=17.1. Aborting installation
    error: Setup script exited with 1

    what is this about?

    br,
    Sunil

    1. Ivan Krizsan Post author

      Hi!
      I would suggest that you talk to the developers since I regretfully lack deeper knowledge about Elastalert.
      On the (Elastalert) project’s GitHub page there is a link to a Gitter chat where people can ask questions.
      Best wishes!
