Prometheus: Using the blackbox exporter

(2016-01-01)

Until recently, I used kanla, a simple alerting program that I wrote 4 years ago. Back then, delivering alerts via XMPP (Jabber) to mobile devices such as Android smartphones seemed like the best course of action.

About a year ago, I started using Prometheus to collect monitoring data and to alert based on that data. See „Monitoring mit Prometheus“ (“Monitoring with Prometheus”), my presentation about the topic at GPN, for more details and my experiences.

Motivation to switch to the Blackbox Exporter

Given that the Prometheus Alertmanager is already configured to deliver alerts to my mobile device, it seemed silly to rely on two entirely different mechanisms. Personally, I’m using Pushover, but Alertmanager integrates with many popular providers, and it’s easy to add another one.

Originally, I considered extending kanla in such a way that it would talk to Alertmanager, but then I realized that the Prometheus Blackbox Exporter is actually a better fit: it’s under active development and any features that are added to it benefit a larger number of people than the small handful of kanla users.

Hence, I switched from having kanla probe my services to having the Blackbox Exporter probe my services. The rest of the article outlines my configuration in the hope that it’s useful for others who are in a similar situation.

I’m assuming that you are already somewhat familiar with Prometheus and just aren’t using the Blackbox Exporter yet.

Blackbox Exporter: HTTP

The first service I wanted to probe is Debian Code Search. The following blackbox.yml configuration file defines a module called “dcs_results” which, when called, downloads the specified URL via an HTTP GET request. The probe is considered failed when the download doesn’t finish within the timeout of 5 seconds, or when the resulting HTTP body does not contain the text “load_font”.

modules:
  dcs_results:
    prober: http
    timeout: 5s
    http:
      fail_if_not_matches_regexp:
      - "load_font"
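To make the pass/fail rule concrete, here is a minimal Python sketch of what the module checks. It is not the Blackbox Exporter's implementation; the function names body_passes and probe are made up for illustration, and the sketch assumes that every listed expression has to match somewhere in the body.

```python
import re
import urllib.request


def body_passes(body, fail_if_not_matches_regexp):
    """Return True if every regular expression matches somewhere in the body."""
    return all(re.search(pattern, body) for pattern in fail_if_not_matches_regexp)


def probe(url, patterns, timeout=5.0):
    """Fetch url via HTTP GET and apply the body check (hypothetical helper).

    Note: urlopen's timeout applies to each blocking socket operation, which
    only approximates a deadline for the whole download.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except OSError:
        # Connection errors, timeouts and HTTP error statuses count as failure.
        return False
    return body_passes(body, patterns)
```

Calling probe("http://codesearch.debian.net/search?q=i3Font", ["load_font"]) would then mirror the dcs_results module.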

In my prometheus.conf, this is how I invoke the probe:

- job_name: blackbox_dcs_results
  scrape_interval: 60s
  metrics_path: /probe
  params:
    module: [dcs_results]
    target: ['http://codesearch.debian.net/search?q=i3Font']
  scheme: http
  target_groups:
  - targets:
    - blackbox-exporter:9115
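With this job, Prometheus combines scheme, target address, metrics_path and params into one request to the exporter. The following Python sketch shows how that URL is put together; the helper probe_url is hypothetical and only exists for illustration.

```python
from urllib.parse import urlencode


def probe_url(exporter, params, scheme="http", metrics_path="/probe"):
    """Build the URL that is requested on the exporter for one scrape."""
    # doseq=True expands list values such as module: [dcs_results].
    query = urlencode(params, doseq=True)
    return f"{scheme}://{exporter}{metrics_path}?{query}"


url = probe_url(
    "blackbox-exporter:9115",
    {
        "module": ["dcs_results"],
        "target": ["http://codesearch.debian.net/search?q=i3Font"],
    },
)
print(url)
```

Requesting the resulting URL by hand (for example with curl) is a quick way to check a module before wiring it into Prometheus.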

As you can see, the search query is “i3Font”, and I know that “load_font” is one of the results. In case Debian Code Search does not deliver the expected search results, I know something is seriously broken. To make Prometheus actually generate an alert when this probe fails, we need an alert definition like the following:

ALERT ProbeFailing
  IF probe_success < 1
  FOR 15m
  WITH {
    job="blackbox_exporter"
  }
  SUMMARY "probe {{$labels.job}} failing"
  DESCRIPTION "probe {{$labels.job}} failing"
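The FOR 15m clause means that a single failed probe does not alert: the condition has to hold continuously for 15 minutes. The following Python sketch illustrates that behaviour on a list of samples. It is a simplification, not Prometheus's rule evaluator: the function name firing_times is made up, and it assumes the rule is evaluated once per sample.

```python
FOR_SECONDS = 15 * 60  # FOR 15m from the alert definition


def firing_times(samples, for_seconds=FOR_SECONDS):
    """Return the timestamps at which the alert would be firing.

    samples is a list of (unix_timestamp, probe_success) pairs, oldest first.
    """
    failing_since = None
    firing = []
    for timestamp, value in samples:
        if value < 1:
            if failing_since is None:
                failing_since = timestamp  # condition became true: pending
            if timestamp - failing_since >= for_seconds:
                firing.append(timestamp)  # true for long enough: firing
        else:
            failing_since = None  # a successful probe resets the timer
    return firing
```

With one sample per minute, an outage has to last for 15 minutes before the first notification is sent, and a single successful probe in between starts the wait over.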

Blackbox Exporter: IRC

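With the TCP probe module’s query/response feature, we can configure a module that verifies an IRC server lets us log in. The module sends NICK and USER, answers the server’s PING with a matching PONG, and succeeds once the server replies with numeric 001 (the welcome message):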

modules:
  irc_banner:
    prober: tcp
    timeout: 5s
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"

Blackbox Exporter: Git

The query/response feature can also be used for slightly more complex protocols. To verify a Git server is available, we can use the following configuration:

modules:
  git_code_i3wm_org:
    prober: tcp
    timeout: 5s
    tcp:
      query_response:
      - send: "002bgit-upload-pack /i3\x00host=code.i3wm.org\x00"
      - expect: "^[0-9a-f]+ HEAD\x00"

Note that the first four characters are the length of the entire line (including those four characters), encoded as ASCII hexadecimal digits:

$ echo -en '0000git-upload-pack /i3\x00host=code.i3wm.org\x00' | wc -c
43
$ perl -E 'say sprintf("%04x", 43)'
002b
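The same length prefix can be computed programmatically, which is handy when adapting the module to a different repository or host. A minimal Python sketch (the helper name pkt_line is made up for illustration):

```python
def pkt_line(payload):
    """Prefix payload with its total length as 4 ASCII hex digits."""
    # The length counts the 4 prefix bytes themselves plus the payload.
    total = len(payload) + 4
    return f"{total:04x}".encode("ascii") + payload


request = pkt_line(b"git-upload-pack /i3\x00host=code.i3wm.org\x00")
print(request)
```

For the request above, the payload is 39 bytes long, so the total is 43 bytes, or 002b in hexadecimal.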

The corresponding git URL for the example above is git://code.i3wm.org/i3. You can read more about the git protocol at Documentation/technical/pack-protocol.txt.

Blackbox Exporter: Meta-monitoring

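Don’t forget to add an alert that will fire if the blackbox exporter itself is not available. Note that the expression uses sum() rather than count(up == 1): when no target is up, the filter matches nothing, count() returns an empty result, and the alert would never fire.

ALERT BlackboxExporterDown
  IF sum(up{job="blackbox_dcs_results"}) < 1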
  FOR 15m
  WITH {
    job="blackbox_meta"
  }
  SUMMARY "blackbox-exporter is not up"
  DESCRIPTION "blackbox-exporter is not up"