Prometheus Alertmanager meta-monitoring

(2015-12-02)

I have been happily using Prometheus for monitoring and alerting for about a year.

Regardless of the monitoring system, one problem I was never sure how to solve well was meta-monitoring: if you have a monitoring system, how do you know that the monitoring system itself is running? You’ll need another level of monitoring/alerting (hence “meta-monitoring”).

Recently, I realized that I could use Gmail for meta-monitoring: Google Apps Script allows users to run JavaScript code periodically that has access to Gmail and other Google apps. That way, I can have a cronjob which looks for emails from my monitoring/alerting infrastructure, and if there are none for 2 days, I get an alert email from that script.

That’s a rather simple way of having an entirely different layer of monitoring code, so that the two monitoring systems don’t suffer from a common bug. Further, the code is running on Google servers, so hardware failures of my monitoring system don’t affect it.

The rest of this article walks you through the setup, assuming you’re already using Prometheus, Alertmanager and Gmail.

Installing the meta-monitoring Google Apps Script

See the “Your first script” instructions for how to create a new Google Apps Script file. Then, use the following code, replacing the sender address of your Alertmanager instance and the recipient address with your own:

// vim:ts=2:sw=2:et:ft=javascript
// Licensed under the Apache 2 license.
// © 2015 Google Inc.

// Runs every day between 07:00 and 08:00 in the local time zone.
function checkAlertmanager() {
  // Find all matching email threads within the last 2 days.
  // Should result in 2 threads, unless something went wrong.
  var search_atoms = [
    'from:[email protected]',
    'subject:daily alert test',
    'newer_than:2d',
  ];
  var threads = GmailApp.search(search_atoms.join(' '));
  if (threads.length === 0) {
    GmailApp.sendEmail(
      '[email protected]',
      'ALERT: alertmanager test mail is missing for 2d',
      'Subject says it all');
  }
}

In the menu, select “Resources → Current project’s triggers”. Click “Add a new trigger”, select “Time-driven”, “Day timer” and set the time to “7am to 8am”. This makes the script run every day between 07:00 and 08:00. The exact time doesn’t really matter, but you need to specify something. I went for the 07:00-08:00 timespan because that’s shortly before I typically get up, so I’ll likely see the freshest results right when I get up.
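The same trigger can also be created from code instead of through the menu. The following is only a sketch: installDailyTrigger and installTrigger are hypothetical helper names, and the ScriptApp service is passed in as a parameter purely so that the function can be tried against a stub outside of Apps Script.

```javascript
// Sketch: create the daily time-driven trigger programmatically.
// scriptApp is expected to be the Apps Script ScriptApp service.
function installDailyTrigger(scriptApp) {
  return scriptApp.newTrigger('checkAlertmanager')
    .timeBased()
    .everyDays(1)
    .atHour(7)  // runs at some point during the 07:00 hour
    .create();
}

// Hypothetical wrapper: run this once from the Apps Script editor.
function installTrigger() {
  return installDailyTrigger(ScriptApp);
}
```

Run installTrigger only once; each run adds another trigger, so running it repeatedly would make the check fire several times per day.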

You can now either wait a day for the trigger to fire, or you can select the checkAlertmanager function in the “Run” menu to run it right away. You should end up with an email in your inbox, notifying you that the daily alert test is missing, which is expected since we did not configure it yet :).
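If you would rather exercise both outcomes (test mail present, test mail missing) without touching your real mailbox, the decision logic can be separated from the Gmail calls. This is a sketch under assumptions, not part of the original script: buildQuery, checkWith and checkAlertmanagerInjected are hypothetical names, the gmail parameter stands in for GmailApp, and the example.com addresses are placeholders.

```javascript
// Sketch: same check as checkAlertmanager, but with the Gmail service
// passed in, so the logic can be run against a fake object.
function buildQuery(sender, subject, maxAgeDays) {
  return [
    'from:' + sender,
    'subject:' + subject,
    'newer_than:' + maxAgeDays + 'd',
  ].join(' ');
}

// Returns true if an alert email was sent, false if test mails were found.
function checkWith(gmail, sender, recipient) {
  var threads = gmail.search(buildQuery(sender, 'daily alert test', 2));
  if (threads.length > 0) {
    return false;
  }
  gmail.sendEmail(
    recipient,
    'ALERT: alertmanager test mail is missing for 2d',
    'Subject says it all');
  return true;
}

// In Apps Script, the trigger would call a wrapper like this one.
// Both addresses are placeholders.
function checkAlertmanagerInjected() {
  return checkWith(GmailApp, 'alertmanager@example.com', 'you@example.com');
}
```

A fake gmail object only needs a search method returning an array and a sendEmail method, so you can simulate the “no mails for 2 days” case by returning an empty array.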

Configuring a daily test alert in Prometheus

Create a file called dailytest.rules with the following content:

ALERT DailyTest
  IF vector(1) > 0
  FOR 1m
  LABELS {
    job = "dailytest",
  }
  ANNOTATIONS {
    summary = "daily alert test",
    description = "daily alert test",
  }

Then, include it in your Prometheus config’s rules section. After restarting Prometheus or sending it a SIGHUP signal, you should see the new alert on the /alerts status page:

[Screenshot: prometheus daily alert]

Configuring Alertmanager

In your Alertmanager configuration, you’ll need to specify where that alert should be delivered to and how often it should repeat. I suggest you add a route and a receiver that you’ll use specifically for the daily alert test and nothing else, so that you never accidentally change them:

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 30s
  repeat_interval: 1h
  receiver: team-X-pager

  routes:
  - match:
      job: dailytest
    receiver: dailytest
    repeat_interval: 1d

receivers:
- name: 'dailytest'
  email_configs:
  - to: '[email protected]'

Send Alertmanager a SIGHUP signal to make it reload its configuration file. After Prometheus has been running for a minute, you should see the following alert on your Alertmanager’s /alerts status page:

[Screenshot: prometheus alertmanager alert]

Adding a Gmail filter to hide daily test alerts

Finally, once you have verified that everything is working, add a filter so that the daily test alerts don’t clutter your Gmail inbox: put “from:([email protected]) subject:(DailyTest)” into the search box, click the drop-down icon, click “Create filter with this search” and select “Skip the Inbox”.

[Screenshot: gmail filter]