Commit 067ef4a4 authored by Brian Brazil

Add initial documentation for the alertmanager.

Add a new section for alerting so it's all together,
and move the alerting rules over there.
parent 615c1598
---
title: Alertmanager
sort_rank: 2
nav_icon: sliders
---
# Alertmanager
The Alertmanager receives alerts from one or more Prometheus servers.
It manages those alerts, including silencing, inhibition, aggregation and
sending out notifications via methods such as email, PagerDuty and HipChat.
**WARNING: The Alertmanager is still considered to be very experimental.**
## Configuration
The Alertmanager is configured via command-line flags and a configuration file.
The configuration file is written in the protocol buffer text format. To specify which
configuration file to load, use the `-config.file` flag.
```
./alertmanager -config.file alertmanager.conf
```
To send all alerts to email, set the `-notification.smtp.smarthost` flag to
an SMTP smarthost (such as a [Postfix null client](http://www.postfix.org/STANDARD_CONFIGURATION_README.html#null_client))
and use the following configuration:
```
notification_config {
  name: "alertmanager_test"
  email_config {
    email: "test@example.org"
  }
}

aggregation_rule {
  notification_config_name: "alertmanager_test"
}
```
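Putting the flag and the configuration together, the invocation might look like the following sketch; the smarthost address `localhost:25` is a placeholder, not a documented default.
```
./alertmanager -config.file alertmanager.conf -notification.smtp.smarthost localhost:25
```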
### Filtering
An aggregation rule can be made to apply to only some alerts using a filter.
For example, to apply a rule only to alerts with a `severity` label with the value `page`:
```
aggregation_rule {
  filter {
    name_re: "severity"
    value_re: "page"
  }
  notification_config_name: "alertmanager_test"
}
```
Multiple filters can be provided.
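For example, a rule could be limited to paging alerts from a particular job by listing two filters. This is an illustrative sketch only: the `job`/`api-server` pair is made up, and it assumes an alert must match every listed filter for the rule to apply.
```
aggregation_rule {
  filter {
    name_re: "severity"
    value_re: "page"
  }
  filter {
    name_re: "job"
    value_re: "api-server"
  }
  notification_config_name: "alertmanager_test"
}
```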
### Repeat Rate
By default, an aggregation rule repeats notifications every 2 hours. This can be changed using the `repeat_rate_seconds` field.
```
aggregation_rule {
  repeat_rate_seconds: 3600
  notification_config_name: "alertmanager_test"
}
```
### Notifications
The Alertmanager has support for a growing number of notification methods.
Multiple notification methods of one or more types can be used in the same
notification config.
The `send_resolved` field can be used with all notification methods to enable or disable
sending a notification when an alert stops firing.
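For example, a sketch that turns on resolved notifications for an email destination might look as follows; it assumes `send_resolved` is set inside the individual notification method's block:
```
notification_config {
  name: "alertmanager_email"
  email_config {
    email: "test@example.org"
    # Assumption: send_resolved is a per-method boolean field.
    send_resolved: true
  }
}
```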
#### Email
The `-notification.smtp.smarthost` flag must be set to an SMTP smarthost.
The `-notification.smtp.sender` flag may be set to change the default From address.
```
notification_config {
  name: "alertmanager_email"
  email_config {
    email: "test@example.org"
  }
  email_config {
    email: "foo@example.org"
  }
}
```
Plain and CRAM-MD5 SMTP authentication methods are supported.
The `SMTP_AUTH_USERNAME`, `SMTP_AUTH_SECRET`, `SMTP_AUTH_PASSWORD` and
`SMTP_AUTH_IDENTITY` environment variables are used to configure them.
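As a sketch, the credentials could be supplied via the environment when starting the Alertmanager; the username, password, smarthost and sender address below are placeholders:
```
# Placeholder credentials for plain SMTP authentication.
export SMTP_AUTH_USERNAME="alertmanager"
export SMTP_AUTH_PASSWORD="supersecret"
./alertmanager -config.file alertmanager.conf \
  -notification.smtp.smarthost mail.example.org:587 \
  -notification.smtp.sender alertmanager@example.org
```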
#### PagerDuty
The Alertmanager integrates as a [Generic API
Service](https://support.pagerduty.com/hc/en-us/articles/202830340-Creating-a-Generic-API-Service)
with PagerDuty.
```
notification_config {
  name: "alertmanager_pagerduty"
  pagerduty_config {
    service_key: "supersecretapikey"
  }
}
```
#### Pushover
```
notification_config {
  name: "alertmanager_pushover"
  pushover_config {
    token: "mypushovertoken"
    user_key: "mypushoverkey"
  }
}
```
#### HipChat
```
notification_config {
  name: "alertmanager_hipchat"
  hipchat_config {
    auth_token: "hipchatauthtoken"
    room_id: 123456
  }
}
```
#### Slack
```
notification_config {
  name: "alertmanager_slack"
  slack_config {
    webhook_url: "webhookurl"
    channel: "channelname"
  }
}
```
#### Flowdock
```
notification_config {
  name: "alertmanager_flowdock"
  flowdock_config {
    api_token: "4c7234902348234902384234234cdb59"
    from_address: "aliaswithgravatar@example.com"
    tag: "monitoring"
  }
}
```
#### Generic Webhook
The Alertmanager supports sending notifications as JSON to arbitrary URLs. This
can be used to perform automated actions when an alert fires, or to integrate
with a notification system that the Alertmanager does not support directly.
```
notification_config {
  name: "alertmanager_webhook"
  webhook_config {
    url: "http://example.org/my/hook"
  }
}
```
An example of the JSON message it sends is shown below.
```json
{
  "version": "1",
  "status": "firing",
  "alert": [
    {
      "summary": "summary",
      "description": "description",
      "labels": {
        "alertname": "TestAlert"
      },
      "payload": {
        "activeSince": "2015-06-01T12:55:47.356+01:00",
        "alertingRule": "ALERT TestAlert IF absent(metric_name) FOR 0y WITH ",
        "generatorURL": "http://localhost:9090/graph#%5B%7B%22expr%22%3A%22absent%28metric_name%29%22%2C%22tab%22%3A0%7D%5D",
        "value": "1"
      }
    }
  ]
}
```
This format is subject to change.
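As an illustration only, a minimal receiver for this payload might look like the following Go sketch. The struct fields mirror the example message above; the listening port and hook path are arbitrary choices rather than anything mandated by the Alertmanager.
```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// Alert mirrors one entry of the "alert" array in the example payload above.
type Alert struct {
	Summary     string            `json:"summary"`
	Description string            `json:"description"`
	Labels      map[string]string `json:"labels"`
	Payload     map[string]string `json:"payload"`
}

// Message mirrors the top-level JSON object sent to the webhook URL.
type Message struct {
	Version string  `json:"version"`
	Status  string  `json:"status"`
	Alert   []Alert `json:"alert"`
}

func main() {
	// The path matches the webhook_config example above; the port is arbitrary.
	http.HandleFunc("/my/hook", func(w http.ResponseWriter, r *http.Request) {
		var msg Message
		if err := json.NewDecoder(r.Body).Decode(&msg); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// Log one line per alert; a real integration would act on these.
		for _, a := range msg.Alert {
			log.Printf("status=%s alertname=%s summary=%q",
				msg.Status, a.Labels["alertname"], a.Summary)
		}
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```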
---
title: Alerting
sort_rank: 7
nav_icon: bell-o
---
---
title: Alerting Overview
sort_rank: 1
nav_icon: sliders
---
# Alerting Overview
Alerting with Prometheus is separated into two parts. Alerting rules in
Prometheus servers send alerts to an Alertmanager. The Alertmanager then
manages those alerts, including silencing, inhibition, aggregation and sending
out notifications via methods such as email, PagerDuty and HipChat.
**WARNING: The Alertmanager is still considered to be very experimental.**
The main steps to setting up alerting and notifications are:
* Set up and configure the Alertmanager
* Configure Prometheus to talk to the Alertmanager with the `-alertmanager.url` flag, as shown in the example below
* Create alerting rules in Prometheus
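For example, a minimal local setup might look like the following sketch. The configuration file names, the Prometheus `-config.file` flag and the Alertmanager address (port 9093) are illustrative assumptions; only `-alertmanager.url` and the Alertmanager's `-config.file` flag are taken from these docs.
```
# Start the Alertmanager with its own configuration file.
./alertmanager -config.file alertmanager.conf
# Start Prometheus and point it at the running Alertmanager (assumed address).
./prometheus -config.file prometheus.conf -alertmanager.url http://localhost:9093
```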
---
title: Alerting rules
sort_rank: 3
---
# Alerting rules
Alerting rules allow you to define alert conditions based on Prometheus
expression language expressions and to send notifications about firing alerts
to an external service. Whenever the alert expression results in one or more
vector elements at a given point in time, the alert counts as active for these
elements' label sets.
Alerting rules are configured in Prometheus in the same way as [recording
rules](../../querying/rules).
### Defining alerting rules
Alerting rules are defined in the following syntax:

    ALERT <alert name>
      IF <expression>
      [FOR <duration>]
      WITH <label set>
      SUMMARY "<summary template>"
      DESCRIPTION "<description template>"

The optional `FOR` clause causes Prometheus to wait for a certain duration
between first encountering a new expression output vector element (like an
instance with a high HTTP error rate) and counting an alert as firing for this
element. Elements that are active, but not firing yet, are in pending state.
The `WITH` clause allows specifying a set of additional labels to be attached
to the alert. Any existing conflicting labels will be overwritten.
The `SUMMARY` should be a short, human-readable summary of the alert (suitable
for e.g. an email subject line), while the `DESCRIPTION` clause should provide
a longer description. Both string fields allow the inclusion of template
variables derived from the firing vector elements of the alert:

    # To insert a firing element's label values:
    {{$labels.<labelname>}}
    # To insert the numeric expression value of the firing element:
    {{$value}}

Examples:

    # Alert for any instance that is unreachable for >5 minutes.
    ALERT InstanceDown
      IF up == 0
      FOR 5m
      WITH {
        severity="page"
      }
      SUMMARY "Instance {{$labels.instance}} down"
      DESCRIPTION "{{$labels.instance}} of job {{$labels.job}} has been down for more than 5 minutes."

    # Alert for any instance that has a median request latency >1s.
    ALERT ApiHighRequestLatency
      IF api_http_request_latencies_ms{quantile="0.5"} > 1000
      FOR 1m
      WITH {}
      SUMMARY "High request latency on {{$labels.instance}}"
      DESCRIPTION "{{$labels.instance}} has a median request latency above 1s (current value: {{$value}})"

### Inspecting alerts during runtime
To manually inspect which alerts are active (pending or firing), navigate to
the "Alerts" tab of your Prometheus instance. This will show you the exact
label sets for which each defined alert is currently active.
For pending and firing alerts, Prometheus also stores synthetic time series of
the form `ALERTS{alertname="<alert name>", alertstate="pending|firing", <additional alert labels>}`.
The sample value is set to `1` as long as the alert is in the indicated active
(pending or firing) state, and a single `0` value gets written out when an alert
transitions from active to inactive state. Once inactive, the time series does
not get further updates.
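For the `InstanceDown` rule shown earlier, such a series might look like this while the alert is firing; the `instance` and `job` label values are invented for illustration:
```
ALERTS{alertname="InstanceDown", alertstate="firing", severity="page", instance="10.0.0.1:9100", job="node"}  1
```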
### Sending alert notifications
Prometheus's alerting rules are good at figuring out what is broken *right now*,
but they are not a fully-fledged notification solution. Another layer is needed
to add summarization, notification rate limiting, silencing and alert
dependencies on top of the simple alert definitions. In Prometheus's ecosystem,
the [Alertmanager](../alertmanager) takes on this
role. Thus, Prometheus may be configured to periodically send information about
alert states to an Alertmanager instance, which then takes care of dispatching
the right notifications. The Alertmanager instance may be configured via the
`-alertmanager.url` command line flag.
@@ -38,7 +38,7 @@ optional:
* a [push gateway](https://github.com/prometheus/pushgateway) for supporting short-lived jobs
* a [GUI-based dashboard builder](/docs/visualization/promdash/) based on Rails/SQL
* special-purpose [exporters](/docs/instrumenting/exporters/) (for HAProxy, StatsD, Ganglia, etc.)
* an (experimental) [alert manager](https://github.com/prometheus/alertmanager)
* an (experimental) [alertmanager](https://github.com/prometheus/alertmanager)
* a [command-line querying tool](https://github.com/prometheus/prometheus_cli)
* various support tools
---
title: Best practices
sort_rank: 7
sort_rank: 8
nav_icon: thumbs-o-up
---
---
title: Recording and alerting rules
title: Recording rules
sort_rank: 6
---
# Defining recording and alerting rules
# Defining recording rules
## Configuring rules
Prometheus supports two types of rules which may be configured and then
evaluated at regular intervals: recording rules and alerting rules. To include
rules in Prometheus, create a file containing the necessary rule statements and
have Prometheus load the file via the `rule_files` field in the [Prometheus
configuration](/docs/operating/configuration).
evaluated at regular intervals: recording rules and [alerting
rules](../../alerting/rules). To include rules in Prometheus, create a file
containing the necessary rule statements and have Prometheus load the file via
the `rule_files` field in the [Prometheus configuration](/docs/operating/configuration).
The rule files can be reloaded at runtime by sending `SIGHUP` to the Prometheus
process. The changes are only applied if all rule files are well-formatted.
@@ -62,81 +62,3 @@ evaluation cycle, the right-hand-side expression of the rule statement is
evaluated at the current instant in time and the resulting sample vector is
stored as a new set of time series with the current timestamp and a new metric
name (and perhaps an overridden set of labels).
## Alerting rules
Alerting rules allow you to define alert conditions based on Prometheus
expression language expressions and to send notifications about firing alerts
to an external service. Whenever the alert expression results in one or more
vector elements at a given point in time, the alert counts as active for these
elements' label sets.
### Defining alerting rules
Alerting rules are defined in the following syntax:

    ALERT <alert name>
      IF <expression>
      [FOR <duration>]
      WITH <label set>
      SUMMARY "<summary template>"
      DESCRIPTION "<description template>"

The optional `FOR` clause causes Prometheus to wait for a certain duration
between first encountering a new expression output vector element (like an
instance with a high HTTP error rate) and counting an alert as firing for this
element. Elements that are active, but not firing yet, are in pending state.
The `WITH` clause allows specifying a set of additional labels to be attached
to the alert. Any existing conflicting labels will be overwritten.
The `SUMMARY` should be a short, human-readable summary of the alert (suitable
for e.g. an email subject line), while the `DESCRIPTION` clause should provide
a longer description. Both string fields allow the inclusion of template
variables derived from the firing vector elements of the alert:

    # To insert a firing element's label values:
    {{$labels.<labelname>}}
    # To insert the numeric expression value of the firing element:
    {{$value}}

Examples:

    # Alert for any instance that is unreachable for >5 minutes.
    ALERT InstanceDown
      IF up == 0
      FOR 5m
      WITH {
        severity="page"
      }
      SUMMARY "Instance {{$labels.instance}} down"
      DESCRIPTION "{{$labels.instance}} of job {{$labels.job}} has been down for more than 5 minutes."

    # Alert for any instance that has a median request latency >1s.
    ALERT ApiHighRequestLatency
      IF api_http_request_latencies_ms{quantile="0.5"} > 1000
      FOR 1m
      WITH {}
      SUMMARY "High request latency on {{$labels.instance}}"
      DESCRIPTION "{{$labels.instance}} has a median request latency above 1s (current value: {{$value}})"

### Inspecting alerts during runtime
To manually inspect which alerts are active (pending or firing), navigate to
the "Alerts" tab of your Prometheus instance. This will show you the exact
label sets for which each defined alert is currently active.
For pending and firing alerts, Prometheus also stores synthetic time series of
the form `ALERTS{alertname="<alert name>", alertstate="pending|firing", <additional alert labels>}`.
The sample value is set to `1` as long as the alert is in the indicated active
(pending or firing) state, and a single `0` value gets written out when an alert
transitions from active to inactive state. Once inactive, the time series does
not get further updates.
### Sending alert notifications
Prometheus's alerting rules are good at figuring what is broken *right now*,
but they are not a fully-fledged notification solution. Another layer is needed
to add summarization, notification rate limiting, silencing and alert
dependencies on top of the simple alert definitions. In Prometheus's ecosystem,
the [Alert Manager](https://github.com/prometheus/alertmanager) takes on this
role. Thus, Prometheus may be configured to periodically send information about
alert states to an Alert Manager instance, which then takes care of dispatching
the right notifications. The Alert Manager instance may be configured via the
`-alertmanager.url` command line flag.