Document config format for the new AM

Commit 2a82c8f7 authored Dec 23, 2015 by Fabian Reinartz
Showing with 235 additions and 53 deletions

alertmanager.md content/docs/alerting/alertmanager.md +5 -50

configuration.md content/docs/alerting/configuration.md +210 -0

rules.md content/docs/alerting/rules.md +20 -3

--- a/content/docs/alerting/alertmanager.md
+++ b/content/docs/alerting/alertmanager.md
@@ -19,10 +19,10 @@ in more detail.
 Grouping categorizes alerts of similar nature into a single notification. This
 is especially useful during larger outages when many systems fail at once and
-hundreds the thousands of alerts may be firing simultaniously.
+hundreds to thousands of alerts may be firing simultaneously.
 **Example:** Dozens or hundreds of instances of a service are running in your
-cluster when a network partition occurs. Half our your service instances
+cluster when a network partition occurs. Half of your service instances
 can no longer reach the database.
 Alerting rules in Prometheus were configured to send an alert for each service
 instance if it cannot communicate with the database. As a result hundreds of
@@ -39,14 +39,14 @@ file.
 ## Inhibition
-Inhibition is a concept of surpressing notifications for certain alerts if
+Inhibition is a concept of suppressing notifications for certain alerts if
 certain other alerts are already firing.
 **Example:** An alert is firing that reports that an entire cluster is
 unreachable. Alertmanager can be configured to mute all other alerts concerning
 this cluster if that particular alert is firing.
-This prevents hundreds to thousands of alerts firing unrelated to the actual
+This prevents notifications for hundreds or thousands of firing alerts that
-issue.
+are unrelated to the actual issue.
 Inhibitions are configured through the Alertmanager's configuration file.
@@ -59,48 +59,3 @@ matchers of an active silence.
 If they do, no notifications will be sent out for that alert.
 Silences are configured in the web interface of the Alertmanager.
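The silence check can be illustrated with a short Python sketch. It assumes that a silence consists of label matchers (equality or regex) plus a start and end time, that regex matchers must match the entire label value, and that a missing label compares as the empty string. `Matcher`, `Silence`, and `is_silenced` are hypothetical names for illustration, not part of the Alertmanager.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class Matcher:
    """One label matcher of a silence: equality or regular expression."""
    name: str
    value: str
    is_regex: bool = False

    def matches(self, labels: Dict[str, str]) -> bool:
        # Assumption: a missing label compares as the empty string.
        actual = labels.get(self.name, "")
        if self.is_regex:
            # Assumption: the regex must match the whole label value.
            return re.fullmatch(self.value, actual) is not None
        return actual == self.value


@dataclass
class Silence:
    matchers: List[Matcher]
    starts_at: datetime
    ends_at: datetime

    def is_active(self, now: datetime) -> bool:
        return self.starts_at <= now < self.ends_at


def is_silenced(labels: Dict[str, str], silences: List[Silence], now: datetime) -> bool:
    """An alert is silenced if all matchers of at least one active silence match it."""
    return any(
        silence.is_active(now) and all(m.matches(labels) for m in silence.matchers)
        for silence in silences
    )
```

In this model, an active silence with `cluster="eu-west"` and the regex matcher `service=~"mysql|cassandra"` mutes `{cluster="eu-west", service="mysql"}` but not the same alert coming from another cluster.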
-## Sending alerts
-__Prometheus automatically takes care of sending alerts generated by its
-configured [alerting rules](../rules). The following is a general documentation for clients.__
-The Alertmanager listens for alerts on an API endpoint at `/api/v1/alerts`.
-Clients are expected to continously re-send alerts as long as they are still
-active (usually at the order of 30 seconds to 3 minutes).
-Clients can push a list of alerts to that endpoint via a POST request of
-the following format:
-```
-[
-  {
-    "labels": {
-      "<labelname>": "<labelvalue>",
-      ...
-    },
-    "annotations": {
-      "<labelname>": "<labelvalue>",
-    },
-    "startsAt": "<rfc3339>",
-    "endAt": "<rfc3339>"
-    "generatorURL": "<generator_url>"
-  },
-  ...
-]
-```
-The labels are used to identify identical instances of an alert and to perform
-deduplication. The annotations are always set to those received most recently
-and are not identifying an alert.
-Both timestamps are optional. If `startsAt` is omitted, the current time
-is assigned by the Alertmanager. `endsAt` is only set if the end time of an
-alert is known. Otherwise it will be set to a configurable timeout period from
-the time since the alert was last received.
-The `generatorURL` field is a unique back-link which identifies the causing
-entity of this alert in the client. 
-Alertmanager also supports a legacy endpoint on `/api/alerts` which is
-compatible with Prometheus versions 0.16 and lower.
--- a/content/docs/alerting/configuration.md
+++ b/content/docs/alerting/configuration.md
+---
+title: Configuration
+sort_rank: 3
+nav_icon: sliders
+---
+# Configuration
+[Alertmanager](https://github.com/prometheus/alertmanager) is configured via
+command-line flags and a configuration file.
+While the command-line flags configure immutable system parameters, the
+configuration file defines inhibition rules, notification routing and
+notification receivers.
+To view all available command-line flags, run `alertmanager -h`.
+Alertmanager can reload its configuration at runtime. If the new configuration
+is not well-formed, the changes will not be applied and an error is logged.
+A configuration reload is triggered by sending a `SIGHUP` to the process.
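As a minimal Python sketch, assuming a POSIX system and that the Alertmanager's process ID is already known, a reload can be requested as shown below. `trigger_config_reload` is a hypothetical helper; from a shell, `kill -HUP <pid>` sends the same signal.

```python
import os
import signal


def trigger_config_reload(pid: int) -> None:
    """Ask the process `pid` to reload its configuration file.

    Sends SIGHUP, the signal the Alertmanager treats as a reload request.
    POSIX only: signal.SIGHUP is not defined on Windows.
    """
    os.kill(pid, signal.SIGHUP)
```

Since a malformed file is not applied and only produces a logged error, checking the Alertmanager's log after sending the signal is the way to confirm that the new configuration was accepted.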
+## Configuration file
+To specify which configuration file to load, use the `-config.file` flag.
+The file is written in the [YAML format](http://en.wikipedia.org/wiki/YAML),
+defined by the schema described below.
+Brackets indicate that a parameter is optional. For non-list parameters the
+value is set to the specified default.
+Generic placeholders are defined as follows:
+* `<duration>`: a duration matching the regular expression `[0-9]+[smhdwy]`
+* `<labelname>`: a string matching the regular expression `[a-zA-Z_][a-zA-Z0-9_]*`
+* `<labelvalue>`: a string of Unicode characters
+* `<filepath>`: a valid path in the current working directory
+* `<boolean>`: a boolean that can take the values `true` or `false`
+* `<string>`: a regular string
+* `<tmpl_string>`: a string which is template-expanded before usage
+The other placeholders are specified separately.
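The two regular expressions above can be checked mechanically. The Python sketch below validates a `<labelname>` and converts a `<duration>` into seconds. The helper names are hypothetical, and treating `y` as 365 days is an assumption, since the unit lengths are not specified here.

```python
import re

# Patterns copied from the placeholder definitions above.
DURATION_RE = re.compile(r"[0-9]+[smhdwy]")
LABELNAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

# Seconds per unit suffix. A 365-day year is an assumption.
_UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 60 * 60,
    "d": 24 * 60 * 60,
    "w": 7 * 24 * 60 * 60,
    "y": 365 * 24 * 60 * 60,
}


def is_valid_labelname(name: str) -> bool:
    """True if `name` fully matches the <labelname> pattern."""
    return LABELNAME_RE.fullmatch(name) is not None


def parse_duration(text: str) -> int:
    """Convert a <duration> such as '5m' into a number of seconds."""
    if DURATION_RE.fullmatch(text) is None:
        raise ValueError(f"not a valid <duration>: {text!r}")
    return int(text[:-1]) * _UNIT_SECONDS[text[-1]]
```

With these definitions `parse_duration("5m")` returns `300`, while `5min` and `1h30m` are rejected because the documented pattern allows exactly one unit suffix.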
+A valid example file can be found [here](https://github.com/prometheus/alertmanager/blob/master/doc/examples/simple.yml).
+The global configuration specifies parameters that are valid in all other
+configuration contexts. They also serve as defaults for other configuration
+sections.
+```
+global:
+  # ResolveTimeout is the time after which an alert is declared resolved
+  # if it has not been updated.
+  [ resolve_timeout: <duration> | default = 5m ]
+  # The default SMTP From header field.
+  [ smtp_from: <tmpl_string> ]
+  # The default SMTP smarthost used for sending emails.
+  [ smtp_smarthost: <tmpl_string> ]
+  # The API URL to use for Slack notifications.
+  [ slack_api_url: <string> ]
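+  # The API URL to use for PagerDuty notifications.
+  [ pagerduty_url: <string> | default = "https://events.pagerduty.com/generic/2010-04-15/create_event.json" ]
+  # The API host to use for OpsGenie notifications.
+  [ opsgenie_api_host: <string> | default = "https://api.opsgenie.com/" ]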
+# Files from which custom notification template definitions are read.
+# The last component may use a wildcard matcher, e.g. 'templates/*.tmpl'.
+templates:
+  [ - <filepath> ... ]
+# The root node of the routing tree.
+route: <route>
+# A list of inhibition rules.
+inhibit_rules:
+  [ - <inhibit_rule> ... ]
+# A list of notification receivers.
+receivers:
+  - <receiver> ...
+```
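The `resolve_timeout` rule reduces to a time comparison. The following hypothetical Python sketch assumes that the time of the alert's last update is known and that an alert counts as resolved once exactly `resolve_timeout` has elapsed (the inclusive boundary is an assumption).

```python
from datetime import datetime, timedelta

# Matches the documented default `resolve_timeout: 5m`.
DEFAULT_RESOLVE_TIMEOUT = timedelta(minutes=5)


def is_resolved(last_updated: datetime, now: datetime,
                resolve_timeout: timedelta = DEFAULT_RESOLVE_TIMEOUT) -> bool:
    """True once the alert has gone without an update for resolve_timeout.

    Assumption: the boundary is inclusive, so exactly 5m counts as resolved.
    """
    return now - last_updated >= resolve_timeout
```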
+## Route `<route>`
+A route block defines a node in a routing tree and its children. Its optional
+configuration parameters are inherited from its parent node if not set.
+Every alert enters the routing tree at the configured top-level route, which
+must match all alerts (i.e. not have any configured matchers).
+It then traverses the child nodes. If `continue` is set to false, it stops
+after the first matching child. If `continue` is true on a matching node, the
+alert will continue matching against subsequent siblings.
+If an alert does not match any children of a node (no matching child nodes, or
+none exist), the alert is handled based on the configuration parameters of the
+current node.
+```
+[ receiver: <string> ]
+[ group_by: '[' <labelname>, ... ']' ]
+# Zero or more child routes.
+routes:
+  [ - <route> ... ]
+# Whether an alert should continue matching subsequent sibling nodes.
+[ continue: <boolean> | default = false ]
+# A set of equality matchers an alert has to fulfill to match the node.
+match:
+  [ <labelname>: <labelvalue>, ... ]
+# A set of regex-matchers an alert has to fulfill to match the node.
+match_re:
+  [ <labelname>: <regex>, ... ]
+# How long to initially wait to send a notification for a group
+# of alerts. This allows waiting for an inhibiting alert to arrive or collecting
+# more initial alerts for the same group. (Usually ~0s to a few minutes.)
+[ group_wait: <duration> ]
+# How long to wait before sending a notification about new alerts that
+# are added to a group of alerts for which an initial notification
+# has already been sent. (Usually ~5min or more.)
+[ group_interval: <duration> ]
+# How long to wait before sending a notification again if it has already
+# been sent successfully for an alert. (Usually ~3h or more).
+[ repeat_interval: <duration> ]
+```
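The effect of `group_by` can be sketched as a partition of alerts by their values for the listed labels: alerts that agree on all of them end up in the same notification group. The Python below is a hypothetical illustration that ignores `group_wait`, `group_interval`, and `repeat_interval`, and assumes that a missing label counts as the empty string.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Alert = Dict[str, str]  # an alert reduced to its label set


def group_alerts(alerts: List[Alert], group_by: List[str]) -> Dict[Tuple[str, ...], List[Alert]]:
    """Partition alerts by their values for the `group_by` labels."""
    groups: Dict[Tuple[str, ...], List[Alert]] = defaultdict(list)
    for alert in alerts:
        # Assumption: a missing label counts as the empty string.
        key = tuple(alert.get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)
```

For example, with `group_by: [cluster, alertname]`, two `DBUnreachable` alerts from `eu-west` and one from `us-east` form two groups, so two notifications are sent rather than three.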
+### Example
+```
+# The root route with all parameters, which are inherited by the child
+# routes if they are not overridden.
+route:
+  receiver: 'default-receiver'
+  group_wait: 30s
+  group_interval: 5m
+  repeat_interval: 4h
+  group_by: [cluster, alertname]
+  # All alerts that do not match the following child routes
+  # will remain at the root node and be dispatched to 'default-receiver'.
+  routes:
+  # All alerts with service=mysql or service=cassandra
+  # are dispatched to the database pager.
+  - receiver: 'database-pager'
+    group_wait: 10s
+    match_re:
+      service: mysql|cassandra
+  # All alerts with the team=frontend label match this sub-route.
+  # They are grouped by product and environment rather than cluster
+  # and alertname.
+  - receiver: 'frontend-pager'
+    group_by: [product, environment]
+    match:
+      team: frontend
+```
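The traversal rules described for `<route>` can be modeled in a few lines of Python. This is a hypothetical sketch, not the Alertmanager's implementation. It covers only `receiver`, `match`, `match_re`, `routes`, and `continue` (spelled `continue_` because `continue` is a Python keyword, and defaulting to `False` in this sketch), inherits an unset `receiver` from the parent route, and assumes that regex matchers are anchored to the full label value.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

Labels = Dict[str, str]


@dataclass
class Route:
    receiver: str = ""  # empty means "inherit from the parent route"
    match: Labels = field(default_factory=dict)
    match_re: Labels = field(default_factory=dict)
    routes: List["Route"] = field(default_factory=list)
    continue_: bool = False  # `continue` is a reserved word in Python

    def matches(self, labels: Labels) -> bool:
        # All equality matchers and all (fully anchored) regex matchers must hold.
        if any(labels.get(k, "") != v for k, v in self.match.items()):
            return False
        return all(re.fullmatch(p, labels.get(k, "")) for k, p in self.match_re.items())


def find_receivers(route: Route, labels: Labels, inherited: str = "") -> List[str]:
    """Return the receiver(s) that handle an alert entering the tree at `route`."""
    receiver = route.receiver or inherited
    result: List[str] = []
    for child in route.routes:
        if not child.matches(labels):
            continue
        result.extend(find_receivers(child, labels, receiver))
        if not child.continue_:
            break  # stop after the first matching child
    # No child matched (or none exist): this node handles the alert itself.
    return result or [receiver]
```

Built from the example tree above, an alert with `service="mysql"` resolves to `database-pager`, one carrying only `team="frontend"` resolves to `frontend-pager`, and anything else stays at the root with `default-receiver`. Setting `continue_ = True` on the database route lets an alert carrying both labels reach both pagers.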
+## Inhibit rule `<inhibit_rule>`
+An inhibition rule mutes an alert that matches one set of matchers (the target)
+when another alert exists that matches a second set of matchers (the source).
+The labels listed in `equal` must have identical values in both alerts.
+```
+# Matchers that have to be fulfilled in the alerts to be muted.
+target_match:
+  [ <labelname>: <labelvalue>, ... ]
+target_match_re:
+  [ <labelname>: <regex>, ... ]
+# Matchers for which one or more alerts have to exist for the
+# inhibition to take effect.
+source_match:
+  [ <labelname>: <labelvalue>, ... ]
+source_match_re:
+  [ <labelname>: <regex>, ... ]
+# Labels that must have an equal value in the source and target
+# alert for the inhibition to take effect.
+[ equal: '[' <labelname>, ... ']' ]
+```
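An inhibition rule can be read as a predicate over the currently firing alerts. The hypothetical Python sketch below takes the rule as a dict with the same keys as the YAML above and assumes that regex matchers must match the full label value. The check that an alert never inhibits itself is an added assumption, not something stated in this section.

```python
import re
from typing import Dict, List

Labels = Dict[str, str]


def _matches(labels: Labels, match: Labels, match_re: Labels) -> bool:
    """All equality and (fully anchored) regex matchers must hold."""
    if any(labels.get(k, "") != v for k, v in match.items()):
        return False
    return all(re.fullmatch(p, labels.get(k, "")) for k, p in match_re.items())


def is_inhibited(target: Labels, firing: List[Labels], rule: dict) -> bool:
    """True if `rule` mutes `target`, given the currently firing alerts."""
    if not _matches(target, rule.get("target_match", {}), rule.get("target_match_re", {})):
        return False
    equal = rule.get("equal", [])
    for source in firing:
        if source == target:
            continue  # assumption: an alert never inhibits itself
        if not _matches(source, rule.get("source_match", {}), rule.get("source_match_re", {})):
            continue
        # The labels listed in `equal` must carry identical values.
        if all(source.get(name, "") == target.get(name, "") for name in equal):
            return True
    return False
```

With `source_match: {alertname: ClusterUnreachable}` and `equal: [cluster]`, a firing `ClusterUnreachable` alert for `eu-west` mutes the other `eu-west` alerts, while alerts from other clusters still produce notifications.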
+## Receiver `<receiver>`
+A receiver is a named configuration of one or more notification integrations.
+```
+# The unique name of the receiver.
+name: <string>
+# Configurations for several notification integrations.
+email_configs:
+  [ - <email_config>, ... ]
+pagerduty_configs:
+  [ - <pagerduty_config>, ... ]
+slack_configs:
+  [ - <slack_config>, ... ]
+webhook_configs:
+  [ - <webhook_config>, ... ]
+opsgenie_configs:
+  [ - <opsgenie_config>, ... ]
+```
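Because routes refer to receivers by `name`, a loaded configuration should satisfy two consistency checks: receiver names are unique, and every `receiver` referenced in the routing tree is defined. The Python sketch below runs both checks over an already-parsed configuration (plain dicts, as a YAML loader would produce). `check_receivers` is hypothetical, and this section does not specify how the Alertmanager itself reports such errors.

```python
from typing import List, Set


def check_receivers(config: dict) -> List[str]:
    """Return a list of receiver-name problems found in a parsed config."""
    problems: List[str] = []
    names: Set[str] = set()
    for receiver in config.get("receivers", []):
        name = receiver.get("name")
        if not name:
            problems.append("receiver without a name")
        elif name in names:
            problems.append(f"duplicate receiver name {name!r}")
        else:
            names.add(name)

    def walk(route: dict) -> None:
        # Every route that sets `receiver` must point at a defined receiver.
        receiver = route.get("receiver")
        if receiver is not None and receiver not in names:
            problems.append(f"route references undefined receiver {receiver!r}")
        for child in route.get("routes", []):
            walk(child)

    walk(config.get("route", {}))
    return problems
```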
--- a/content/docs/alerting/rules.md
+++ b/content/docs/alerting/rules.md
 ---
 title: Alerting rules
-sort_rank: 3
+sort_rank: 4
 ---
 # Alerting rules
@@ -38,9 +38,26 @@ identifying for an alert instance. They are used to store longer additional
 information such as alert descriptions or runbook links. The annotation values
 can be templated.
+#### v0.16.2 and earlier
+In previous Prometheus versions the rule syntax was as follows:
+    ALERT <alert name>
+      IF <expression>
+      [FOR <duration>]
+      [WITH <label set>]
+      SUMMARY "<summary template>"
+      DESCRIPTION "<description template>"
+      [RUNBOOK "<runbook>"]
+Annotations are not free-form but fixed to a summary, a description, and a
+runbook field. Labels are attached using the `WITH` rather than the `LABELS`
+clause.
+Label values in the `WITH` clause cannot be templated.
 #### Templating
-Label and annotation values can be templated using Go's template language.
+Label and annotation values can be templated using [console templates](../visualization/consoles).
 The `$labels` variable holds the label key/value pairs of an alert instance
 and `$value` holds the evaluated value of an alert instance.