Commit cb0080d4 authored by Fabian Reinartz

Document and overview for the new AM

parent a3986a81
@@ -6,196 +6,101 @@ nav_icon: sliders
 # Alertmanager
-The Alertmanager receives alerts from one or more Prometheus servers.
-It manages those alerts, including silencing, inhibition, aggregation and
-sending out notifications via methods such as email, PagerDuty and HipChat.
-
-**WARNING: The Alertmanager is still considered to be very experimental.**
-
-## Configuration
-
-The Alertmanager is configured via command-line flags and a configuration file.
+The Alertmanager handles alerts sent by client applications such as the
+Prometheus server. It takes care of deduplicating, grouping, and routing
+them to the correct receiver integration such as email, PagerDuty, or OpsGenie.
+It also takes care of silencing and inhibition of alerts.
+
+The following describes the core concepts the Alertmanager implements. Consult
+the [configuration documentation](../configuration) to learn how to use them
+in more detail.
+
-
-The configuration file is an ASCII protocol buffer. To specify which
-configuration file to load, use the `-config.file` flag.
-
-```
-./alertmanager -config.file alertmanager.conf
-```
-
-To send all alerts to email, set the `-notification.smtp.smarthost` flag to
-an SMTP smarthost (such as a [Postfix null client](http://www.postfix.org/STANDARD_CONFIGURATION_README.html#null_client))
-and use the following configuration:
-
-```
-notification_config {
-  name: "alertmanager_test"
-  email_config {
-    email: "test@example.org"
-  }
-}
-
-aggregation_rule {
-  notification_config_name: "alertmanager_test"
-}
-```
+## Grouping
+
+Grouping categorizes alerts of similar nature into a single notification. This
+is especially useful during larger outages when many systems fail at once and
+hundreds to thousands of alerts may be firing simultaneously.
+
+**Example:** Dozens or hundreds of instances of a service are running in your
+cluster when a network partition occurs. Half of your service instances
+can no longer reach the database.
+Alerting rules in Prometheus were configured to send an alert for each service
+instance if it cannot communicate with the database. As a result hundreds of
+alerts are sent to Alertmanager.
+
+As a user, one only wants to get a single page while still being able to see
+exactly which service instances were affected. Thus one can configure
+Alertmanager to group alerts by their cluster and alertname so it sends a
+single compact notification.
+
+Grouping of alerts, timing for the grouped notifications, and the receivers
+of those notifications are configured by a routing tree in the configuration
+file.
+
+## Inhibition
+
-
-### Filtering
-
-An aggregation rule can be made to apply to only some alerts using a filter.
-
-For example, to apply a rule only to alerts with a `severity` label with the value `page`:
-
-```
-aggregation_rule {
-  filter {
-    name_re: "severity"
-    value_re: "page"
-  }
-  notification_config_name: "alertmanager_test"
-}
-```
-
-Multiple filters can be provided.
-
-### Repeat Rate
-
-By default an aggregation rule will repeat notifications every 2 hours. This can be changed using `repeat_rate_seconds`.
+Inhibition is the concept of suppressing notifications for certain alerts if
+certain other alerts are already firing.
+
+**Example:** An alert is firing that informs that an entire cluster is not
+reachable. Alertmanager can be configured to mute all other alerts concerning
+this cluster if that particular alert is firing.
+This prevents notifications for hundreds or thousands of firing alerts that are
+unrelated to the actual issue.
+
+Inhibitions are configured through the Alertmanager's configuration file.
+
+## Silences
+
+Silences are a straightforward way to mute alerts for a given time.
+A silence is configured based on matchers, just like the routing tree. Incoming
+alerts are checked against all the equality or regular expression
+matchers of an active silence.
+If they match, no notifications will be sent out for that alert.
+
+Silences are configured in the web interface of the Alertmanager.
+
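The matching rule described for silences can be illustrated with a minimal sketch. It is not Alertmanager's implementation: the `Matcher` type and the `is_silenced` function are hypothetical names, and treating a missing label as an empty string and anchoring regular expressions to the full value are assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Matcher:
    """Hypothetical matcher: tests one label by equality or regular expression."""
    name: str
    value: str
    is_regex: bool = False

    def matches(self, labels: dict) -> bool:
        # Assumption: a missing label is treated as the empty string.
        actual = labels.get(self.name, "")
        if self.is_regex:
            # Assumption: the expression must match the whole label value.
            return re.fullmatch(self.value, actual) is not None
        return actual == self.value


def is_silenced(labels: dict, silence: list) -> bool:
    """An alert is muted only if ALL matchers of the silence match its labels."""
    # An empty matcher list is rejected so that it cannot mute everything.
    return bool(silence) and all(m.matches(labels) for m in silence)


silence = [
    Matcher("cluster", "eu-west"),
    Matcher("alertname", "Database.*", is_regex=True),
]
alert_a = {"alertname": "DatabaseUnreachable", "cluster": "eu-west", "instance": "svc-1"}
alert_b = {"alertname": "DatabaseUnreachable", "cluster": "us-east", "instance": "svc-7"}

print(is_silenced(alert_a, silence))  # True: both matchers match
print(is_silenced(alert_b, silence))  # False: the cluster matcher does not match
```

Because every matcher has to match, adding a matcher to a silence narrows the set of alerts it mutes.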
-```
-aggregation_rule {
-  repeat_rate_seconds: 3600
-  notification_config_name: "alertmanager_test"
-}
-```
-
-### Notifications
-
-The Alertmanager has support for a growing number of notification methods.
-Multiple notification methods of one or more types can be used in the same
-notification config.
-
-The `send_resolved` field can be used with all notification methods to enable or disable
-sending notifications that an alert has stopped firing.
-
-#### Email
-
-The `-notification.smtp.smarthost` flag must be set to an SMTP smarthost.
-The `-notification.smtp.sender` flag may be set to change the default From address.
-
-```
-notification_config {
-  name: "alertmanager_email"
-  email_config {
-    email: "test@example.org"
-  }
-  email_config {
-    email: "foo@example.org"
-  }
-}
-```
+## Sending alerts
+
+__Prometheus automatically takes care of sending alerts generated by its
+configured [alerting rules](../rules). The following is general documentation for clients.__
+
+The Alertmanager listens for alerts on an API endpoint at `/api/v1/alerts`.
+Clients are expected to continuously re-send alerts as long as they are still
+active (usually on the order of 30 seconds to 3 minutes).
+Clients can push a list of alerts to that endpoint via a POST request of
+the following format:
+
+```
+[
+  {
+    "labels": {
+      "<labelname>": "<labelvalue>",
+      ...
+    },
+    "annotations": {
+      "<labelname>": "<labelvalue>"
+    },
+    "startsAt": "<rfc3339>",
+    "endsAt": "<rfc3339>",
+    "generatorURL": "<generator_url>"
+  },
+  ...
+]
+```
+
+The labels are used to identify identical instances of an alert and to perform
+deduplication. The annotations are always set to those received most recently
+and do not identify an alert.
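To make the role of labels and annotations concrete, here is a minimal sketch of deduplication keyed on the label set. `AlertStore` and `alert_key` are hypothetical names introduced for illustration and do not reflect how Alertmanager stores alerts internally.

```python
def alert_key(labels: dict) -> frozenset:
    """Identity of an alert: its complete label set, independent of key order."""
    return frozenset(labels.items())


class AlertStore:
    """Hypothetical in-memory store that deduplicates alerts by their labels."""

    def __init__(self) -> None:
        self._alerts = {}

    def put(self, alert: dict) -> None:
        key = alert_key(alert["labels"])
        existing = self._alerts.get(key)
        if existing is None:
            # First time this label set is seen: store a copy as a new alert.
            self._alerts[key] = dict(alert)
        else:
            # Same labels mean the same alert: keep only the latest annotations.
            existing["annotations"] = alert.get("annotations", {})

    def get(self, labels: dict) -> dict:
        return self._alerts[alert_key(labels)]

    def __len__(self) -> int:
        return len(self._alerts)


store = AlertStore()
labels = {"alertname": "DatabaseUnreachable", "instance": "svc-1"}
store.put({"labels": labels, "annotations": {"summary": "first report"}})
store.put({"labels": dict(labels), "annotations": {"summary": "second report"}})
store.put({"labels": {"alertname": "DatabaseUnreachable", "instance": "svc-2"},
           "annotations": {"summary": "other instance"}})

print(len(store))                                # 2
print(store.get(labels)["annotations"]["summary"])  # second report
```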
-
-Plain and CRAM-MD5 SMTP authentication methods are supported.
-The `SMTP_AUTH_USERNAME`, `SMTP_AUTH_SECRET`, `SMTP_AUTH_PASSWORD` and
-`SMTP_AUTH_IDENTITY` environment variables are used to configure them.
-
+
+Both timestamps are optional. If `startsAt` is omitted, the current time
+is assigned by the Alertmanager. `endsAt` is only set if the end time of an
+alert is known. Otherwise it will be set to a configurable timeout period
+after the time the alert was last received.
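The defaulting of the two timestamps can be sketched as follows. The function name `fill_timestamps` and the five-minute `RESOLVE_TIMEOUT` are assumptions chosen for the example; the text above only states that the timeout is configurable.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical value; the actual timeout is configurable.
RESOLVE_TIMEOUT = timedelta(minutes=5)


def parse_rfc3339(value: str) -> datetime:
    """Parse a basic RFC 3339 timestamp; a trailing 'Z' is read as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def fill_timestamps(alert: dict, received_at: datetime) -> dict:
    """Return a copy of the alert with startsAt and endsAt always set."""
    out = dict(alert)
    starts = alert.get("startsAt")
    ends = alert.get("endsAt")
    # A missing startsAt defaults to the time the alert was received.
    out["startsAt"] = parse_rfc3339(starts) if starts else received_at
    # A missing endsAt defaults to the receive time plus the timeout, so the
    # alert expires unless the client keeps re-sending it.
    out["endsAt"] = parse_rfc3339(ends) if ends else received_at + RESOLVE_TIMEOUT
    return out


now = datetime(2016, 1, 1, 12, 0, tzinfo=timezone.utc)
open_ended = fill_timestamps({"labels": {"alertname": "DatabaseUnreachable"}}, now)
bounded = fill_timestamps(
    {"labels": {"alertname": "DatabaseUnreachable"}, "endsAt": "2016-01-01T12:30:00Z"},
    now,
)

print(open_ended["endsAt"].isoformat())  # 2016-01-01T12:05:00+00:00
print(bounded["endsAt"].isoformat())     # 2016-01-01T12:30:00+00:00
```

Under this reading, each re-sent alert without `endsAt` pushes the expiry further into the future, which is why clients are expected to keep re-sending active alerts.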
-#### PagerDuty
-
-The Alertmanager integrates as a [Generic API
-Service](https://support.pagerduty.com/hc/en-us/articles/202830340-Creating-a-Generic-API-Service)
-with PagerDuty.
-
+
+The `generatorURL` field is a unique back-link which identifies the causing
+entity of this alert in the client.
+
+Alertmanager also supports a legacy endpoint on `/api/alerts` which is
+compatible with Prometheus versions 0.16 and lower.
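As an illustration of the client side, the following sketch builds the POST request for `/api/v1/alerts` using only the Python standard library. The base URL `http://localhost:9093`, the label and annotation values, and the function names are assumptions made for the example.

```python
import json
import urllib.request


def build_alert_request(base_url: str, alerts: list) -> urllib.request.Request:
    """Build a POST request carrying a JSON list of alerts."""
    body = json.dumps(alerts).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/alerts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alerts(base_url: str, alerts: list, timeout: float = 5.0) -> int:
    """Send the alerts and return the HTTP status code of the response."""
    request = build_alert_request(base_url, alerts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


alerts = [
    {
        "labels": {"alertname": "DatabaseUnreachable", "instance": "svc-1"},
        "annotations": {"summary": "svc-1 cannot reach the database"},
        # startsAt and endsAt are optional and omitted here.
        "generatorURL": "http://example.org/checks/db",  # hypothetical back-link
    }
]

# Assumed address of a locally running Alertmanager; nothing is sent here.
request = build_alert_request("http://localhost:9093", alerts)
print(request.get_method(), request.full_url)
```

A real client would call `send_alerts` repeatedly for as long as the alert stays active, in line with the re-send interval mentioned above.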
-```
-notification_config {
-  name: "alertmanager_pagerduty"
-  pagerduty_config {
-    service_key: "supersecretapikey"
-  }
-}
-```
-
-#### Pushover
-
-```
-notification_config {
-  name: "alertmanager_pushover"
-  pushover_config {
-    token: "mypushovertoken"
-    user_key: "mypushoverkey"
-  }
-}
-```
-
-#### HipChat
-
-```
-notification_config {
-  name: "alertmanager_hipchat"
-  hipchat_config {
-    auth_token: "hipchatauthtoken"
-    room_id: 123456
-  }
-}
-```
-
-#### Slack
-
-```
-notification_config {
-  name: "alertmanager_slack"
-  slack_config {
-    webhook_url: "webhookurl"
-    channel: "channelname"
-  }
-}
-```
-
-#### Flowdock
-
-```
-notification_config {
-  name: "alertmanager_flowdock"
-  flowdock_config {
-    api_token: "4c7234902348234902384234234cdb59"
-    from_address: "aliaswithgravatar@example.com"
-    tag: "monitoring"
-  }
-}
-```
-
-#### Generic Webhook
-
-The Alertmanager supports sending notifications as JSON to arbitrary
-URLs. This could be used to perform automated actions when an
-alert fires or integrate with a system that the Alertmanager does not support.
-
-```
-notification_config {
-  name: "alertmanager_webhook"
-  webhook_config {
-    url: "http://example.org/my/hook"
-  }
-}
-```
-
-An example of JSON message it sends is below.
-
-```json
-{
-  "version": "1",
-  "status": "firing",
-  "alert": [
-    {
-      "summary": "summary",
-      "description": "description",
-      "labels": {
-        "alertname": "TestAlert"
-      },
-      "payload": {
-        "activeSince": "2015-06-01T12:55:47.356+01:00",
-        "alertingRule": "ALERT TestAlert IF absent(metric_name) FOR 0y WITH ",
-        "generatorURL": "http://localhost:9090/graph#%5B%7B%22expr%22%3A%22absent%28metric_name%29%22%2C%22tab%22%3A0%7D%5D",
-        "value": "1"
-      }
-    }
-  ]
-}
-```
-
-This format is subject to change.