Commit 2a82c8f7 authored Dec 23, 2015 by Fabian Reinartz

Document config format for the new AM

parent cb0080d4

Showing 3 changed files with 235 additions and 53 deletions:

content/docs/alerting/alertmanager.md (+5 -50)
content/docs/alerting/configuration.md (+210 -0)
content/docs/alerting/rules.md (+20 -3)

content/docs/alerting/alertmanager.md
@@ -19,10 +19,10 @@ in more detail.
 Grouping categorizes alerts of similar nature into a single notification. This
 is especially useful during larger outages when many systems fail at once and
-hundreds the thousands of alerts may be firing simultaniously.
+hundreds to thousands of alerts may be firing simultaneously.

 **Example:** Dozens or hundreds of instances of a service are running in your
-cluster when a network partition occurs. Half our your service instances
+cluster when a network partition occurs. Half of your service instances
 can no longer reach the database.

 Alerting rules in Prometheus were configured to send an alert for each service
 instance if it cannot communicate with the database. As a result hundreds of
@@ -39,14 +39,14 @@ file.

 ## Inhibition

-Inhibition is a concept of surpressing notifications for certain alerts if
+Inhibition is a concept of suppressing notifications for certain alerts if
 certain other alerts are already firing.

 **Example:** An alert is firing that informs that an entire cluster is not
 reachable. Alertmanager can be configured to mute all other alerts concerning
 this cluster if that particular alert is firing.
-This prevents hundreds to thousands of alerts firing unrelated to the actual
-issue.
+This prevents notifications for hundreds or thousands of firing alerts that
+are unrelated to the actual issue.

 Inhibitions are configured through the Alertmanager's configuration file.
@@ -59,48 +59,3 @@ matchers of an active silence.
 If they do, no notifications will be send out for that alert.

 Silences are configured in the web interface of the Alertmanager.
-
-## Sending alerts
-
-__Prometheus automatically takes care of sending alerts generated by its
-configured [alerting rules](../rules). The following is a general documentation for clients.__
-
-The Alertmanager listens for alerts on an API endpoint at `/api/v1/alerts`.
-Clients are expected to continously re-send alerts as long as they are still
-active (usually at the order of 30 seconds to 3 minutes).
-Clients can push a list of alerts to that endpoint via a POST request of
-the following format:
-
-```
-[
-  {
-    "labels": {
-      "<labelname>": "<labelvalue>",
-      ...
-    },
-    "annotations": {
-      "<labelname>": "<labelvalue>",
-    },
-    "startsAt": "<rfc3339>",
-    "endAt": "<rfc3339>"
-    "generatorURL": "<generator_url>"
-  },
-  ...
-]
-```
-
-The labels are used to identify identical instances of an alert and to perform
-deduplication. The annotations are always set to those received most recently
-and are not identifying an alert.
-
-Both timestamps are optional. If `startsAt` is omitted, the current time
-is assigned by the Alertmanager. `endsAt` is only set if the end time of an
-alert is known. Otherwise it will be set to a configurable timeout period from
-the time since the alert was last received.
-
-The `generatorURL` field is a unique back-link which identifies the causing
-entity of this alert in the client.
-
-Alertmanager also supports a legacy endpoint on `/api/alerts` which is
-compatible with Prometheus versions 0.16 and lower.

content/docs/alerting/configuration.md
0 → 100644
---
title: Configuration
sort_rank: 3
nav_icon: sliders
---

# Configuration

[Alertmanager](https://github.com/prometheus/alertmanager) is configured via
command-line flags and a configuration file.
While the command-line flags configure immutable system parameters, the
configuration file defines inhibition rules, notification routing and
notification receivers.

To view all available command-line flags, run `alertmanager -h`.

Alertmanager can reload its configuration at runtime. If the new configuration
is not well-formed, the changes will not be applied and an error is logged.
A configuration reload is triggered by sending a `SIGHUP` to the process.

## Configuration file

To specify which configuration file to load, use the `-config.file` flag.

The file is written in the [YAML format](http://en.wikipedia.org/wiki/YAML),
defined by the schema described below.
Brackets indicate that a parameter is optional. For non-list parameters the
value is set to the specified default.

Generic placeholders are defined as follows:

* `<duration>`: a duration matching the regular expression `[0-9]+[smhdwy]`
* `<labelname>`: a string matching the regular expression `[a-zA-Z_][a-zA-Z0-9_]*`
* `<labelvalue>`: a string of unicode characters
* `<filepath>`: a valid path in the current working directory
* `<boolean>`: a boolean that can take the values `true` or `false`
* `<string>`: a regular string
* `<tmpl_string>`: a string which is template-expanded before usage

The other placeholders are specified separately.

A valid example file can be found
[here](https://github.com/prometheus/alertmanager/blob/master/doc/examples/simple.yml).

The global configuration specifies parameters that are valid in all other
configuration contexts. They also serve as defaults for other configuration
sections.

```
global:
  # ResolveTimeout is the time after which an alert is declared resolved
  # if it has not been updated.
  [ resolve_timeout: <duration> | default = 5m ]

  # The default SMTP From header field.
  [ smtp_from: <tmpl_string> ]
  # The default SMTP smarthost used for sending emails.
  [ smtp_smarthost: <tmpl_string> ]

  # The API URL to use for Slack notifications.
  [ slack_api_url: <string> ]

  [ pagerduty_url: <string> | default = "https://events.pagerduty.com/generic/2010-04-15/create_event.json" ]
  [ opsgenie_api_host: <string> | default = "https://api.opsgenie.com/" ]

# Files from which custom notification template definitions are read.
# The last component may use a wildcard matcher, e.g. 'templates/*.tmpl'.
templates:
  [ - <filepath> ... ]

# The root node of the routing tree.
route: <route>

# A list of inhibition rules.
inhibit_rules:
  [ - <inhibit_rule> ... ]

# A list of notification receivers.
receivers:
  - <receiver> ...
```
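
To make the top-level schema more concrete, here is a minimal sketch of a configuration file that uses only the keys listed above. The sender address, smarthost, template path, and receiver name are placeholder assumptions chosen for illustration. They are not defaults or recommendations.

```yaml
# Minimal sketch of a top-level configuration (all values are placeholders).
global:
  # Alerts that are not re-sent within 5 minutes are declared resolved.
  resolve_timeout: 5m
  # Hypothetical mail settings used as defaults by email integrations.
  smtp_from: 'alertmanager@example.org'
  smtp_smarthost: 'localhost:25'

# Load every template file matching the wildcard in the last path component.
templates:
  - 'templates/*.tmpl'

# The root route has no matchers, so it matches all alerts.
route:
  receiver: 'default-receiver'

# The receiver referenced by the root route must be defined here.
receivers:
  - name: 'default-receiver'
```

A receiver without any integration configured sends nothing, so a real setup would add at least one integration block to it.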
## Route `<route>`

A route block defines a node in a routing tree and its children. Its optional
configuration parameters are inherited from its parent node if not set.

Every alert enters the routing tree at the configured top-level route, which
must match all alerts (i.e. not have any configured matchers).
It then traverses the child nodes. If `continue` is set to false, it stops
after the first matching child. If `continue` is true on a matching node, the
alert will continue matching against subsequent siblings.
If an alert does not match any children of a node (no matching child nodes, or
none exist), the alert is handled based on the configuration parameters of the
current node.

```
[ receiver: <string> ]
[ group_by: '[' <labelname>, ... ']' ]

# Zero or more child routes.
routes:
  [ - <route> ... ]

# Whether an alert should continue matching subsequent sibling nodes.
[ continue: <boolean> | default = false ]

# A set of equality matchers an alert has to fulfill to match the node.
match:
  [ <labelname>: <labelvalue>, ... ]

# A set of regex-matchers an alert has to fulfill to match the node.
match_re:
  [ <labelname>: <regex>, ... ]

# How long to initially wait to send a notification for a group
# of alerts. Allows waiting for an inhibiting alert to arrive or collecting
# more initial alerts for the same group. (Usually ~0s to a few minutes.)
[ group_wait: <duration> ]

# How long to wait before sending a notification about new alerts that
# are added to a group of alerts for which an initial notification
# has already been sent. (Usually ~5min or more.)
[ group_interval: <duration> ]

# How long to wait before sending a notification again if it has already
# been sent successfully for an alert. (Usually ~3h or more.)
[ repeat_interval: <duration> ]
```

### Example

```
# The root route with all parameters, which are inherited by the child
# routes if they are not overwritten.
route:
  receiver: 'default-receiver'
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  group_by: [cluster, alertname]
  # All alerts that do not match the following child routes
  # will remain at the root node and be dispatched to 'default-receiver'.
  routes:
  # All alerts with service=mysql or service=cassandra
  # are dispatched to the database pager.
  - receiver: 'database-pager'
    group_wait: 10s
    match_re:
      service: mysql|cassandra
  # All alerts with the team=frontend label match this sub-route.
  # They are grouped by product and environment rather than cluster
  # and alertname.
  - receiver: 'frontend-pager'
    group_by: [product, environment]
    match:
      team: frontend
```
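
The `continue` parameter is not used in the example above. The following sketch shows how it might be used, based only on the semantics described in this section. The receiver names and the `severity` label are hypothetical.

```yaml
# Sketch: sending one alert to more than one sibling route.
route:
  receiver: 'default-receiver'
  routes:
  # Every alert with severity=page matches this node first.
  - receiver: 'audit-log'
    # Keep matching against the siblings that follow this node.
    continue: true
    match:
      severity: page
  # Because the node above sets continue: true, a severity=page alert
  # that also carries service=mysql is matched here as well.
  - receiver: 'database-pager'
    match:
      service: mysql
```

Without `continue: true` on the first child, an alert with both labels would stop at `audit-log` and never reach `database-pager`.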
## Inhibit rule `<inhibit_rule>`
An inhibition rule is a rule that mutes an alert matching a set of matchers
under the condition that an alert exists that matches another set of matchers.
Both alerts must have a set of equal labels.

```
# Matchers that have to be fulfilled in the alerts to be muted.
target_match:
  [ <labelname>: <labelvalue>, ... ]
target_match_re:
  [ <labelname>: <regex>, ... ]

# Matchers for which one or more alerts have to exist for the
# inhibition to take effect.
source_match:
  [ <labelname>: <labelvalue>, ... ]
source_match_re:
  [ <labelname>: <regex>, ... ]

# Labels that must have an equal value in the source and target
# alert for the inhibition to take effect.
[ equal: '[' <labelname>, ... ']' ]
```
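
As an illustration of the schema above, the following sketch mutes warning-level alerts while a critical alert for the same alert name, cluster, and service is firing. The `severity`, `cluster`, and `service` labels are assumptions about how alerts are labeled. They are not required by Alertmanager.

```yaml
# Sketch of a single inhibition rule (label names are assumed).
inhibit_rules:
  # The source: at least one alert with severity=critical must exist.
  - source_match:
      severity: 'critical'
    # The target: alerts with severity=warning are the ones muted.
    target_match:
      severity: 'warning'
    # Only mute when both alerts agree on all of these label values.
    equal: ['alertname', 'cluster', 'service']
```

The `equal` list keeps the rule narrow: a critical alert in one cluster does not mute warnings from another cluster.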
## Receiver `<receiver>`

A receiver is a named configuration of one or more notification integrations.

```
# The unique name of the receiver.
name: <string>

# Configurations for several notification integrations.
email_configs:
  [ - <email_config>, ... ]
pagerduty_configs:
  [ - <pagerduty_config>, ... ]
slack_configs:
  [ - <slack_config>, ... ]
webhook_configs:
  [ - <webhook_config>, ... ]
opsgenie_configs:
  [ - <opsgenie_config>, ... ]
```
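
The integration blocks themselves are not specified in this section, so the following is only a sketch of how a receiver list might look. The `to` and `url` fields are assumptions about the `<email_config>` and `<webhook_config>` schemas, and the address and URL are placeholders.

```yaml
# Sketch of two receivers (integration fields are assumed, not documented here).
receivers:
  # A receiver can combine several integrations under one name.
  - name: 'default-receiver'
    email_configs:
      - to: 'oncall@example.org'        # assumed field of <email_config>
    webhook_configs:
      - url: 'http://localhost:5001/'   # assumed field of <webhook_config>
  # Routes refer to receivers by their unique name.
  - name: 'database-pager'
    email_configs:
      - to: 'dba@example.org'
```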
content/docs/alerting/rules.md
 ---
 title: Alerting rules
-sort_rank: 3
+sort_rank: 4
 ---

 # Alerting rules
@@ -38,9 +38,26 @@ identifying for an alert instance. They are used to store longer additional
 information such as alert descriptions or runbook links. The annotation values
 can be templated.

+#### v0.16.2 and earlier
+
+In previous Prometheus versions the rule syntax is as follows:
+
+    ALERT <alert name>
+      IF <expression>
+      [FOR <duration>]
+      [WITH <label set>]
+      [ANNOTATIONS <label set>]
+
+Annotations are not free form but fixed to a summary, a description, and a
+runbook field. Labels are attached using the `WITH` rather than the `LABELS`
+clause.
+
+Label values in the `WITH` clause cannot be templated.
+
 #### Templating

-Label and annotation values can be templated using Go's template language.
+Label and annotation values can be templated using [console templates](../visualization/consoles).
 The `$labels` variable holds the label key/value pairs of an alert instance
 and `$value` holds the evaluated value of an alert instance.