Commit 8db8d49b authored by beorn7

Merge branch 'master' into next-release

parents b0977c38 f7f2bfe5
......@@ -35,7 +35,7 @@ labels set by an earlier stage:
1. Global labels, which are assigned to every target scraped by the Prometheus instance.
2. The `job` label, which is configured as a default value for each scrape configuration.
3. Labels that are set per target group within a scrape configuration.
4. Advanced label manipulation via [_relabeling_](/docs/operating/configuration/#relabeling-relabel_config).
4. Advanced label manipulation via [_relabeling_](/docs/operating/configuration/#target-relabeling-relabel_config).
Each stage overwrites any colliding labels from the earlier stages. Eventually, we have a flat
set of labels that describe a single target. Those labels are then attached to every time series that
......@@ -76,7 +76,7 @@ scrape_configs:
job: 'job2'
```
Through a mechanism named [_relabeling_](http://prometheus.io/docs/operating/configuration/#relabeling-relabel_config),
Through a mechanism named [_relabeling_](http://prometheus.io/docs/operating/configuration/#target-relabeling-relabel_config),
any label can be removed, created, or modified on a per-target level. This
enables fine-grained labeling that can also take into account metadata coming
from the service discovery. Relabeling is the last stage of label assignment
......@@ -124,7 +124,7 @@ This rule transforms a target with the label set:
You could then also remove the source labels in an additional relabeling step.
You can read more about relabeling and how you can use it to filter targets in the
[configuration documentation](/docs/operating/configuration#relabeling-relabel_config).
[configuration documentation](/docs/operating/configuration#target-relabeling-relabel_config).
Over the next sections, we will see how you can leverage relabeling when using service discovery.
......@@ -219,7 +219,7 @@ has the `production` or `canary` Consul tag, a respective `group` label is assig
Each target's `instance` label is set to the node name provided by Consul.
A full documentation of all configuration parameters for service discovery via Consul
can be found on the [Prometheus website](/docs/operating/configuration##relabeling-relabel_config).
can be found on the [Prometheus website](/docs/operating/configuration#target-relabeling-relabel_config).
## Custom service discovery
......
---
title: Practical Anomaly Detection
created_at: 2015-06-18
kind: article
author_name: Brian Brazil
---
In his *[Open Letter To Monitoring/Metrics/Alerting Companies](http://www.kitchensoap.com/2015/05/01/openlettertomonitoringproducts/)*,
John Allspaw asserts that attempting "to detect anomalies perfectly, at the right time, is not possible".
I have seen several attempts by talented engineers to build systems to
automatically detect and diagnose problems based on time series data. While it
is certainly possible to get a demonstration working, the data always turned
out to be too noisy to make this approach work for anything but the simplest of
real-world systems.
All hope is not lost though. There are many common anomalies which you can
detect and handle with custom-built rules. The Prometheus [query
language](../../../../../docs/querying/basics/) gives you the tools to discover
these anomalies while avoiding false positives.
## Building a query
A common problem within a service is when a small number of servers are not
performing as well as the rest, such as responding with increased latency.
Let us say that we have a metric `instance:latency_seconds:mean5m` representing the
average query latency for each instance of a service, calculated via a
[recording rule](/docs/querying/rules/) from a
[Summary](/docs/concepts/metric_types/#summary) metric.
A simple way to start would be to look for instances with a latency
more than two standard deviations above the mean:
```
instance:latency_seconds:mean5m
> on (job) group_left(instance)
(
avg by (job)(instance:latency_seconds:mean5m)
+ on (job)
2 * stddev by (job)(instance:latency_seconds:mean5m)
)
```
You try this out and discover that there are false positives when
the latencies are very tightly clustered. So you add a requirement
that the instance latency also has to be 20% above the average:
```
(
instance:latency_seconds:mean5m
> on (job) group_left(instance)
(
avg by (job)(instance:latency_seconds:mean5m)
+ on (job)
2 * stddev by (job)(instance:latency_seconds:mean5m)
)
)
> on (job) group_left(instance)
1.2 * avg by (job)(instance:latency_seconds:mean5m)
```
Finally, you find that false positives tend to happen at low traffic levels.
You add a requirement for there to be enough traffic for 1 query per second to
be going to each instance. You create an alert definition for all of this:
```
ALERT InstanceLatencyOutlier
IF
(
instance:latency_seconds:mean5m
> on (job) group_left(instance)
(
avg by (job)(instance:latency_seconds:mean5m)
+ on (job)
2 * stddev by (job)(instance:latency_seconds:mean5m)
)
)
> on (job) group_left(instance)
1.2 * avg by (job)(instance:latency_seconds:mean5m)
and on (job)
avg by (job)(instance:latency_seconds_count:rate5m)
>
1
FOR 30m
SUMMARY "{{$labels.instance}} in {{$labels.job}} is a latency outlier"
DESCRIPTION "{{$labels.instance}} has latency of {{humanizeDuration $value}}"
```
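To see how the three conditions interact, here is a minimal Python sketch of the same logic applied to the instances of a single job, outside of Prometheus. The function name, argument names, and sample numbers are made up for illustration. It uses the population standard deviation, which is what PromQL's `stddev` calculates.

```python
from statistics import mean, pstdev


def latency_outliers(latency_by_instance, qps_by_instance, min_avg_qps=1.0):
    """Return the instances of one job that satisfy all three alert conditions.

    Hypothetical helper: both arguments map instance name -> current value.
    """
    # Condition 3: average traffic per instance must exceed the minimum.
    if mean(list(qps_by_instance.values())) <= min_avg_qps:
        return []

    latencies = list(latency_by_instance.values())
    avg = mean(latencies)
    # pstdev is the population standard deviation, like PromQL's stddev.
    stddev_threshold = avg + 2 * pstdev(latencies)

    return sorted(
        instance
        for instance, latency in latency_by_instance.items()
        # Condition 1: more than two standard deviations above the mean.
        # Condition 2: also more than 20% above the mean.
        if latency > stddev_threshold and latency > 1.2 * avg
    )
```

With nine instances at 0.1s and one at 1.0s, only the slow instance is returned. With nine at 0.100s and one at 0.101s, the standard deviation test alone would flag the last instance, but the 20% requirement filters it out.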
## Automatic actions
The above alert can feed into the
[Alertmanager](/docs/alerting/alertmanager/), and from there to
your chat, ticketing, or paging systems. After a while you might discover that the
usual cause of the alert has no proper fix, but that an
automated action such as a restart, reboot, or machine replacement resolves
the issue.
Rather than having humans handle this repetitive task, one option is to
get the Alertmanager to send the alert to a web service that will perform
the action with appropriate throttling and safety features.
The [generic webhook](/docs/alerting/alertmanager/#generic-webhook)
sends alert notifications to an HTTP endpoint of your choice. A simple Alertmanager
configuration that uses it could look like this:
```
# A simple notification configuration which only sends alert notifications to
# an external webhook.
notification_config {
name: "restart_webhook"
webhook_config {
url: "http://example.org/my/hook"
}
}
# An aggregation rule which matches all alerts with the label
# alertname="InstanceLatencyOutlier" and sends them using the "restart_webhook"
# notification configuration.
aggregation_rule {
filter {
name_re: "alertname"
value_re: "InstanceLatencyOutlier"
}
notification_config_name: "restart_webhook"
}
```
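The throttling that such a web service needs could look roughly like the following Python sketch. It is an assumption-laden illustration, not part of the Alertmanager: the class name, its parameters, and the limits are hypothetical, and parsing the incoming notification is deliberately left out because the payload format is not described here.

```python
import time
from collections import deque


class ActionThrottle:
    """Allow at most max_actions automated actions per sliding time window."""

    def __init__(self, max_actions, window_seconds, clock=time.monotonic):
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the throttle can be tested
        self._timestamps = deque()  # times of recently allowed actions

    def allow(self):
        """Return True and record the action if it fits in the current window."""
        now = self.clock()
        # Forget actions that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_actions:
            return False
        self._timestamps.append(now)
        return True
```

A webhook handler would call `allow()` before each restart and fall back to notifying a human when it returns `False`, so that a widespread problem does not turn into a restart storm.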
## Summary
The Prometheus query language allows for rich processing of your monitoring
data. This lets you create alerts with good signal-to-noise ratios, and the
Alertmanager's generic webhook support can trigger automatic remediations.
This all combines to enable oncall engineers to focus on problems where they can
have the most impact.
When defining alerts for your services, see also our [alerting best practices](http://prometheus.io/docs/practices/alerting/).
---
title: Monitoring DreamHack - the World's Largest Digital Festival
created_at: 2015-06-24
kind: article
author_name: Christian Svensson (DreamHack Network Team)
---
*Editor's note: This article is a guest post written by a Prometheus user.*
**If you are operating the network for 10,000's of demanding gamers, you need to
really know what is going on inside your network. Oh, and everything needs to be
built from scratch in just five days.**
If you have never heard about [DreamHack](http://www.dreamhack.se/) before, here
is the pitch: Bring 20,000 people together and have the majority of them bring
their own computer. Mix in professional gaming (eSports), programming contests,
and live music concerts. The result is the world's largest festival dedicated
solely to everything digital.
To make such an event possible, there needs to be a lot of infrastructure in
place. Ordinary infrastructures of this size take months to build, but the crew
at DreamHack builds everything from scratch in just five days. This of course
includes stuff like configuring network switches, but also building the
electricity distribution, setting up stores for food and drinks, and even
building the actual tables.
The team that builds and operates everything related to the network is
officially called the Network team, but we usually refer to ourselves as *tech*
or *dhtech*. This post is going to focus on the work of dhtech and how we used
Prometheus during DreamHack Summer 2015 to try to kick our monitoring up another
notch.
## The equipment
Turns out that to build a highly performant network for 10,000+
computers, you need at least the same number of network ports. In our case these
come in the form of ~400 Cisco 2950 switches. We call these the access switches.
These are everywhere in the venue where participants will be seated with their
computers.
[![Access switches](https://c1.staticflickr.com/9/8487/8206439882_4739d39a9c_c.jpg)](https://www.flickr.com/photos/dreamhack/8206439882)
<center>*Dutifully standing in line, the access switches are ready to greet the
DreamHackers with high-speed connectivity.*</center>
Obviously just connecting all these computers to a switch is not enough. That
switch needs to be connected to the other switches as well. This is where the
distribution switches (or dist switches) come into play. These are switches that
take the hundreds of links from all access switches and aggregate them into
more manageable 10-Gbit/s high-capacity fibre. The dist switches are then
further aggregated into our core, where the traffic is routed to its
destination.
On top of all of this, we operate our own WiFi networks, DNS/DHCP servers, and
other infrastructure. When completed, our core looks something like the image
below.
[![The DreamHack network core](https://c2.staticflickr.com/4/3951/18679671439_10ce7a8eb4_c.jpg)](https://www.flickr.com/photos/dreamhack/18679671439)
<center>*The DreamHack network core*</center>
[![Network planning map](http://i.imgur.com/ZCQa2Abl.png)](http://i.imgur.com/ZCQa2Ab.png)
<center>*The planning map for the distribution and core layers. The core is
clearly visible in "Hall D"*</center>
All in all this is becoming a lengthy list of stuff to monitor, so let's get to
the reason you're here: How do we make sure we know what's going on?
## Introducing: dhmon
dhmon is the collective name of the systems that not only
monitor the network, but also allow other teams to collect metrics on whatever
they want.
Since the network needs to be built in five days, it's essential that the
monitoring systems are easy to set up and keep in sync if we need to do last
minute infrastructural changes (like adding or removing devices). When we start
to build the network, we need monitoring as soon as possible to be able to
discover any problems with the equipment or other issues we hadn't foreseen.
In the past we have tried to use a mix of commonly available software such as
Cacti, SNMPc, and Opsview among others. While these have worked, they have focused on
being closed systems and only provided the bare minimum. A few years back a few
people from the team said "Enough, we can do better ourselves!" and started
writing a custom monitoring solution.
At the time the options were limited. Over the years the system went from using
Graphite (scalability issues), a custom Cassandra store (high complexity), and
InfluxDB (immature software) to finally land on using Prometheus. I first
learned about Prometheus back in 2014 when I met Julius Volz and I had been
eager to try it ever since. This summer we finally replaced the custom
InfluxDB-based metrics store that we had written with Prometheus. Spoiler: We're
not going back.
## The architecture
The monitoring solution consists of three layers:
collection, storage, presentation. Our most critical collectors are
snmpcollector (SNMP) and ipplan-pinger (ICMP), closely followed by dhcpinfo
(DHCP lease stats). We also have some scripts that dump stats about other
systems into [node_exporter](https://github.com/prometheus/node_exporter)'s
textfile collector.
[![dhmon Architecture](http://i.imgur.com/6gN3MRp.png)](http://i.imgur.com/6gN3MRp.png)
<center>*The current architecture plan of dhmon as of Summer 2015*</center>
We use Prometheus as a central timeseries storage and querying engine, but we
also use Redis and memcached to export snapshot views of binary information that
we collect but cannot store in Prometheus in any sensible way, or when we need
to access very fresh data.
One such case is in our presentation layer. We use our dhmap web application to
get an overview of the overall health of the access switches. In order to be
effective at resolving errors, we need a latency of ~10 seconds from data
collection to presentation. Our goal is to have fixed the problem before the
customer notices, or at least before they have walked over to the support people
to report an issue. For this reason, we have been using memcached since the
beginning to access the latest snapshot of the network.
We continued to use memcached this year for our low-latency data, while using
Prometheus for everything that's historical or not as latency-sensitive. This
decision was made simply because we were unsure how Prometheus would perform at
very short sampling intervals. In the end, we found no reason why we can't
use Prometheus for this data as well - we will definitely try to replace our
memcached with Prometheus at the next DreamHack.
[![dhmon Visualization](http://i.imgur.com/D5I0Ztbl.png)](http://i.imgur.com/D5I0Ztb.png)
<center>*The overview of our access layer visualized by dhmon*</center>
## Prometheus setup
The block that so far has been referred to as *Prometheus*
really consists of three products:
[Prometheus](https://github.com/prometheus/prometheus),
[PromDash](https://github.com/prometheus/promdash), and
[Alertmanager](https://github.com/prometheus/alertmanager). The setup is fairly
basic and all three components are running on the same host. Everything is
served by an Apache web server that just acts as a reverse proxy.
    ProxyPass /prometheus http://localhost:9090/prometheus
    ProxyPass /alertmanager http://localhost:9093/alertmanager
    ProxyPass /dash http://localhost:3000/dash
## Exploring the network
Prometheus has a powerful querying engine that allows
you to do pretty cool things with the streaming information collected from all
over your network. However, sometimes the queries need to process too much data
to finish within a reasonable amount of time. This happened to us when we wanted
to graph the top 5 utilized links out of ~18,000 in total. While the query
worked, it would take roughly the amount of time we set our timeout limit to,
meaning it was both slow and flaky. We decided to use Prometheus' [recording
rules](/docs/querying/rules/) for precomputing heavy queries.
    precomputed_link_utilization_percent = rate(ifHCOutOctets{layer!='access'}[10m])*8/1000/1000
                                             / on (device,interface,alias)
                                           ifHighSpeed{layer!='access'}
After this, running `topk(5, precomputed_link_utilization_percent)` was
blazingly fast.
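To make the arithmetic in the rule concrete, here is a small Python sketch of the same unit conversion. The function and argument names are made up for illustration, and it uses a plain difference between two counter readings, whereas `rate()` also handles counter resets.

```python
def link_utilization(octets_start, octets_end, interval_seconds, if_high_speed_mbps):
    """Fraction of link capacity used, mirroring the recording rule's arithmetic."""
    # Counter difference over the interval gives octets (bytes) per second.
    octets_per_second = (octets_end - octets_start) / interval_seconds
    # 8 bits per octet, then /1000/1000 converts bit/s to Mbit/s,
    # the unit that IF-MIB::ifHighSpeed reports.
    megabits_per_second = octets_per_second * 8 / 1000 / 1000
    return megabits_per_second / if_high_speed_mbps
```

For example, 750,000,000 octets sent in 10 minutes on a 1000 Mbit/s link is 10 Mbit/s, so the function returns 0.01. Note that the expression as written yields a fraction of capacity rather than a value scaled to 100.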
## Being reactive: alerting
So at this stage we had something we could query for
the state of the network. Since we are humans, we don't want to spend our time
running queries all the time to see if things are still running as they should,
so obviously we need alerting.
For example: we know that all our access switches use GigabitEthernet0/2 as an
uplink. Sometimes when the network cables have been in storage for too long they
oxidize and are not able to negotiate the full 1000 Mbps that we want.
The negotiated speed of a network port can be found in the SNMP OID
`IF-MIB::ifHighSpeed`. People familiar with SNMP will however recognize that
this OID is indexed by an arbitrary interface index. To make any sense of this
index, we need to cross-reference it with data from SNMP OID `IF-MIB::ifDescr`
to retrieve the actual interface name.
Fortunately, our snmpcollector supports this kind of cross-referencing while
generating Prometheus metrics. This allows us in a simple way to not only query
data, but also define useful alerts. In our setup we configured the SNMP
collection to annotate any metric under the `IF-MIB::ifTable` and
`IF-MIB::ifXTable` OIDs with `ifDescr`. This will come in handy now when we need
to specify that we are only interested in the `GigabitEthernet0/2` port and no
other interface.
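The cross-referencing idea can be sketched in a few lines of Python. This is not snmpcollector's actual code, only a hypothetical illustration of joining two tables that share the interface index. The `device` and `interface` label names are the ones used in the alert definitions in this post.

```python
def label_by_interface(values_by_index, descr_by_index, device):
    """Attach interface names to values that SNMP indexes by interface index."""
    samples = []
    for if_index, value in sorted(values_by_index.items()):
        name = descr_by_index.get(if_index)
        if name is None:
            continue  # no ifDescr row for this index; skip rather than guess
        samples.append(({"device": device, "interface": name}, value))
    return samples
```

Given `ifHighSpeed` values keyed by index and an `ifDescr` table keyed by the same index, the result is a list of label sets with values, which is the shape needed to select a single named port in a query.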
Let's have a look at what such an alert definition looks like.
    ALERT BadUplinkOnAccessSwitch
      IF ifHighSpeed{layer='access', interface='GigabitEthernet0/2'} < 1000 FOR 2m
      SUMMARY "Interface linking at {{$value}} Mbps"
      DESCRIPTION "Interface {{$labels.interface}} on {{$labels.device}} linking at {{$value}} Mbps"
Done! Now we will get an alert if a switch's uplink suddenly links at a
non-optimal speed.
Let's also look at what an alert for an almost full DHCP scope looks like:
    ALERT DhcpScopeAlmostFull
      IF ceil((dhcp_leases_current_count / dhcp_leases_max_count)*100) > 90 FOR 2m
      SUMMARY "DHCP scope {{$labels.network}} is almost full"
      DESCRIPTION "DHCP scope {{$labels.network}} is {{$value}}% full"
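As a worked example of the DHCP expression, the following Python sketch reproduces the percentage calculation for a single scope. The function names and lease counts are hypothetical, and the `FOR 2m` part of the alert is not modeled.

```python
import math


def dhcp_scope_percent_full(current_leases, max_leases):
    """Percentage of the scope in use, rounded up as in the alert expression."""
    return math.ceil(current_leases / max_leases * 100)


def scope_almost_full(current_leases, max_leases, threshold_percent=90):
    """True when the rounded-up percentage exceeds the threshold."""
    return dhcp_scope_percent_full(current_leases, max_leases) > threshold_percent
```

With 230 of 254 leases in use, the scope is about 90.6% full, which rounds up to 91 and triggers the condition. With 228 of 254 it is about 89.8% full, which rounds up to 90 and does not.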
We found the syntax to define alerts easy to read and understand even if you had
no previous experience with Prometheus or time series databases.
[![Prometheus alerts for DreamHack](http://i.imgur.com/RV5gM7Ol.png)](http://i.imgur.com/RV5gM7O.png)
<center>*Oops! Turns out we have some bad uplinks, better run out and fix
it!*</center>
## Being proactive: dashboards
While alerting is an essential part of
monitoring, sometimes you just want to have a good overview of the health of
your network. To achieve this we used [PromDash](/docs/visualization/promdash/).
Every time someone asked us something about the network, we crafted a query to
get the answer and saved it as a dashboard widget. The most interesting ones
were then added to an overview dashboard that we proudly displayed.
[![dhmon Dashboard](http://i.imgur.com/yYtC8vLl.png)](http://i.imgur.com/yYtC8vL.png)
<center>*The DreamHack Overview dashboard powered by PromDash*</center>
## The future
While changing an integral part of any system is a complex job and
we're happy that we managed to integrate Prometheus in just one event, there are
without a doubt a lot of areas to improve. Some areas are pretty basic: using
more precomputed metrics to improve performance, adding more alerts, and tuning
the ones we have. Another area is to make it easier for operators: creating an
alert dashboard suitable for our network operations center (NOC), figuring out
if we want to page the people on-call, or just let the NOC escalate alerts.
Some bigger features we're planning on adding: syslog analysis (we have a lot of
syslog!), alerts from our intrusion detection systems, integrating with our
Puppet setup, and also integrating more across the different teams at DreamHack.
We managed to create a proof-of-concept where we got data from one of the
electrical current sensors into our monitoring, making it easy to see if a
device is faulty or if it simply doesn't have any electricity anymore. We're
also working on integrating with the point-of-sale systems that are used in the
stores at the event. Who doesn't want to graph the sales of ice cream?
Finally, not all services that the team operates are on-site, and some even run
24/7 after the event. We want to monitor these services with Prometheus as well,
and in the long run when Prometheus gets support for federation, utilize the
off-site Prometheus to replicate the metrics from the event Prometheus.
## Closing words
We're really excited about Prometheus and how easy it makes
setting up scalable monitoring and alerting from scratch.
A huge shout-out to everyone that helped us in `#prometheus` on
[FreeNode](https://freenode.net/) during the event. Special thanks to Brian
Brazil, Fabian Reinartz and Julius Volz. Thanks for helping us even in the cases
where it was obvious that we hadn't read the documentation thoroughly enough.
Finally, dhmon is all open-source, so head over to https://github.com/dhtech/
and have a look if you're interested. If you feel like you would like to be a
part of this, just head over to `#dreamhack` on
[QuakeNet](https://www.quakenet.org/) and have a chat with us. Who knows, maybe
you will help us build the next DreamHack?
<%= atom_feed :title => 'Prometheus Blog', :author_name => '© Prometheus Authors 2015',
:author_uri => 'http://prometheus.io/blog/', :limit => 10 %>
:author_uri => 'http://prometheus.io/blog/', :limit => 10,
:logo => 'http://prometheus.io/assets/prometheus_logo.png',
:icon => 'http://prometheus.io/assets/favicons/favicon.ico' %>
......@@ -5,11 +5,12 @@ sort_rank: 2
# Metric types
The Prometheus client libraries offer three core metric types:
The Prometheus client libraries offer four core metric types:
* Counters
* Gauges
* Summaries
* Counter
* Gauge
* Histogram
* Summary
These metric types are currently only differentiated in the client libraries
(to enable APIs tailored to the usage of the specific types) and in the wire
......
......@@ -15,14 +15,15 @@ HTTP endpoint on your application’s instance:
* [Go](https://github.com/prometheus/client_golang)
* [Java or Scala](https://github.com/prometheus/client_java)
* [Ruby](https://github.com/prometheus/client_ruby)
* [Python](https://github.com/prometheus/client_python)
* [Ruby](https://github.com/prometheus/client_ruby)
Unofficial third-party client libraries:
* [Bash](https://github.com/aecolley/client_bash)
* [Haskell](https://github.com/fimad/prometheus-haskell)
* [Node.js](https://github.com/StreamMachine/prometheus_client_nodejs)
* [.NET / C#](https://github.com/andrasm/prometheus-net)
* [Bash](https://github.com/aecolley/client_bash)
When Prometheus scrapes your instance's HTTP endpoint, the client library
sends the current state of all tracked metrics to the server.
......
......@@ -16,38 +16,42 @@ These exporters are maintained as part of the official
[Prometheus GitHub organization](https://github.com/prometheus):
* [Node/system metrics exporter](https://github.com/prometheus/node_exporter)
* [Graphite exporter](https://github.com/prometheus/graphite_exporter)
* [AWS CloudWatch exporter](https://github.com/prometheus/cloudwatch_exporter)
* [Collectd exporter](https://github.com/prometheus/collectd_exporter)
* [JMX exporter](https://github.com/prometheus/jmx_exporter)
* [Consul exporter](https://github.com/prometheus/consul_exporter)
* [Graphite exporter](https://github.com/prometheus/graphite_exporter)
* [HAProxy exporter](https://github.com/prometheus/haproxy_exporter)
* [StatsD bridge](https://github.com/prometheus/statsd_bridge)
* [AWS CloudWatch exporter](https://github.com/prometheus/cloudwatch_exporter)
* [Hystrix metrics publisher](https://github.com/prometheus/hystrix)
* [JMX exporter](https://github.com/prometheus/jmx_exporter)
* [Mesos task exporter](https://github.com/prometheus/mesos_exporter)
* [Consul exporter](https://github.com/prometheus/consul_exporter)
* [MySQL server exporter](https://github.com/prometheus/mysqld_exporter)
* [StatsD bridge](https://github.com/prometheus/statsd_bridge)
The [JMX exporter](https://github.com/prometheus/jmx_exporter) can export from a
wide variety of JVM-based applications, for example [Kafka](http://kafka.apache.org/) and
[Cassandra](http://cassandra.apache.org/).
## Unofficial third-party exporters
## Independently maintained third-party exporters
There are also a number of exporters which are externally contributed and
maintained. Note that these may have not been vetted for best practices by the
Prometheus core team yet:
There are also a number of exporters which are externally contributed
and maintained. We encourage the creation of more exporters but cannot
vet all of them for best practices. Commonly, those exporters are
hosted outside of the Prometheus GitHub organization.
* [RethinkDB exporter](https://github.com/oliver006/rethinkdb_exporter)
* [Redis exporter](https://github.com/oliver006/redis_exporter)
* [scollector exporter](https://github.com/tgulacsi/prometheus_scollector)
* [MongoDB exporter](https://github.com/dcu/mongodb_exporter)
* [CouchDB exporter](https://github.com/gesellix/couchdb-exporter)
* [Django exporter](https://github.com/korfuri/django-prometheus)
* [Google's mtail log data extractor](https://github.com/google/mtail)
* [Minecraft exporter module](https://github.com/Baughn/PrometheusIntegration)
* [Meteor JS web framework exporter](https://atmospherejs.com/sevki/prometheus-exporter)
* [Memcached exporter](https://github.com/Snapbug/memcache_exporter)
* [Meteor JS web framework exporter](https://atmospherejs.com/sevki/prometheus-exporter)
* [Minecraft exporter module](https://github.com/Baughn/PrometheusIntegration)
* [MongoDB exporter](https://github.com/dcu/mongodb_exporter)
* [Munin exporter](https://github.com/pvdh/munin_exporter)
* [New Relic exporter](https://github.com/jfindley/newrelic_exporter)
* [RabbitMQ exporter](https://github.com/kbudde/rabbitmq_exporter)
* [Redis exporter](https://github.com/oliver006/redis_exporter)
* [RethinkDB exporter](https://github.com/oliver006/rethinkdb_exporter)
* [Rsyslog exporter](https://github.com/digitalocean/rsyslog_exporter)
* [scollector exporter](https://github.com/tgulacsi/prometheus_scollector)
## Directly instrumented software
......@@ -55,9 +59,9 @@ Some third-party software already exposes Prometheus metrics natively, so no
separate exporters are needed:
* [cAdvisor](https://github.com/google/cadvisor)
* [Kubernetes](https://github.com/GoogleCloudPlatform/kubernetes)
* [Kubernetes-Mesos](https://github.com/mesosphere/kubernetes-mesos)
* [Etcd](https://github.com/coreos/etcd)
* [gokit](https://github.com/peterbourgon/gokit)
* [go-metrics instrumentation library](https://github.com/armon/go-metrics)
* [gokit](https://github.com/peterbourgon/gokit)
* [Kubernetes-Mesos](https://github.com/mesosphere/kubernetes-mesos)
* [Kubernetes](https://github.com/GoogleCloudPlatform/kubernetes)
* [RobustIRC](http://robustirc.net/)
......@@ -32,12 +32,12 @@ If using Maven, add the following to `pom.xml`:
<dependency>
<groupId>io.prometheus</groupId>
<artifactId>simpleclient</artifactId>
<version>0.0.6</version>
<version>0.0.10</version>
</dependency>
<dependency>
<groupId>io.prometheus</groupId>
<artifactId>simpleclient_pushgateway</artifactId>
<version>0.0.6</version>
<version>0.0.10</version>
</dependency>
```
......@@ -69,7 +69,7 @@ void executeBatchJob() throws Exception {
} finally {
durationTimer.setDuration();
PushGateway pg = new PushGateway("127.0.0.1:9091");
pg.pushAdd(registry, "my_batch_job", "my_batch_job");
pg.pushAdd(registry, "my_batch_job");
}
}
```
......
......@@ -117,11 +117,9 @@ for the current state of this effort.
### Which languages have instrumentation libraries?
Currently, there are client libraries for:
* [Go](https://github.com/prometheus/client_golang)
* [Java or Scala](https://github.com/prometheus/client_java)
* [Ruby](https://github.com/prometheus/client_ruby)
There are a number of client libraries for instrumenting your services with
Prometheus metrics. See the [client libraries](/docs/instrumenting/clientlibs/)
documentation for details.
If you are interested in contributing a client library for a new language, see
the [exposition formats](/docs/instrumenting/exposition_formats/).
......@@ -211,12 +209,13 @@ second. The latter depends on the compressibility of the sample data
and on the number of time series the samples belong to, but to give
you an idea, here are some results from benchmarks:
* On an older 8-core machine with Intel Core i7 CPUs and two spinning
disks (Samsung HD753LJ) in a RAID-1 setup, Prometheus sustained an
ingestion rate of 34k samples per second, belonging to 170k time
series, scraped from 600 targets.
* On a modern server with SSD, Prometheus sustained an ingestion rate
of 340k samples per second, belonging to 2M time
* On an older 8-core machine with Intel Core i7 CPUs, 8GiB RAM, and
two spinning disks (Samsung HD753LJ) in a RAID-1 setup, Prometheus
sustained an ingestion rate of 34k samples per second, belonging to
170k time series, scraped from 600 targets.
* On a modern server with 64GiB RAM and SSD, Prometheus sustained an
ingestion rate of 340k samples per second, belonging to 2M time
series, scraped from 1800 targets.
In both cases, there were no obvious bottlenecks. Various stages of the
......
......@@ -21,28 +21,23 @@ git clone https://github.com/prometheus/prometheus.git
## Building Prometheus
Building Prometheus currently still requires a `make` step, as some parts of
the source are autogenerated (web assets).
[Download the latest release](https://github.com/prometheus/prometheus/releases)
of Prometheus for your platform, then extract and run it:
```language-bash
cd prometheus
make build
```
tar xvfz prometheus-*.tar.gz
./prometheus
```
Note that building requires a fair amount of memory. You should have
at least 2GiB of RAM available.
If you encounter problems building Prometheus, see [the more detailed build
instructions](https://github.com/prometheus/prometheus#use-make) in the
README.md.
It should fail to start, complaining about the absence of a configuration file.
## Configuring Prometheus to monitor itself
Prometheus collects metrics from monitored targets by scraping metrics HTTP
endpoints on these targets. Since Prometheus also exposes data in the same
manner about itself, it may also be used to scrape and monitor its own health.
manner about itself, it can also scrape and monitor its own health.
While a Prometheus server which collects only data about itself is not very
While a Prometheus server that collects only data about itself is not very
useful in practice, it is a good starting example. Save the following basic
Prometheus configuration as a file named `prometheus.yml`:
......@@ -56,7 +51,7 @@ global:
labels:
monitor: 'codelab-monitor'
# A scrape configuration containing exactly one endpoint to scrape:
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
......@@ -86,11 +81,11 @@ Prometheus build directory and run:
```
Prometheus should start up and it should show a status page about itself at
http://localhost:9090. Give it a couple of seconds to start collecting data
about itself from its own HTTP metrics endpoint.
http://localhost:9090. Give it a couple of seconds to collect data about itself
from its own HTTP metrics endpoint.
You can also verify that Prometheus is serving metrics about itself by
navigating to its metrics exposure endpoint: http://localhost:9090/metrics
navigating to its metrics endpoint: http://localhost:9090/metrics
By default, Prometheus will only execute at most one OS thread at a
time. In production scenarios on multi-CPU machines, you will most
......@@ -140,7 +135,7 @@ To count the number of returned time series, you could write:
count(prometheus_target_interval_length_seconds)
```
For further details about the expression language, see the
For more about the expression language, see the
[expression language documentation](/docs/querying/basics/).
## Using the graphing interface
......@@ -193,8 +188,8 @@ endpoints to a single job, adding extra labels to each group of targets. In
this example, we will add the `group="production"` label to the first group of
targets, while adding `group="canary"` to the second.
To achieve this, add the following job definition to your `prometheus.yml` and
restart your Prometheus instance:
To achieve this, add the following job definition to the `scrape_configs`
section in your `prometheus.yml` and restart your Prometheus instance:
```
scrape_configs:
......@@ -262,6 +257,15 @@ rule_files:
- 'prometheus.rules'
scrape_configs:
- job_name: 'prometheus'
# Override the global default and scrape targets from this job every 5 seconds.
scrape_interval: 5s
scrape_timeout: 10s
target_groups:
- targets: ['localhost:9090']
- job_name: 'example-random'
scrape_interval: 5s
......
---
title: Media
sort_rank: 7
---
# Media
Online resources to help you get started with Prometheus.
## Blogs
* This site has its own [blog](http://prometheus.io/blog/).
* [SoundCloud's blog post announcing Prometheus](https://developers.soundcloud.com/blog/prometheus-monitoring-at-soundcloud) – a more elaborate overview than the one given on this site.
* The [monitoring series](http://www.boxever.com/tag/monitoring) on Boxever's tech blog.
## Recorded talks
* [Prometheus: A Next-Generation Monitoring System](https://www.usenix.org/conference/srecon15europe/program/presentation/rabenstein) – Julius Volz and Björn Rabenstein at SREcon15 Europe, Dublin.
* [What is your application doing right now?](http://youtu.be/Z0LlilNpX1U) – Matthias Gruter, Transmode, at DevOps Stockholm Meetup.
* In German: [Monitoring mit Prometheus](https://entropia.de/GPN15:Monitoring_mit_Prometheus) – Michael Stapelberg at Gulaschprogrammiernacht 15.
* [Prometheus workshop](https://vimeo.com/131581353) – Jamie Wilkinson at Monitorama PDX 2015 ([Slides](https://docs.google.com/presentation/d/1X1rKozAUuF2MVc1YXElFWq9wkcWv3Axdldl8LOH9Vik/edit)).
## Presentation slides
* [Systems Monitoring with Prometheus](http://www.slideshare.net/brianbrazil/devops-ireland-systems-monitoring-with-prometheus) – Brian Brazil at Devops Ireland Meetup, Dublin.
* [Monitoring your Python with Prometheus](http://www.slideshare.net/brianbrazil/python-ireland-monitoring-your-python-with-prometheus) – Brian Brazil at Python Ireland Meetup, Dublin.
......@@ -12,8 +12,8 @@ monitoring and alerting toolkit built at [SoundCloud](http://soundcloud.com).
Since its inception in 2012, it has become the standard for instrumenting new
services at SoundCloud and is seeing growing external usage and contributions.
For a more elaborate overview, see also [SoundCloud's blog post which announces
Prometheus](https://developers.soundcloud.com/blog/prometheus-monitoring-at-soundcloud).
For a more elaborate overview, see the resources linked from the
[media](/docs/introduction/media/) section.
### Features
......
......@@ -110,6 +110,10 @@ dns_sd_configs:
consul_sd_configs:
[ - <consul_sd_config> ... ]
# List of Zookeeper Serverset service discovery configurations.
serverset_sd_configs:
[ - <serverset_sd_config> ... ]
# List of file service discovery configurations.
file_sd_configs:
[ - <file_sd_config> ... ]
......@@ -118,9 +122,13 @@ file_sd_configs:
target_groups:
[ - <target_group> ... ]
# List of relabel configurations.
# List of target relabel configurations.
relabel_configs:
[ - <relabel_config> ... ]
# List of metric relabel configurations.
metric_relabel_configs:
[ - <relabel_config> ... ]
```
Where `<scheme>` may be `http` or `https` and `<path>` is a valid URL path.
......@@ -153,9 +161,9 @@ A DNS-SD configuration allows specifying a set of DNS SRV record names which
are periodically queried to discover a list of targets (host-port pairs). The
DNS servers to be contacted are read from `/etc/resolv.conf`.
During the [relabeling phase](#relabeling-relabel_config), the meta label `__meta_dns_srv_name` is
available on each target and is set to the SRV record name that produced the
discovered target.
During the [relabeling phase](#target-relabeling-relabel_config), the meta
label `__meta_dns_srv_name` is available on each target and is set to the SRV
record name that produced the discovered target.
```
# A list of DNS SRV record names to be queried.
......@@ -198,6 +206,34 @@ services:
[ tag_separator: <string> | default = , ]
```
### Zookeeper Serverset SD configurations `<serverset_sd_config>`
Serverset SD configurations allow retrieving scrape targets from
[Serversets](https://github.com/twitter/finagle/tree/master/finagle-serversets),
which are stored in [Zookeeper](https://zookeeper.apache.org/). Serversets are
commonly used by [Finagle](https://twitter.github.io/finagle/) and
[Aurora](http://aurora.apache.org/).
The following meta labels are available on targets during relabeling:
* `__meta_serverset_path`: the full path to the serverset member node in Zookeeper
* `__meta_serverset_endpoint_host`: the host of the default endpoint
* `__meta_serverset_endpoint_port`: the port of the default endpoint
* `__meta_serverset_endpoint_host_<endpoint>`: the host of the given endpoint
* `__meta_serverset_endpoint_port_<endpoint>`: the port of the given endpoint
* `__meta_serverset_status`: the status of the member
```
# The Zookeeper servers.
servers:
- <host>
# Paths can point to a single serverset, or the root of a tree of serversets.
paths:
- <string>
[ timeout: <duration> | default = 10s ]
```
Serverset data must be in JSON format; the Thrift format is currently not supported.
### File-based SD configurations `<file_sd_config>`
......@@ -223,8 +259,9 @@ The JSON version of a target group has the following format:
As a fallback, the file contents are also re-read periodically at the specified
refresh interval.
Each target has a meta label `__meta_filepath` during the [relabeling phase](#relabeling-relabel_config).
Its value is set to the filepath from which the target was extracted.
Each target has a meta label `__meta_filepath` during the
[relabeling phase](#target-relabeling-relabel_config). Its value is set to the
filepath from which the target was extracted.
```
# Patterns for files from which target groups are extracted.
......@@ -239,7 +276,7 @@ Where `<filename_pattern>` may be a path ending in `.json`, `.yml` or `.yaml`. T
may contain a single `*` that matches any character sequence, e.g. `my/path/tg_*.json`.
### Relabeling `<relabel_config>`
### Target relabeling `<relabel_config>`
Relabeling is a powerful tool to dynamically rewrite the label set of a target before
it gets scraped. Multiple relabeling steps can be configured per scrape configuration.
......@@ -271,7 +308,10 @@ source_labels: '[' <labelname> [, ...] ']'
[ target_label: <labelname> ]
# Regular expression against which the extracted value is matched.
regex: <regex>
[ regex: <regex> ]
# Modulus to take of the hash of the source label values.
[ modulus: <uint64> ]
# Replacement value against which a regex replace is performed if the
# regular expression matches.
......@@ -281,7 +321,9 @@ regex: <regex>
[ action: <relabel_action> | default = replace ]
```
`<regex>` is any valid [RE2 regular expression](https://github.com/google/re2/wiki/Syntax).
`<regex>` is any valid [RE2 regular
expression](https://github.com/google/re2/wiki/Syntax). It is required for
the `replace`, `keep`, and `drop` actions.
`<relabel_action>` determines the relabeling action to take:
......@@ -290,3 +332,12 @@ regex: <regex>
(`${1}`, `${2}`, ...) in `replacement` substituted by their value.
* `keep`: Drop targets for which `regex` does not match the concatenated `source_labels`.
* `drop`: Drop targets for which `regex` matches the concatenated `source_labels`.
* `hashmod`: Set `target_label` to the `modulus` of a hash of the concatenated `source_labels`.
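The four actions can be illustrated with a small Python simulation. This is a sketch of the semantics described in the list above, not Prometheus's implementation: `apply_relabel` is a hypothetical helper, the `;` separator default is an assumption, MD5 is only a stand-in for whatever stable hash Prometheus uses, and matching uses Python's `re.search` (unanchored, and not RE2).

```python
import hashlib
import re


def apply_relabel(labels, rule):
    """Apply one relabel rule to a label dict.

    Returns the new label dict, or None if the target is dropped.
    """
    separator = rule.get("separator", ";")  # assumed default
    value = separator.join(labels.get(name, "") for name in rule["source_labels"])
    action = rule.get("action", "replace")

    if action == "hashmod":
        # Stand-in hash: the last 8 bytes of the MD5 digest as an unsigned integer.
        digest = hashlib.md5(value.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[-8:], "big") % rule["modulus"]
        return {**labels, rule["target_label"]: str(bucket)}

    match = re.search(rule["regex"], value)
    if action == "keep":
        return dict(labels) if match else None
    if action == "drop":
        return None if match else dict(labels)
    if action == "replace":
        if not match:
            return dict(labels)
        # Substitute ${1}, ${2}, ... with the corresponding match groups.
        new_value = re.sub(
            r"\$\{(\d+)\}",
            lambda ref: match.group(int(ref.group(1))) or "",
            rule["replacement"],
        )
        return {**labels, rule["target_label"]: new_value}
    raise ValueError(f"unknown relabel action: {action}")
```

With `hashmod`, a follow-up `keep` rule on the computed label could then restrict each of several Prometheus servers to one bucket of targets.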
### Metric relabeling `<metric_relabel_configs>`
Metric relabeling is applied to samples as the last step before ingestion. It
has the same configuration format and actions as target relabeling. Metric
relabeling does not apply to automatically generated timeseries such as `up`.
One use for this is to blacklist time series that are too expensive to ingest.
......@@ -69,6 +69,13 @@ of thumb, keep it somewhere between 50% and 100% of the
is larger checkpoints. The consequences of a value too low are much
more serious.
Out of the metrics that Prometheus exposes about itself, the following are
particularly useful for tuning the flags above:
* `prometheus_local_storage_memory_series`: The current number of series held in memory.
* `prometheus_local_storage_memory_chunks`: The current number of chunks held in memory.
* `prometheus_local_storage_chunks_to_persist`: The number of memory chunks that still need to be persisted to disk.
## Crash recovery
Prometheus saves chunks to disk as soon as possible after they are
......
---
title: HTTP API
sort_rank: 7
---
# HTTP API
The current stable HTTP API is reachable under `/api/v1` on a Prometheus
server. Any non-breaking additions will be added under that endpoint.
## Format overview
The API response format is JSON. Every successful API request returns a `2xx`
status code.
Invalid requests that reach the API handlers return a JSON error object
and the HTTP response code `422 Unprocessable Entity`
([RFC4918](http://tools.ietf.org/html/rfc4918#page-78)). Other non-`2xx` codes
may be returned for errors occurring before the API endpoint is reached.
The JSON response envelope format is as follows:
```
{
"status": "success" | "error",
"data": <data>,
// Only set if status is "error". The data field may still hold
// additional data.
"errorType": "<string>",
"error": "<string>"
}
```
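A client can handle this envelope in one place. The following Python sketch assumes the response body has already been read into a string; `unwrap` and `PrometheusAPIError` are hypothetical names used for illustration and are not part of any Prometheus client library.

```python
import json


class PrometheusAPIError(Exception):
    """Raised when the envelope reports status "error" (hypothetical class)."""

    def __init__(self, error_type, message):
        super().__init__(f"{error_type}: {message}")
        self.error_type = error_type
        self.message = message


def unwrap(body):
    """Return the `data` field of a response envelope, or raise on error."""
    envelope = json.loads(body)
    if envelope.get("status") == "success":
        return envelope.get("data")
    raise PrometheusAPIError(envelope.get("errorType", ""), envelope.get("error", ""))
```

Because the `data` field may still hold additional data on errors, a fuller client might attach it to the exception as well.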
Input timestamps may be provided either in
[RFC3339](https://www.ietf.org/rfc/rfc3339.txt) format or as a Unix timestamp
in seconds, with optional decimal places for sub-second precision. Output
timestamps are always represented as Unix timestamps in seconds.
Names of query parameters that may be repeated end with `[]`.
`<series_selector>` placeholders refer to Prometheus [time series
selectors](/docs/querying/basics/#time-series-selectors) like
`http_requests_total` or `http_requests_total{method=~"^GET|POST$"}` and need
to be URL-encoded.
`<duration>` placeholders refer to Prometheus duration strings of the form
`[0-9]+[smhdwy]`. For example, `5m` refers to a duration of 5 minutes.
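As an illustration of the duration format, the following Python sketch converts a duration string to seconds. `parse_duration` is a hypothetical helper, and the lengths chosen for `w` (7 days) and `y` (365 days) are assumptions that this page does not spell out.

```python
import re

# Seconds per unit. The week and year lengths are assumptions.
_UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 60 * 60,
    "d": 24 * 60 * 60,
    "w": 7 * 24 * 60 * 60,
    "y": 365 * 24 * 60 * 60,
}

_DURATION_RE = re.compile(r"([0-9]+)([smhdwy])")


def parse_duration(text):
    """Convert a duration string such as "5m" to a number of seconds."""
    match = _DURATION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid duration string: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]
```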
## Expression queries
Query language expressions may be evaluated at a single instant or over a range
of time. The sections below describe the API endpoints for each type of
expression query.
### Instant queries
The following endpoint evaluates an instant query at a single point in time:
```
GET /api/v1/query
```
URL query parameters:
- `query=<string>`: Prometheus expression query string.
- `time=<rfc3339 | unix_timestamp>`: Evaluation timestamp.
The `data` section of the query result has the following format:
```
{
"resultType": "matrix" | "vector" | "scalar" | "string",
"result": <value>
}
```
`<value>` refers to the query result data, which has varying formats
depending on the `resultType`. See the [expression query result
formats](#expression-query-result-formats).
The following example evaluates the expression `up` at the time
`2015-07-01T20:10:51.781Z`:
```
$ curl 'http://localhost:9090/api/v1/query?query=up&time=2015-07-01T20:10:51.781Z'
{
"status" : "success",
"data" : {
"resultType" : "vector",
"result" : [
{
"metric" : {
"__name__" : "up",
"job" : "prometheus",
"instance" : "localhost:9090"
},
"value": [ 1435781451.781, "1" ]
},
{
"metric" : {
"__name__" : "up",
"job" : "node",
"instance" : "localhost:9100"
},
"value" : [ 1435781451.781, "0" ]
}
]
}
}
```
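A response like the one above could be consumed as follows. This Python sketch works on an abbreviated copy of the example body instead of a live server; `down_instances` is a hypothetical helper name.

```python
import json

# Abbreviated copy of the example response above.
BODY = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"__name__": "up", "job": "prometheus", "instance": "localhost:9090"},
       "value": [1435781451.781, "1"]},
      {"metric": {"__name__": "up", "job": "node", "instance": "localhost:9100"},
       "value": [1435781451.781, "0"]}
    ]
  }
}
"""


def down_instances(body):
    """Return the `instance` label of every sample whose value is 0."""
    data = json.loads(body)["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector")
    down = []
    for sample in data["result"]:
        _timestamp, raw_value = sample["value"]
        # Sample values arrive as JSON strings, so convert before comparing.
        if float(raw_value) == 0:
            down.append(sample["metric"]["instance"])
    return down


print(down_instances(BODY))  # prints ['localhost:9100']
```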
### Range queries
The following endpoint evaluates an expression query over a range of time:
```
GET /api/v1/query_range
```
URL query parameters:
- `query=<string>`: Prometheus expression query string.
- `start=<rfc3339 | unix_timestamp>`: Start timestamp.
- `end=<rfc3339 | unix_timestamp>`: End timestamp.
- `step=<duration>`: Query resolution step width.
The `data` section of the query result has the following format:
```
{
"resultType": "matrix",
"result": <value>
}
```
For the format of the `<value>` placeholder, see the [range-vector result
format](#range-vectors).
The following example evaluates the expression `up` over a 30-second range with
a query resolution of 15 seconds.
```
$ curl 'http://localhost:9090/api/v1/query_range?query=up&start=2015-07-01T20:10:30.781Z&end=2015-07-01T20:11:00.781Z&step=15s'
{
"status" : "success",
"data" : {
"resultType" : "matrix",
"result" : [
{
"metric" : {
"__name__" : "up",
"job" : "prometheus",
"instance" : "localhost:9090"
},
"values" : [
[ 1435781430.781, "1" ],
[ 1435781445.781, "1" ],
[ 1435781460.781, "1" ]
]
},
{
"metric" : {
"__name__" : "up",
"job" : "node",
"instance" : "localhost:9091"
},
"values" : [
[ 1435781430.781, "0" ],
[ 1435781445.781, "0" ],
[ 1435781460.781, "1" ]
]
}
]
}
}
```
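When building such a request in code, let a library handle the URL encoding of the expression and timestamps. The sketch below uses only the Python standard library; `range_query_url` is a hypothetical helper and the base URL is whatever address your server listens on.

```python
from urllib.parse import urlencode


def range_query_url(base_url, query, start, end, step):
    """Build a URL for GET /api/v1/query_range with URL-encoded parameters."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base_url}/api/v1/query_range?{params}"


url = range_query_url(
    "http://localhost:9090",
    "up",
    "2015-07-01T20:10:30.781Z",
    "2015-07-01T20:11:00.781Z",
    "15s",
)
print(url)
```

The colons in the RFC3339 timestamps are percent-encoded in the resulting URL, which is equivalent to the unencoded form used in the `curl` example.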
## Querying metadata
### Finding series by label matchers
The following endpoint returns the list of time series that match a certain label set.
```
GET /api/v1/series
```
URL query parameters:
- `match[]=<series_selector>`: Repeated series selector argument that selects the
series to return. At least one `match[]` argument must be provided.
The `data` section of the query result consists of a list of objects that
contain the label name/value pairs which identify each series.
The following example returns all series that match either of the selectors
`up` or `process_start_time_seconds{job="prometheus"}`:
```
$ curl -g 'http://localhost:9090/api/v1/series?match[]=up&match[]=process_start_time_seconds{job="prometheus"}'
{
"status" : "success",
"data" : [
{
"__name__" : "up",
"job" : "prometheus",
"instance" : "localhost:9090"
},
{
"__name__" : "up",
"job" : "node",
"instance" : "localhost:9091"
},
{
"__name__" : "process_start_time_seconds",
"job" : "prometheus",
"instance" : "localhost:9090"
}
]
}
```
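Repeated `match[]` parameters and the required URL encoding of selectors can be produced with the Python standard library, as in this sketch. `series_url` is a hypothetical helper; note that `urlencode` also percent-encodes the brackets in the parameter name, which decodes back to `match[]`.

```python
from urllib.parse import urlencode


def series_url(base_url, selectors):
    """Build a URL for /api/v1/series with one match[] parameter per selector."""
    if not selectors:
        raise ValueError("at least one series selector is required")
    # A list of pairs keeps the parameter order and allows repeated names.
    params = urlencode([("match[]", selector) for selector in selectors])
    return f"{base_url}/api/v1/series?{params}"


print(series_url(
    "http://localhost:9090",
    ["up", 'process_start_time_seconds{job="prometheus"}'],
))
```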
### Querying label values
The following endpoint returns a list of label values for a provided label name:
```
GET /api/v1/label/<label_name>/values
```
The `data` section of the JSON response is a list of string label values.
This example queries for all label values for the `job` label:
```
$ curl http://localhost:9090/api/v1/label/job/values
{
"status" : "success",
"data" : [
"node",
"prometheus"
]
}
```
## Deleting series
The following endpoint deletes matched series entirely from a Prometheus server:
```
DELETE /api/v1/series
```
URL query parameters:
- `match[]=<series_selector>`: Repeated series selector argument that selects the
series to delete. At least one `match[]` argument must be provided.
The `data` section of the JSON response has the following format:
```
{
"numDeleted": <number of deleted series>
}
```
The following example deletes all series that match either of the selectors
`up` or `process_start_time_seconds{job="prometheus"}`:
```
$ curl -XDELETE -g 'http://localhost:9090/api/v1/series?match[]=up&match[]=process_start_time_seconds{job="prometheus"}'
{
"status" : "success",
"data" : {
"numDeleted" : 3
}
}
```
## Expression query result formats
Expression queries may return the following response values in the `result`
property of the `data` section. `<sample_value>` placeholders are numeric
sample values. JSON does not support special float values such as `NaN`, `Inf`,
and `-Inf`, so sample values are transferred as quoted JSON strings rather than
raw numbers.
### Range vectors
Range vectors are returned as result type `matrix`. The corresponding
`result` property has the following format:
```
[
{
"metric": { "<label_name>": "<label_value>", ... },
"values": [ [ <unix_time>, "<sample_value>" ], ... ]
},
...
]
```
### Instant vectors
Instant vectors are returned as result type `vector`. The corresponding
`result` property has the following format:
```
[
{
"metric": { "<label_name>": "<label_value>", ... },
"value": [ <unix_time>, "<sample_value>" ]
},
...
]
```
### Scalars
Scalar results are returned as result type `scalar`. The corresponding
`result` property has the following format:
```
[ <unix_time>, "<scalar_value>" ]
```
### Strings
String results are returned as result type `string`. The corresponding
`result` property has the following format:
```
[ <unix_time>, "<string_value>" ]
```
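The four result formats can be decoded with a single dispatch on `resultType`. The following Python sketch assumes `data` is the already-parsed `data` section of a response; `decode_result` is a hypothetical helper. Python's `float()` accepts the quoted special values `NaN`, `Inf`, and `-Inf`, which is why sample values can be converted directly.

```python
def _sample(pair):
    """Convert a [<unix_time>, "<sample_value>"] pair to two floats."""
    timestamp, raw_value = pair
    return float(timestamp), float(raw_value)


def decode_result(data):
    """Decode the `data` section of an expression query response."""
    result_type = data["resultType"]
    result = data["result"]
    if result_type == "matrix":
        return [(s["metric"], [_sample(p) for p in s["values"]]) for s in result]
    if result_type == "vector":
        return [(s["metric"], _sample(s["value"])) for s in result]
    if result_type == "scalar":
        return _sample(result)
    if result_type == "string":
        timestamp, text = result
        return float(timestamp), text
    raise ValueError(f"unknown result type: {result_type!r}")
```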
......@@ -9,7 +9,7 @@ sort_rank: 1
Prometheus provides a functional expression language that lets the user select
and aggregate time series data in real time. The result of an expression can
either be shown as a graph, viewed as tabular data in Prometheus's expression
browser, or consumed by external systems via the HTTP API.
browser, or consumed by external systems via the [HTTP API](/docs/querying/api/).
## Examples
......@@ -84,6 +84,28 @@ For example, this selects all `http_requests_total` time series for `staging`,
http_requests_total{environment=~"staging|testing|development",method!="GET"}
Label matchers that match empty label values also select all time series that do
not have the specific label set at all.
Vector selectors must either specify a name or at least one label matcher
that does not match the empty string. The following expression is illegal:
{job=~".*"} # Bad!
In contrast, these expressions are valid as they both have a selector that does not
match empty label values.
{job=~".+"} # Good!
{job=~".*",method="get"} # Good!
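The rule can be expressed as a small check. The Python sketch below is an illustration under assumptions, not the Prometheus parser: matchers are given as hypothetical `(label, operator, value)` tuples, and Python's `re` module stands in for RE2.

```python
import re


def matches_empty(operator, value):
    """Report whether a single label matcher matches the empty string."""
    if operator == "=":
        return value == ""
    if operator == "!=":
        return value != ""
    if operator == "=~":
        return re.fullmatch(value, "") is not None
    if operator == "!~":
        return re.fullmatch(value, "") is None
    raise ValueError(f"unknown operator: {operator!r}")


def selector_is_valid(name, matchers):
    """A selector needs a name or a matcher that does not match ""."""
    if name:
        return True
    return any(not matches_empty(op, value) for _label, op, value in matchers)


print(selector_is_valid("", [("job", "=~", ".*")]))
print(selector_is_valid("", [("job", "=~", ".*"), ("method", "=", "get")]))
```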
Label matchers can also be applied to metric names by matching against the internal
`__name__` label. For example, the expression `http_requests_total` is equivalent to
`{__name__="http_requests_total"}`. Matchers other than `=` (`!=`, `=~`, `!~`) may also be used.
The following expression selects all metrics that have a name starting with `job:`:
{__name__=~"^job:.*"}
### Range Vector Selectors
Range vector literals work like instant vector literals, except that they
......
---
title: Query language
title: Querying
sort_rank: 3
nav_icon: search
---
......@@ -193,3 +193,17 @@ If we are just interested in the total of HTTP requests we have seen in **all**
applications, we could simply write:
sum(http_requests_total)
## Binary operator precedence
The following list shows the precedence of binary operators in Prometheus, from
lowest to highest.
1. `OR`
2. `AND`
3. `==`, `!=`, `<=`, `<`, `>=`, `>`
4. `+`, `-`
5. `*`, `/`, `%`
Operators on the same precedence level are left-associative. For example,
`2 * 3 % 2` is equivalent to `(2 * 3) % 2`.
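To see how these rules group a longer expression, the following Python sketch fully parenthesizes an expression using precedence climbing. It is an illustration only: `parenthesize` is a hypothetical helper that understands bare numbers and metric names as operands, and it does not handle selectors, functions, or existing parentheses.

```python
import re

# Precedence levels from the list above; a higher number binds more tightly.
PRECEDENCE = {
    "OR": 1,
    "AND": 2,
    "==": 3, "!=": 3, "<=": 3, "<": 3, ">=": 3, ">": 3,
    "+": 4, "-": 4,
    "*": 5, "/": 5, "%": 5,
}

_TOKEN = re.compile(
    r"\s*(OR\b|AND\b|==|!=|<=|>=|[<>+\-*/%]"
    r"|[A-Za-z_:][A-Za-z0-9_:]*|[0-9]+(?:\.[0-9]+)?)"
)


def tokenize(expr):
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parenthesize(expr):
    """Return expr with every binary operation wrapped in parentheses."""
    tokens = tokenize(expr)
    pos = 0

    def parse(min_precedence):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in PRECEDENCE:
            raise ValueError("expected an operand")
        left = tokens[pos]
        pos += 1
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_precedence:
            operator = tokens[pos]
            pos += 1
            # Requiring a strictly higher level on the right side makes
            # operators of equal precedence left-associative.
            right = parse(PRECEDENCE[operator] + 1)
            left = f"({left} {operator} {right})"
        return left

    result = parse(1)
    if pos != len(tokens):
        raise ValueError("expected an operator")
    return result


print(parenthesize("2 * 3 % 2"))  # prints ((2 * 3) % 2)
```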
......@@ -17,14 +17,12 @@ process. The changes are only applied if all rule files are well-formatted.
## Syntax-checking rules
To quickly check whether a rule file is syntactically correct without starting
a Prometheus server, install and run Prometheus's `rule_checker` tool:
a Prometheus server, install and run Prometheus's `promtool` command-line
utility:
```bash
# If $GOPATH/github.com/prometheus/prometheus already exists, update it first:
go get -u github.com/prometheus/prometheus
go install github.com/prometheus/prometheus/tools/rule_checker
rule_checker /path/to/example.rules
go get github.com/prometheus/prometheus/cmd/promtool
promtool check-rules /path/to/example.rules
```
When the file is syntactically valid, the checker prints a textual
......@@ -54,7 +52,7 @@ Some examples:
job:http_inprogress_requests:sum = sum(http_inprogress_requests) by (job)
# Drop or rewrite labels in the result time series:
new_time series{label_to_change="new_value",label_to_drop=""} = old_time series
new_time_series{label_to_change="new_value",label_to_drop=""} = old_time_series
Recording rules are evaluated at the interval specified by the
`evaluation_interval` field in the Prometheus configuration. During each
......
......@@ -129,6 +129,9 @@ interpolated version of the given format string. To reference specific label
values in the format string, use double curly braces: `{{label-name}}`. For
example: `{{host}} - cluster {{cluster}}`.
Format strings support filters. See the Filters section below for a list of
currently available filters, expected inputs, and outputs.
### Link to graph
The "Link to this graph" menu tab allows you to generate a link to a specific
graph. This link will show the graph in a single-widget fullscreen view as it
......@@ -189,6 +192,24 @@ In the example of the host dashboard, the URL could look like this:
http://promdash.somedomain.int/hoststats#!?var.host=myhost
Template variables support filters. See the Filters section below for a list of
currently available filters, expected inputs, and outputs.
## Filters
Filters can be used in all places where variable interpolation is supported,
e.g. in legend format strings or template variables. The format is `{{variable
| filter}}` and the following filters are currently available:
- `toPercent`: Input: `0.5`; Output: `50%`
- `toPercentile`: Input: `0.5`; Output: `50th`
- `hostnameFqdn`: Input: `http://your-prometheus-endpoint.net:1111/`; Output: `your-prometheus-endpoint.net:1111`
- `hostname`: Input: `http://your-prometheus-endpoint.net:1111/`; Output: `your-prometheus-endpoint`
- `regex`: If `job` == `prometheus`, `{{job | regex:"pro":"faux"}}` => `fauxmetheus`
Filters are chainable, so `{{label | filter1 | filter2}}` will apply `filter1`
to `label`, and then apply `filter2` to that result.
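Chaining can be modeled as a left-to-right fold over a list of functions. The Python sketch below re-creates some of the filters from their documented inputs and outputs; it is not PromDash's implementation, and all function names are hypothetical.

```python
import re
from functools import reduce
from urllib.parse import urlparse


def to_percent(value):
    """0.5 -> "50%"."""
    return f"{float(value) * 100:g}%"


def hostname_fqdn(url):
    """Host and port of a URL."""
    return urlparse(url).netloc


def hostname(url):
    """First DNS label of a URL's host."""
    return urlparse(url).hostname.split(".")[0]


def regex(pattern, replacement):
    """Build a filter that replaces matches of pattern with replacement."""
    return lambda value: re.sub(pattern, replacement, value)


def apply_filters(value, filters):
    """Apply filters left to right, feeding each result into the next filter."""
    return reduce(lambda current, f: f(current), filters, value)


endpoint = "http://your-prometheus-endpoint.net:1111/"
print(apply_filters(endpoint, [hostname, regex("your", "my")]))
# prints my-prometheus-endpoint
```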
## Annotations
PromDash allows you to load timestamped annotations from an external service
......
......@@ -54,6 +54,7 @@ If functions are used in a pipeline, the pipeline value is passed as the last ar
| humanize | number | string | Converts a number to a more readable format, using [metric prefixes](http://en.wikipedia.org/wiki/Metric_prefix). |
| humanize1024 | number | string | Like `humanize`, but uses 1024 as the base rather than 1000. |
| humanizeDuration | number | string | Converts a duration in seconds to a more readable format. |
| humanizeTimestamp | number | string | Converts a Unix timestamp in seconds to a more readable format. |
Humanizing functions are intended to produce reasonable output for consumption
by humans, and are not guaranteed to return the same results between Prometheus
......
......@@ -38,7 +38,7 @@ layout: jumbotron
<div class="col-md-3">
<h2><i class="fa fa-warning"></i> Alerting</h2>
<p class="desc">Alerts are defined based on Prometheus's flexible query language and maintain dimensional information. An alertmanager handles notifications and silencing.</p>
<p><a class="btn btn-default" href="/docs/querying/rules/#alerting-rules" role="button">View details &raquo;</a></p>
<p><a class="btn btn-default" href="/docs/alerting/rules/" role="button">View details &raquo;</a></p>
</div>
<div class="col-md-3">
<h2><i class="fa fa-cloud-upload"></i> Exporters</h2>
......