Commit cc9e98ac authored by Tobias Schmidt's avatar Tobias Schmidt

Move querying documentation to prometheus/prometheus

parent 8d8e494e
...@@ -16,7 +16,7 @@ real-world systems. ...@@ -16,7 +16,7 @@ real-world systems.
All hope is not lost though. There are many common anomalies which you can All hope is not lost though. There are many common anomalies which you can
detect and handle with custom-built rules. The Prometheus [query detect and handle with custom-built rules. The Prometheus [query
language](../../../../../docs/querying/basics/) gives you the tools to discover language](/docs/prometheus/latest/querying/basics/) gives you the tools to discover
these anomalies while avoiding false positives. these anomalies while avoiding false positives.
<!-- more --> <!-- more -->
...@@ -28,7 +28,7 @@ performing as well as the rest, such as responding with increased latency. ...@@ -28,7 +28,7 @@ performing as well as the rest, such as responding with increased latency.
Let us say that we have a metric `instance:latency_seconds:mean5m` representing the Let us say that we have a metric `instance:latency_seconds:mean5m` representing the
average query latency for each instance of a service, calculated via a average query latency for each instance of a service, calculated via a
[recording rule](/docs/querying/rules/) from a [recording rule](/docs/prometheus/latest/querying/rules/) from a
[Summary](/docs/concepts/metric_types/#summary) metric. [Summary](/docs/concepts/metric_types/#summary) metric.
A simple way to start would be to look for instances with a latency A simple way to start would be to look for instances with a latency
...@@ -116,7 +116,6 @@ route: ...@@ -116,7 +116,6 @@ route:
receiver: restart_webhook receiver: restart_webhook
``` ```
## Summary ## Summary
The Prometheus query language allows for rich processing of your monitoring The Prometheus query language allows for rich processing of your monitoring
......
...@@ -142,7 +142,7 @@ to finish within a reasonable amount of time. This happened to us when we wanted ...@@ -142,7 +142,7 @@ to finish within a reasonable amount of time. This happened to us when we wanted
to graph the top 5 utilized links out of ~18,000 in total. While the query to graph the top 5 utilized links out of ~18,000 in total. While the query
worked, it would take roughly the amount of time we set our timeout limit to, worked, it would take roughly the amount of time we set our timeout limit to,
meaning it was both slow and flaky. We decided to use Prometheus' [recording meaning it was both slow and flaky. We decided to use Prometheus' [recording
rules](/docs/querying/rules/) for precomputing heavy queries. rules](/docs/prometheus/latest/querying/rules/) for precomputing heavy queries.
precomputed_link_utilization_percent = rate(ifHCOutOctets{layer!='access'}[10m])*8/1000/1000 precomputed_link_utilization_percent = rate(ifHCOutOctets{layer!='access'}[10m])*8/1000/1000
/ on (device,interface,alias) / on (device,interface,alias)
......
...@@ -128,7 +128,7 @@ However, if your dashboard query doesn't only touch a single time series but ...@@ -128,7 +128,7 @@ However, if your dashboard query doesn't only touch a single time series but
aggregates over thousands of time series, the number of chunks to access aggregates over thousands of time series, the number of chunks to access
multiplies accordingly, and the overhead of the sequential scan will become multiplies accordingly, and the overhead of the sequential scan will become
dominant. (Such queries are frowned upon, and we usually recommend to use a dominant. (Such queries are frowned upon, and we usually recommend to use a
[recording rule](https://prometheus.io/docs/querying/rules/#recording-rules) [recording rule](https://prometheus.io/docs/prometheus/latest/querying/rules/#recording-rules)
for queries of that kind that are used frequently, e.g. in a dashboard.) But for queries of that kind that are used frequently, e.g. in a dashboard.) But
with the double-delta encoding, the query time might still have been with the double-delta encoding, the query time might still have been
acceptable, let's say around one second. After the switch to varbit encoding, acceptable, let's say around one second. After the switch to varbit encoding,
...@@ -147,7 +147,7 @@ encoding. Start your Prometheus server with ...@@ -147,7 +147,7 @@ encoding. Start your Prometheus server with
`-storage.local.chunk-encoding-version=2` and wait for a while until you have `-storage.local.chunk-encoding-version=2` and wait for a while until you have
enough new chunks with varbit encoding to vet the effects. If you see queries enough new chunks with varbit encoding to vet the effects. If you see queries
that are becoming unacceptably slow, check if you can use that are becoming unacceptably slow, check if you can use
[recording rules](https://prometheus.io/docs/querying/rules/#recording-rules) [recording rules](https://prometheus.io/docs/prometheus/latest/querying/rules/#recording-rules)
to speed them up. Most likely, those queries will gain a lot from that even to speed them up. Most likely, those queries will gain a lot from that even
with the old double-delta encoding. with the old double-delta encoding.
......
...@@ -12,7 +12,7 @@ vector elements at a given point in time, the alert counts as active for these ...@@ -12,7 +12,7 @@ vector elements at a given point in time, the alert counts as active for these
elements' label sets. elements' label sets.
Alerting rules are configured in Prometheus in the same way as [recording Alerting rules are configured in Prometheus in the same way as [recording
rules](../../querying/rules). rules](/docs/prometheus/latest/querying/rules).
### Defining alerting rules ### Defining alerting rules
...@@ -42,7 +42,7 @@ can be templated. ...@@ -42,7 +42,7 @@ can be templated.
#### Templating #### Templating
Label and annotation values can be templated using [console templates](../../visualization/consoles). Label and annotation values can be templated using [console templates](/docs/visualization/consoles).
The `$labels` variable holds the label key/value pairs of an alert instance The `$labels` variable holds the label key/value pairs of an alert instance
and `$value` holds the evaluated value of an alert instance. and `$value` holds the evaluated value of an alert instance.
...@@ -91,7 +91,7 @@ Prometheus's alerting rules are good at figuring what is broken *right now*, ...@@ -91,7 +91,7 @@ Prometheus's alerting rules are good at figuring what is broken *right now*,
but they are not a fully-fledged notification solution. Another layer is needed but they are not a fully-fledged notification solution. Another layer is needed
to add summarization, notification rate limiting, silencing and alert to add summarization, notification rate limiting, silencing and alert
dependencies on top of the simple alert definitions. In Prometheus's ecosystem, dependencies on top of the simple alert definitions. In Prometheus's ecosystem,
the [Alertmanager](../alertmanager) takes on this the [Alertmanager](/docs/alertmanager) takes on this
role. Thus, Prometheus may be configured to periodically send information about role. Thus, Prometheus may be configured to periodically send information about
alert states to an Alertmanager instance, which then takes care of dispatching alert states to an Alertmanager instance, which then takes care of dispatching
the right notifications. The Alertmanager instance may be configured via the the right notifications. The Alertmanager instance may be configured via the
......
...@@ -56,7 +56,7 @@ during a scrape: ...@@ -56,7 +56,7 @@ during a scrape:
* the **count** of events that have been observed, exposed as `<basename>_count` (identical to `<basename>_bucket{le="+Inf"}` above) * the **count** of events that have been observed, exposed as `<basename>_count` (identical to `<basename>_bucket{le="+Inf"}` above)
Use the Use the
[`histogram_quantile()` function](/docs/querying/functions/#histogram_quantile) [`histogram_quantile()` function](/docs/prometheus/latest/querying/functions/#histogram_quantile)
to calculate quantiles from histograms or even aggregations of histograms. A to calculate quantiles from histograms or even aggregations of histograms. A
histogram is also suitable to calculate an histogram is also suitable to calculate an
[Apdex score](http://en.wikipedia.org/wiki/Apdex). When operating on buckets, [Apdex score](http://en.wikipedia.org/wiki/Apdex). When operating on buckets,
......
...@@ -70,7 +70,7 @@ also refer to the Prometheus monitoring system as a whole. ...@@ -70,7 +70,7 @@ also refer to the Prometheus monitoring system as a whole.
### PromQL ### PromQL
[PromQL](../../querying/basics/) is the Prometheus Query Language. It allows for [PromQL](/docs/prometheus/latest/querying/basics/) is the Prometheus Query Language. It allows for
a wide range of operations including aggregation, slicing and dicing, prediction and joins. a wide range of operations including aggregation, slicing and dicing, prediction and joins.
### Pushgateway ### Pushgateway
......
...@@ -25,7 +25,7 @@ For more elaborate overviews of Prometheus, see the resources linked from the ...@@ -25,7 +25,7 @@ For more elaborate overviews of Prometheus, see the resources linked from the
Prometheus's main features are: Prometheus's main features are:
* a multi-dimensional [data model](/docs/concepts/data_model/) with time series data identified by metric name and key/value pairs * a multi-dimensional [data model](/docs/concepts/data_model/) with time series data identified by metric name and key/value pairs
* a [flexible query language](/docs/querying/basics/) * a [flexible query language](/docs/prometheus/latest/querying/basics/)
to leverage this dimensionality to leverage this dimensionality
* no reliance on distributed storage; single server nodes are autonomous * no reliance on distributed storage; single server nodes are autonomous
* time series collection happens via a pull model over HTTP * time series collection happens via a pull model over HTTP
...@@ -57,7 +57,9 @@ its ecosystem components: ...@@ -57,7 +57,9 @@ its ecosystem components:
Prometheus scrapes metrics from instrumented jobs, either directly or via an Prometheus scrapes metrics from instrumented jobs, either directly or via an
intermediary push gateway for short-lived jobs. It stores all scraped samples intermediary push gateway for short-lived jobs. It stores all scraped samples
locally and runs rules over this data to either aggregate and record new time series from existing data or generate alerts. [Grafana](https://grafana.com/) or other API consumers can be used to visualize the collected data. locally and runs rules over this data to either aggregate and record new time
series from existing data or generate alerts. [Grafana](https://grafana.com/) or
other API consumers can be used to visualize the collected data.
## When does it fit? ## When does it fit?
......
...@@ -99,7 +99,7 @@ calculate streaming φ-quantiles on the client side and expose them directly, ...@@ -99,7 +99,7 @@ calculate streaming φ-quantiles on the client side and expose them directly,
while histograms expose bucketed observation counts and the calculation of while histograms expose bucketed observation counts and the calculation of
quantiles from the buckets of a histogram happens on the server side using the quantiles from the buckets of a histogram happens on the server side using the
[`histogram_quantile()` [`histogram_quantile()`
function](/docs/querying/functions/#histogram_quantile). function](/docs/prometheus/latest/querying/functions/#histogram_quantile).
The two approaches have a number of different implications: The two approaches have a number of different implications:
...@@ -107,11 +107,11 @@ The two approaches have a number of different implications: ...@@ -107,11 +107,11 @@ The two approaches have a number of different implications:
|---|-----------|--------- |---|-----------|---------
| Required configuration | Pick buckets suitable for the expected range of observed values. | Pick desired φ-quantiles and sliding window. Other φ-quantiles and sliding windows cannot be calculated later. | Required configuration | Pick buckets suitable for the expected range of observed values. | Pick desired φ-quantiles and sliding window. Other φ-quantiles and sliding windows cannot be calculated later.
| Client performance | Observations are very cheap as they only need to increment counters. | Observations are expensive due to the streaming quantile calculation. | Client performance | Observations are very cheap as they only need to increment counters. | Observations are expensive due to the streaming quantile calculation.
| Server performance | The server has to calculate quantiles. You can use [recording rules](/docs/querying/rules/#recording-rules) should the ad-hoc calculation take too long (e.g. in a large dashboard). | Low server-side cost. | Server performance | The server has to calculate quantiles. You can use [recording rules](/docs/prometheus/latest/querying/rules/#recording-rules) should the ad-hoc calculation take too long (e.g. in a large dashboard). | Low server-side cost.
| Number of time series (in addition to the `_sum` and `_count` series) | One time series per configured bucket. | One time series per configured quantile. | Number of time series (in addition to the `_sum` and `_count` series) | One time series per configured bucket. | One time series per configured quantile.
| Quantile error (see below for details) | Error is limited in the dimension of observed values by the width of the relevant bucket. | Error is limited in the dimension of φ by a configurable value. | Quantile error (see below for details) | Error is limited in the dimension of observed values by the width of the relevant bucket. | Error is limited in the dimension of φ by a configurable value.
| Specification of φ-quantile and sliding time-window | Ad-hoc with [Prometheus expressions](/docs/querying/functions/#histogram_quantile). | Preconfigured by the client. | Specification of φ-quantile and sliding time-window | Ad-hoc with [Prometheus expressions](/docs/prometheus/latest/querying/functions/#histogram_quantile). | Preconfigured by the client.
| Aggregation | Ad-hoc with [Prometheus expressions](/docs/querying/functions/#histogram_quantile). | In general [not aggregatable](http://latencytipoftheday.blogspot.de/2014/06/latencytipoftheday-you-cant-average.html). | Aggregation | Ad-hoc with [Prometheus expressions](/docs/prometheus/latest/querying/functions/#histogram_quantile). | In general [not aggregatable](http://latencytipoftheday.blogspot.de/2014/06/latencytipoftheday-you-cant-average.html).
Note the importance of the last item in the table. Let us return to Note the importance of the last item in the table. Let us return to
the SLA of serving 95% of requests within 300ms. This time, you do not the SLA of serving 95% of requests within 300ms. This time, you do not
...@@ -132,7 +132,7 @@ quantiles yields statistically nonsensical values. ...@@ -132,7 +132,7 @@ quantiles yields statistically nonsensical values.
Using histograms, the aggregation is perfectly possible with the Using histograms, the aggregation is perfectly possible with the
[`histogram_quantile()` [`histogram_quantile()`
function](/docs/querying/functions/#histogram_quantile). function](/docs/prometheus/latest/querying/functions/#histogram_quantile).
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) // GOOD. histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) // GOOD.
......
...@@ -5,9 +5,9 @@ sort_rank: 6 ...@@ -5,9 +5,9 @@ sort_rank: 6
# Recording rules # Recording rules
A consistent naming scheme for [recording rules](/docs/querying/rules/) makes it A consistent naming scheme for [recording rules](/docs/prometheus/latest/querying/rules/)
easier to interpret the meaning of a rule at a glance. It also avoids mistakes by makes it easier to interpret the meaning of a rule at a glance. It also avoids
making incorrect or meaningless calculations stand out. mistakes by making incorrect or meaningless calculations stand out.
This page documents how to correctly do aggregation and suggests a naming This page documents how to correctly do aggregation and suggests a naming
convention. convention.
...@@ -21,7 +21,7 @@ Recording rules should be of the general form `level:metric:operations`. ...@@ -21,7 +21,7 @@ Recording rules should be of the general form `level:metric:operations`.
of operations that were applied to the metric, newest operation first. of operations that were applied to the metric, newest operation first.
Keeping the metric name unchanged makes it easy to know what a metric is and Keeping the metric name unchanged makes it easy to know what a metric is and
easy to find in the codebase. easy to find in the codebase.
To keep the operations clean, `_sum` is omitted if there are other operations, To keep the operations clean, `_sum` is omitted if there are other operations,
as `sum()`. Associative operations can be merged (for example `min_min` is the as `sum()`. Associative operations can be merged (for example `min_min` is the
...@@ -29,7 +29,7 @@ same as `min`). ...@@ -29,7 +29,7 @@ same as `min`).
If there is no obvious operation to use, use `sum`. When taking a ratio by If there is no obvious operation to use, use `sum`. When taking a ratio by
doing division, separate the metrics using `_per_` and call the operation doing division, separate the metrics using `_per_` and call the operation
`ratio`. `ratio`.
When aggregating up ratios, aggregate up the numerator and denominator When aggregating up ratios, aggregate up the numerator and denominator
separately and then divide. Do not take the average of a ratio or average of an separately and then divide. Do not take the average of a ratio or average of an
......
---
title: HTTP API
sort_rank: 7
---
# HTTP API
The current stable HTTP API is reachable under `/api/v1` on a Prometheus
server. Any non-breaking additions will be added under that endpoint.
## Format overview
The API response format is JSON. Every successful API request returns a `2xx`
status code.
Invalid requests that reach the API handlers return a JSON error object
and one of the following HTTP response codes:
- `400 Bad Request` when parameters are missing or incorrect.
- `422 Unprocessable Entity` when an expression can't be executed
([RFC4918](http://tools.ietf.org/html/rfc4918#page-78)).
- `503 Service Unavailable` when queries time out or abort.
Other non-`2xx` codes may be returned for errors occurring before the API
endpoint is reached.
The JSON response envelope format is as follows:
```
{
"status": "success" | "error",
"data": <data>,
// Only set if status is "error". The data field may still hold
// additional data.
"errorType": "<string>",
"error": "<string>"
}
```
Input timestamps may be provided either in
[RFC3339](https://www.ietf.org/rfc/rfc3339.txt) format or as a Unix timestamp
in seconds, with optional decimal places for sub-second precision. Output
timestamps are always represented as Unix timestamps in seconds.
Names of query parameters that may be repeated end with `[]`.
`<series_selector>` placeholders refer to Prometheus [time series
selectors](/docs/querying/basics/#time-series-selectors) like
`http_requests_total` or `http_requests_total{method=~"^GET|POST$"}` and need
to be URL-encoded.
`<duration>` placeholders refer to Prometheus duration strings of the form
`[0-9]+[smhdwy]`. For example, `5m` refers to a duration of 5 minutes.
## Expression queries
Query language expressions may be evaluated at a single instant or over a range
of time. The sections below describe the API endpoints for each type of
expression query.
### Instant queries
The following endpoint evaluates an instant query at a single point in time:
```
GET /api/v1/query
```
URL query parameters:
- `query=<string>`: Prometheus expression query string.
- `time=<rfc3339 | unix_timestamp>`: Evaluation timestamp. Optional.
- `timeout=<duration>`: Evaluation timeout. Optional. Defaults to and
is capped by the value of the `-query.timeout` flag.
The current server time is used if the `time` parameter is omitted.
The `data` section of the query result has the following format:
```
{
"resultType": "matrix" | "vector" | "scalar" | "string",
"result": <value>
}
```
`<value>` refers to the query result data, which has varying formats
depending on the `resultType`. See the [expression query result
formats](#expression-query-result-formats).
The following example evaluates the expression `up` at the time
`2015-07-01T20:10:51.781Z`:
```
$ curl 'http://localhost:9090/api/v1/query?query=up&time=2015-07-01T20:10:51.781Z'
{
"status" : "success",
"data" : {
"resultType" : "vector",
"result" : [
{
"metric" : {
"__name__" : "up",
"job" : "prometheus",
"instance" : "localhost:9090"
},
"value": [ 1435781451.781, "1" ]
},
{
"metric" : {
"__name__" : "up",
"job" : "node",
"instance" : "localhost:9100"
},
"value" : [ 1435781451.781, "0" ]
}
]
}
}
```
### Range queries
The following endpoint evaluates an expression query over a range of time:
```
GET /api/v1/query_range
```
URL query parameters:
- `query=<string>`: Prometheus expression query string.
- `start=<rfc3339 | unix_timestamp>`: Start timestamp.
- `end=<rfc3339 | unix_timestamp>`: End timestamp.
- `step=<duration>`: Query resolution step width.
- `timeout=<duration>`: Evaluation timeout. Optional. Defaults to and
is capped by the value of the `-query.timeout` flag.
The `data` section of the query result has the following format:
```
{
"resultType": "matrix",
"result": <value>
}
```
For the format of the `<value>` placeholder, see the [range-vector result
format](#range-vectors).
The following example evaluates the expression `up` over a 30-second range with
a query resolution of 15 seconds.
```
$ curl 'http://localhost:9090/api/v1/query_range?query=up&start=2015-07-01T20:10:30.781Z&end=2015-07-01T20:11:00.781Z&step=15s'
{
"status" : "success",
"data" : {
"resultType" : "matrix",
"result" : [
{
"metric" : {
"__name__" : "up",
"job" : "prometheus",
"instance" : "localhost:9090"
},
"values" : [
[ 1435781430.781, "1" ],
[ 1435781445.781, "1" ],
[ 1435781460.781, "1" ]
]
},
{
"metric" : {
"__name__" : "up",
"job" : "node",
"instance" : "localhost:9091"
},
"values" : [
[ 1435781430.781, "0" ],
[ 1435781445.781, "0" ],
[ 1435781460.781, "1" ]
]
}
]
}
}
```
## Querying metadata
### Finding series by label matchers
The following endpoint returns the list of time series that match a certain label set.
```
GET /api/v1/series
```
URL query parameters:
- `match[]=<series_selector>`: Repeated series selector argument that selects the
series to return. At least one `match[]` argument must be provided.
- `start=<rfc3339 | unix_timestamp>`: Start timestamp.
- `end=<rfc3339 | unix_timestamp>`: End timestamp.
The `data` section of the query result consists of a list of objects that
contain the label name/value pairs which identify each series.
The following example returns all series that match either of the selectors
`up` or `process_start_time_seconds{job="prometheus"}`:
```
$ curl -g 'http://localhost:9090/api/v1/series?match[]=up&match[]=process_start_time_seconds{job="prometheus"}'
{
"status" : "success",
"data" : [
{
"__name__" : "up",
"job" : "prometheus",
"instance" : "localhost:9090"
},
{
"__name__" : "up",
"job" : "node",
"instance" : "localhost:9091"
},
{
"__name__" : "process_start_time_seconds",
"job" : "prometheus",
"instance" : "localhost:9090"
}
]
}
```
### Querying label values
The following endpoint returns a list of label values for a provided label name:
```
GET /api/v1/label/<label_name>/values
```
The `data` section of the JSON response is a list of string label names.
This example queries for all label values for the `job` label:
```
$ curl http://localhost:9090/api/v1/label/job/values
{
"status" : "success",
"data" : [
"node",
"prometheus"
]
}
```
## Deleting series
The following endpoint deletes matched series entirely from a Prometheus server:
```
DELETE /api/v1/series
```
URL query parameters:
- `match[]=<series_selector>`: Repeated label matcher argument that selects the
series to delete. At least one `match[]` argument must be provided.
The `data` section of the JSON response has the following format:
```
{
"numDeleted": <number of deleted series>
}
```
The following example deletes all series that match either of the selectors
`up` or `process_start_time_seconds{job="prometheus"}`:
```
$ curl -XDELETE -g 'http://localhost:9090/api/v1/series?match[]=up&match[]=process_start_time_seconds{job="prometheus"}'
{
"status" : "success",
"data" : {
"numDeleted" : 3
}
}
```
## Expression query result formats
Expression queries may return the following response values in the `result`
property of the `data` section. `<sample_value>` placeholders are numeric
sample values. JSON does not support special float values such as `NaN`, `Inf`,
and `-Inf`, so sample values are transferred as quoted JSON strings rather than
raw numbers.
### Range vectors
Range vectors are returned as result type `matrix`. The corresponding
`result` property has the following format:
```
[
{
"metric": { "<label_name>": "<label_value>", ... },
"values": [ [ <unix_time>, "<sample_value>" ], ... ]
},
...
]
```
### Instant vectors
Instant vectors are returned as result type `vector`. The corresponding
`result` property has the following format:
```
[
{
"metric": { "<label_name>": "<label_value>", ... },
"value": [ <unix_time>, "<sample_value>" ]
},
...
]
```
### Scalars
Scalar results are returned as result type `scalar`. The corresponding
`result` property has the following format:
```
[ <unix_time>, "<scalar_value>" ]
```
### Strings
String results are returned as result type `string`. The corresponding
`result` property has the following format:
```
[ <unix_time>, "<string_value>" ]
```
## Targets
> This API is experimental as it is intended to be extended with targets
> dropped due to relabelling in the future.
The following endpoint returns an overview of the current state of the
Prometheus target discovery:
```
GET /api/v1/targets
```
Currently only the active targets are part of the response.
```
$ curl http://localhost:9090/api/v1/targets
{
"status": "success", [3/11]
"data": {
"activeTargets": [
{
"discoveredLabels": {
"__address__": "127.0.0.1:9090",
"__metrics_path__": "/metrics",
"__scheme__": "http",
"job": "prometheus"
},
"labels": {
"instance": "127.0.0.1:9090",
"job": "prometheus"
},
"scrapeUrl": "http://127.0.0.1:9090/metrics",
"lastError": "",
"lastScrape": "2017-01-17T15:07:44.723715405+01:00",
"health": "up"
}
]
}
}
```
## Alertmanagers
> This API is experimental as it is intended to be extended with Alertmanagers
> dropped due to relabelling in the future.
The following endpoint returns an overview of the current state of the
Prometheus alertmanager discovery:
```
GET /api/v1/alertmanagers
```
Currently only the active Alertmanagers are part of the response.
```
$ curl http://localhost:9090/api/v1/alertmanagers
{
"status": "success",
"data": {
"activeAlertmanagers": [
{
"url": "http://127.0.0.1:9090/api/v1/alerts"
}
]
}
}
```
---
title: Querying basics
nav_title: Basics
sort_rank: 1
---
# Querying Prometheus
Prometheus provides a functional expression language that lets the user select
and aggregate time series data in real time. The result of an expression can
either be shown as a graph, viewed as tabular data in Prometheus's expression
browser, or consumed by external systems via the [HTTP API](/docs/querying/api/).
## Examples
This document is meant as a reference. For learning, it might be easier to
start with a couple of [examples](/docs/querying/examples/).
## Expression language data types
In Prometheus's expression language, an expression or sub-expression can
evaluate to one of four types:
* **Instant vector** - a set of time series containing a single sample for each time series, all sharing the same timestamp
* **Range vector** - a set of time series containing a range of data points over time for each time series
* **Scalar** - a simple numeric floating point value
* **String** - a simple string value; currently unused
Depending on the use-case (e.g. when graphing vs. displaying the output of an
expression), only some of these types are legal as the result from a
user-specified expression. For example, an expression that returns an instant
vector is the only type that can be directly graphed.
## Literals
### String literals
Strings may be specified as literals in single quotes, double quotes or
backticks.
PromQL follows the same [escaping rules as
Go](https://golang.org/ref/spec#String_literals). In single or double quotes a
backslash begins an escape sequence, which may be followed by `a`, `b`, `f`,
`n`, `r`, `t`, `v` or `\`. Specific characters can be provided using octal
(`\nnn`) or hexadecimal (`\xnn`, `\unnnn` and `\Unnnnnnnn`).
No escaping is processed inside backticks. Unlike Go, Prometheus does not discard newlines inside backticks.
Example:
"this is a string"
'these are unescaped: \n \\ \t'
`these are not unescaped: \n ' " \t`
### Float literals
Scalar float values can be literally written as numbers of the form
`[-](digits)[.(digits)]`.
-2.43
## Time series Selectors
### Instant vector selectors
Instant vector selectors allow the selection of a set of time series and a
single sample value for each at a given timestamp (instant): in the simplest
form, only a metric name is specified. This results in an instant vector
containing elements for all time series that have this metric name.
This example selects all time series that have the `http_requests_total` metric
name:
http_requests_total
It is possible to filter these time series further by appending a set of labels
to match in curly braces (`{}`).
This example selects only those time series with the `http_requests_total`
metric name that also have the `job` label set to `prometheus` and their
`group` label set to `canary`:
http_requests_total{job="prometheus",group="canary"}
It is also possible to negatively match a label value, or to match label values
against regular expressions. The following label matching operators exist:
* `=`: Select labels that are exactly equal to the provided string.
* `!=`: Select labels that are not equal to the provided string.
* `=~`: Select labels that regex-match the provided string (or substring).
* `!~`: Select labels that do not regex-match the provided string (or substring).
For example, this selects all `http_requests_total` time series for `staging`,
`testing`, and `development` environments and HTTP methods other than `GET`.
http_requests_total{environment=~"staging|testing|development",method!="GET"}
Label matchers that match empty label values also select all time series that do
not have the specific label set at all. Regex-matches are fully anchored.
Vector selectors must either specify a name or at least one label matcher
that does not match the empty string. The following expression is illegal:
{job=~".*"} # Bad!
In contrast, these expressions are valid as they both have a selector that does not
match empty label values.
{job=~".+"} # Good!
{job=~".*",method="get"} # Good!
Label matchers can also be applied to metric names by matching against the internal
`__name__` label. For example, the expression `http_requests_total` is equivalent to
`{__name__="http_requests_total"}`. Matchers other than `=` (`!=`, `=~`, `!~`) may also be used.
The following expression selects all metrics that have a name starting with `job:`:
{__name__=~"^job:.*"}
### Range Vector Selectors
Range vector literals work like instant vector literals, except that they
select a range of samples back from the current instant. Syntactically, a range
duration is appended in square brackets (`[]`) at the end of a vector selector
to specify how far back in time values should be fetched for each resulting
range vector element.
Time durations are specified as a number, followed immediately by one of the
following units:
* `s` - seconds
* `m` - minutes
* `h` - hours
* `d` - days
* `w` - weeks
* `y` - years
In this example, we select all the values we have recorded within the last 5
minutes for all time series that have the metric name `http_requests_total` and
a `job` label set to `prometheus`:
http_requests_total{job="prometheus"}[5m]
### Offset modifier
The `offset` modifier allows changing the time offset for individual
instant and range vectors in a query.
For example, the following expression returns the value of
`http_requests_total` 5 minutes in the past relative to the current
query evaluation time:
http_requests_total offset 5m
Note that the `offset` modifier always needs to follow the selector
immediately, i.e. the following would be correct:
sum(http_requests_total{method="GET"} offset 5m) // GOOD.
While the following would be *incorrect*:
sum(http_requests_total{method="GET"}) offset 5m // INVALID.
The same works for range vectors. This returns the 5-minutes rate that
`http_requests_total` had a week ago:
rate(http_requests_total[5m] offset 1w)
## Operators
Prometheus supports many binary and aggregation operators. These are described
in detail in the [expression language operators](/docs/querying/operators/) page.
## Functions
Prometheus supports several functions to operate on data. These are described
in detail in the [expression language functions](/docs/querying/functions/) page.
## Gotchas
### Interpolation and staleness
When queries are run, timestamps at which to sample data are selected
independently of the actual present time series data. This is mainly to support
cases like aggregation (`sum`, `avg`, and so on), where multiple aggregated
time series do not exactly align in time. Because of their independence,
Prometheus needs to assign a value at those timestamps for each relevant time
series. It does so by simply taking the newest sample before this timestamp.
If no stored sample is found (by default) 5 minutes before a sampling timestamp,
no value is assigned for this time series at this point in time. This
effectively means that time series "disappear" from graphs at times where their
latest collected sample is older than 5 minutes.
NOTE: <b>NOTE:</b> Staleness and interpolation handling might change. See
https://github.com/prometheus/prometheus/issues/398 and
https://github.com/prometheus/prometheus/issues/581.
### Avoiding slow queries and overloads
If a query needs to operate on a very large amount of data, graphing it might
time out or overload the server or browser. Thus, when constructing queries
over unknown data, always start building the query in the tabular view of
Prometheus's expression browser until the result set seems reasonable
(hundreds, not thousands, of time series at most). Only when you have filtered
or aggregated your data sufficiently, switch to graph mode. If the expression
still takes too long to graph ad-hoc, pre-record it via a [recording
rule](/docs/querying/rules/#recording-rules).
This is especially relevant for Prometheus's query language, where a bare
metric name selector like `api_http_requests_total` could expand to thousands
of time series with different labels. Also keep in mind that expressions which
aggregate over many time series will generate load on the server even if the
output is only a small number of time series. This is similar to how it would
be slow to sum all values of a column in a relational database, even if the
output value is only a single number.
---
title: Querying examples
nav_title: Examples
sort_rank: 4
---
# Query examples
## Simple time series selection
Return all time series with the metric `http_requests_total`:
http_requests_total
Return all time series with the metric `http_requests_total` and the given
`job` and `handler` labels:
http_requests_total{job="apiserver", handler="/api/comments"}
Return a whole range of time (in this case 5 minutes) for the same vector,
making it a range vector:
http_requests_total{job="apiserver", handler="/api/comments"}[5m]
Note that an expression resulting in a range vector cannot be graphed directly,
but viewed in the tabular ("Console") view of the expression browser.
Using regular expressions, you could select time series only for jobs whose
name match a certain pattern, in this case, all jobs that end with `server`.
Note that this does a substring match, not a full string match:
http_requests_total{job=~"server$"}
To select all HTTP status codes except 4xx ones, you could run:
http_requests_total{status!~"^4..$"}
## Using functions, operators, etc.
Return the per-second rate for all time series with the `http_requests_total`
metric name, as measured over the last 5 minutes:
rate(http_requests_total[5m])
Assuming that the `http_requests_total` time series all have the labels `job`
(fanout by job name) and `instance` (fanout by instance of the job), we might
want to sum over the rate of all instances, so we get fewer output time series,
but still preserve the `job` dimension:
sum(rate(http_requests_total[5m])) by (job)
If we have two different metrics with the same dimensional labels, we can apply
binary operators to them and elements on both sides with the same label set
will get matched and propagated to the output. For example, this expression
returns the unused memory in MiB for every instance (on a fictional cluster
scheduler exposing these metrics about the instances it runs):
(instance_memory_limit_bytes - instance_memory_usage_bytes) / 1024 / 1024
The same expression, but summed by application, could be written like this:
sum(
instance_memory_limit_bytes - instance_memory_usage_bytes
) by (app, proc) / 1024 / 1024
If the same fictional cluster scheduler exposed CPU usage metrics like the
following for every instance:
instance_cpu_time_ns{app="lion", proc="web", rev="34d0f99", env="prod", job="cluster-manager"}
instance_cpu_time_ns{app="elephant", proc="worker", rev="34d0f99", env="prod", job="cluster-manager"}
instance_cpu_time_ns{app="turtle", proc="api", rev="4d3a513", env="prod", job="cluster-manager"}
instance_cpu_time_ns{app="fox", proc="widget", rev="4d3a513", env="prod", job="cluster-manager"}
...
...we could get the top 3 CPU users grouped by application (`app`) and process
type (`proc`) like this:
topk(3, sum(rate(instance_cpu_time_ns[5m])) by (app, proc))
Assuming this metric contains one time series per running instance, you could
count the number of running instances per application like this:
count(instance_cpu_time_ns) by (app)
---
title: Query functions
nav_title: Functions
sort_rank: 3
---
# Functions
Some functions have default arguments, e.g. `year(v=vector(time())
instant-vector)`. This means that there is one argument `v` which is an instant
vector, which if not provided it will default to the value of the expression
`vector(time())`.
## `abs()`
`abs(v instant-vector)` returns the input vector with all sample values converted to
their absolute value.
## `absent()`
`absent(v instant-vector)` returns an empty vector if the vector passed to it
has any elements and a 1-element vector with the value 1 if the vector passed to
it has no elements.
This is useful for alerting on when no time series exist for a given metric name
and label combination.
```
absent(nonexistent{job="myjob"})
# => {job="myjob"}
absent(nonexistent{job="myjob",instance=~".*"})
# => {job="myjob"}
absent(sum(nonexistent{job="myjob"}))
# => {}
```
In the second example, `absent()` tries to be smart about deriving labels of the
1-element output vector from the input vector.
## `ceil()`
`ceil(v instant-vector)` rounds the sample values of all elements in `v` up to
the nearest integer.
## `changes()`
For each input time series, `changes(v range-vector)` returns the number of
times its value has changed within the provided time range as an instant
vector.
## `clamp_max()`
`clamp_max(v instant-vector, max scalar)` clamps the sample values of all
elements in `v` to have an upper limit of `max`.
## `clamp_min()`
`clamp_min(v instant-vector, min scalar)` clamps the sample values of all
elements in `v` to have a lower limit of `min`.
## `count_scalar()`
`count_scalar(v instant-vector)` returns the number of elements in a time series
vector as a scalar. This is in contrast to the `count()`
[aggregation operator](/docs/querying/operators/#aggregation-operators), which
always returns a vector (an empty one if the input vector is empty) and allows
grouping by labels via a `by` clause.
## `day_of_month()`
`day_of_month(v=vector(time()) instant-vector)` returns the day of the month
for each of the given times in UTC. Returned values are from 1 to 31.
## `day_of_week()`
`day_of_week(v=vector(time()) instant-vector)` returns the day of the week for
each of the given times in UTC. Returned values are from 0 to 6, where 0 means
Sunday etc.
## `days_in_month()`
`days_in_month(v=vector(time()) instant-vector)` returns number of days in the
month for each of the given times in UTC. Returned values are from 28 to 31.
## `delta()`
`delta(v range-vector)` calculates the difference between the
first and last value of each time series element in a range vector `v`,
returning an instant vector with the given deltas and equivalent labels.
The delta is extrapolated to cover the full time range as specified in
the range vector selector, so that it is possible to get a non-integer
result even if the sample values are all integers.
The following example expression returns the difference in CPU temperature
between now and 2 hours ago:
```
delta(cpu_temp_celsius{host="zeus"}[2h])
```
`delta` should only be used with gauges.
## `deriv()`
`deriv(v range-vector)` calculates the per-second derivative of the time series in a range
vector `v`, using [simple linear regression](http://en.wikipedia.org/wiki/Simple_linear_regression).
`deriv` should only be used with gauges.
## `drop_common_labels()`
`drop_common_labels(instant-vector)` drops all labels that have the same name
and value across all series in the input vector.
## `exp()`
`exp(v instant-vector)` calculates the exponential function for all elements in `v`.
Special cases are:
* `Exp(+Inf) = +Inf`
* `Exp(NaN) = NaN`
## `floor()`
`floor(v instant-vector)` rounds the sample values of all elements in `v` down
to the nearest integer.
## `histogram_quantile()`
`histogram_quantile(φ float, b instant-vector)` calculates the
φ-quantile (0 ≤ φ ≤ 1) from the buckets `b` of a
[histogram](/docs/concepts/metric_types/#histogram). (See [histograms
and summaries](/docs/practices/histograms) for a detailed explanation
of φ-quantiles and the usage of the histogram metric type in general.)
The samples in `b` are the counts of observations in each bucket. Each
sample must have a label `le` where the label value denotes the
inclusive upper bound of the bucket. (Samples without such a label are
silently ignored.) The [histogram metric
type](/docs/concepts/metric_types/#histogram) automatically provides
time series with the `_bucket` suffix and the appropriate labels.
Use the `rate()` function to specify the time window for the quantile
calculation.
Example: A histogram metric is called `http_request_duration_seconds`. To
calculate the 90th percentile of request durations over the last 10m, use the
following expression:
histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[10m]))
The quantile is calculated for each label combination in
`http_request_duration_seconds`. To aggregate, use the `sum()` aggregator
around the `rate()` function. Since the `le` label is required by
`histogram_quantile()`, it has to be included in the `by` clause. The following
expression aggregates the 90th percentile by `job`:
histogram_quantile(0.9, sum(rate(http_request_duration_seconds_bucket[10m])) by (job, le))
To aggregate everything, specify only the `le` label:
histogram_quantile(0.9, sum(rate(http_request_duration_seconds_bucket[10m])) by (le))
The `histogram_quantile()` function interpolates quantile values by
assuming a linear distribution within a bucket. The highest bucket
must have an upper bound of `+Inf`. (Otherwise, `NaN` is returned.) If
a quantile is located in the highest bucket, the upper bound of the
second highest bucket is returned. A lower limit of the lowest bucket
is assumed to be 0 if the upper bound of that bucket is greater than
0. In that case, the usual linear interpolation is applied within that
bucket. Otherwise, the upper bound of the lowest bucket is returned
for quantiles located in the lowest bucket.
If `b` contains fewer than two buckets, `NaN` is returned. For φ < 0, `-Inf` is
returned. For φ > 1, `+Inf` is returned.
## `holt_winters()`
`holt_winters(v range-vector, sf scalar, tf scalar)` produces a smoothed value
for time series based on the range in `v`. The lower the smoothing factor `sf`,
the more importance is given to old data. The higher the trend factor `tf`, the
more trends in the data is considered. Both `sf` and `tf` must be between 0 and
1.
`holt_winters` should only be used with gauges.
## `hour()`
`hour(v=vector(time()) instant-vector)` returns the hour of the day
for each of the given times in UTC. Returned values are from 0 to 23.
## `idelta()`
`idelta(v range-vector)`
`idelta(v range-vector)` calculates the difference between the last two samples
in the range vector `v`, returning an instant vector with the given deltas and
equivalent labels.
`idelta` should only be used with gauges.
## `increase()`
`increase(v range-vector)` calculates the increase in the
time series in the range vector. Breaks in monotonicity (such as counter
resets due to target restarts) are automatically adjusted for. The
increase is extrapolated to cover the full time range as specified
in the range vector selector, so that it is possible to get a
non-integer result even if a counter increases only by integer
increments.
The following example expression returns the number of HTTP requests as measured
over the last 5 minutes, per time series in the range vector:
```
increase(http_requests_total{job="api-server"}[5m])
```
`increase` should only be used with counters. It is syntactic sugar
for `rate(v)` multiplied by the number of seconds under the specified
time range window, and should be used primarily for human readability.
Use `rate` in recording rules so that increases are tracked consistently
on a per-second basis.
## `irate()`
`irate(v range-vector)` calculates the per-second instant rate of increase of
the time series in the range vector. This is based on the last two data points.
Breaks in monotonicity (such as counter resets due to target restarts) are
automatically adjusted for.
The following example expression returns the per-second rate of HTTP requests
looking up to 5 minutes back for the two most recent data points, per time
series in the range vector:
```
irate(http_requests_total{job="api-server"}[5m])
```
`irate` should only be used when graphing volatile, fast-moving counters.
Use `rate` for alerts and slow-moving counters, as brief changes
in the rate can reset the `FOR` clause and graphs consisting entirely of rare
spikes are hard to read.
Note that when combining `irate()` with an
[aggregation operator](/docs/querying/operators/#aggregation-operators) (e.g. `sum()`)
or a function aggregating over time (any function ending in `_over_time`),
always take a `irate()` first, then aggregate. Otherwise `irate()` cannot detect
counter resets when your target restarts.
## `label_join()`
For each timeseries in `v`, `label_join(v instant-vector, dst_label string, separator string, src_label_1 string, src_label_2 string, ...)` joins all the values of all the `src_labels`
using `separator` and returns the timeseries with the label `dst_label` containing the joined value.
There can be any number of `src_labels` in this function.
This example will return a vector with each time series having a `foo` label with the value `a,b,c` added to it:
```
label_join(up{job="api-server",src1="a",src2="b",src3="c"}, "foo", ",", "src1", "src2", "src3")
```
## `label_replace()`
For each timeseries in `v`, `label_replace(v instant-vector, dst_label string, replacement string,
src_label string, regex string)` matches the regular expression `regex` against
the label `src_label`. If it matches, then the timeseries is returned with the
label `dst_label` replaced by the expansion of `replacement`. `$1` is replaced
with the first matching subgroup, `$2` with the second etc. If the regular
expression doesn't match then the timeseries is returned unchanged.
This example will return a vector with each time series having a `foo`
label with the value `a` added to it:
```
label_replace(up{job="api-server",service="a:c"}, "foo", "$1", "service", "(.*):.*")
```
## `ln()`
`ln(v instant-vector)` calculates the natural logarithm for all elements in `v`.
Special cases are:
* `ln(+Inf) = +Inf`
* `ln(0) = -Inf`
* `ln(x < 0) = NaN`
* `ln(NaN) = NaN`
## `log2()`
`log2(v instant-vector)` calculates the binary logarithm for all elements in `v`.
The special cases are equivalent to those in `ln`.
## `log10()`
`log10(v instant-vector)` calculates the decimal logarithm for all elements in `v`.
The special cases are equivalent to those in `ln`.
## `minute()`
`minute(v=vector(time()) instant-vector)` returns the minute of the hour for each
of the given times in UTC. Returned values are from 0 to 59.
## `month()`
`month(v=vector(time()) instant-vector)` returns the month of the year for each
of the given times in UTC. Returned values are from 1 to 12, where 1 means
January etc.
## `predict_linear()`
`predict_linear(v range-vector, t scalar)` predicts the value of time series
`t` seconds from now, based on the range vector `v`, using [simple linear
regression](http://en.wikipedia.org/wiki/Simple_linear_regression).
`predict_linear` should only be used with gauges.
## `rate()`
`rate(v range-vector)` calculates the per-second average rate of increase of the
time series in the range vector. Breaks in monotonicity (such as counter
resets due to target restarts) are automatically adjusted for. Also, the
calculation extrapolates to the ends of the time range, allowing for missed
scrapes or imperfect alignment of scrape cycles with the range's time period.
The following example expression returns the per-second rate of HTTP requests as measured
over the last 5 minutes, per time series in the range vector:
```
rate(http_requests_total{job="api-server"}[5m])
```
`rate` should only be used with counters. It is best suited for alerting,
and for graphing of slow-moving counters.
Note that when combining `rate()` with an aggregation operator (e.g. `sum()`)
or a function aggregating over time (any function ending in `_over_time`),
always take a `rate()` first, then aggregate. Otherwise `rate()` cannot detect
counter resets when your target restarts.
## `resets()`
For each input time series, `resets(v range-vector)` returns the number of
counter resets within the provided time range as an instant vector. Any
decrease in the value between two consecutive samples is interpreted as a
counter reset.
`resets` should only be used with counters.
## `round()`
`round(v instant-vector, to_nearest=1 scalar)` rounds the sample values of all
elements in `v` to the nearest integer. Ties are resolved by rounding up. The
optional `to_nearest` argument allows specifying the nearest multiple to which
the sample values should be rounded. This multiple may also be a fraction.
## `scalar()`
Given a single-element input vector, `scalar(v instant-vector)` returns the
sample value of that single element as a scalar. If the input vector does not
have exactly one element, `scalar` will return `NaN`.
## `sort()`
`sort(v instant-vector)` returns vector elements sorted by their sample values,
in ascending order.
## `sort_desc()`
Same as `sort`, but sorts in descending order.
## `sqrt()`
`sqrt(v instant-vector)` calculates the square root of all elements in `v`.
## `time()`
`time()` returns the number of seconds since January 1, 1970 UTC. Note that
this does not actually return the current time, but the time at which the
expression is to be evaluated.
## `vector()`
`vector(s scalar)` returns the scalar `s` as a vector with no labels.
## `year()`
`year(v=vector(time()) instant-vector)` returns the year
for each of the given times in UTC.
## `<aggregation>_over_time()`
The following functions allow aggregating each series of a given range vector
over time and return an instant vector with per-series aggregation results:
* `avg_over_time(range-vector)`: the average value of all points in the specified interval.
* `min_over_time(range-vector)`: the minimum value of all points in the specified interval.
* `max_over_time(range-vector)`: the maximum value of all points in the specified interval.
* `sum_over_time(range-vector)`: the sum of all values in the specified interval.
* `count_over_time(range-vector)`: the count of all values in the specified interval.
* `quantile_over_time(scalar, range-vector)`: the φ-quantile (0 ≤ φ ≤ 1) of the values in the specified interval.
* `stddev_over_time(range-vector)`: the population standard deviation of the values in the specified interval.
* `stdvar_over_time(range-vector)`: the population standard variance of the values in the specified interval.
Note that all values in the specified interval have the same weight in the
aggregation even if the values are not equally spaced throughout the interval.
---
title: Querying
sort_rank: 3
nav_icon: search
---
---
title: Operators
sort_rank: 2
---
# Operators
## Binary operators
Prometheus's query language supports basic logical and arithmetic operators.
For operations between two instant vectors, the [matching behavior](#vector-matching)
can be modified.
### Arithmetic binary operators
The following binary arithmetic operators exist in Prometheus:
* `+` (addition)
* `-` (subtraction)
* `*` (multiplication)
* `/` (division)
* `%` (modulo)
* `^` (power/exponentiation)
Binary arithmetic operators are defined between scalar/scalar, vector/scalar,
and vector/vector value pairs.
**Between two scalars**, the behavior is obvious: they evaluate to another
scalar that is the result of the operator applied to both scalar operands.
**Between an instant vector and a scalar**, the operator is applied to the
value of every data sample in the vector. E.g. if a time series instant vector
is multiplied by 2, the result is another vector in which every sample value of
the original vector is multiplied by 2.
**Between two instant vectors**, a binary arithmetic operator is applied to
each entry in the left-hand-side vector and its [matching element](#vector-matching)
in the right hand vector. The result is propagated into the result vector and the metric
name is dropped. Entries for which no matching entry in the right-hand vector can be
found are not part of the result.
### Comparison binary operators
The following binary comparison operators exist in Prometheus:
* `==` (equal)
* `!=` (not-equal)
* `>` (greater-than)
* `<` (less-than)
* `>=` (greater-or-equal)
* `<=` (less-or-equal)
Comparison operators are defined between scalar/scalar, vector/scalar,
and vector/vector value pairs. By default they filter. Their behaviour can be
modified by providing `bool` after the operator, which will return `0` or `1`
for the value rather than filtering.
**Between two scalars**, the `bool` modifier must be provided and these
operators result in another scalar that is either `0` (`false`) or `1`
(`true`), depending on the comparison result.
**Between an instant vector and a scalar**, these operators are applied to the
value of every data sample in the vector, and vector elements between which the
comparison result is `false` get dropped from the result vector. If the `bool`
modifier is provided, vector elements that would be dropped instead have the value
`0` and vector elements that would be kept have the value `1`.
**Between two instant vectors**, these operators behave as a filter by default,
applied to matching entries. Vector elements for which the expression is not
true or which do not find a match on the other side of the expression get
dropped from the result, while the others are propagated into a result vector
with their original (left-hand-side) metric names and label values.
If the `bool` modifier is provided, vector elements that would have been
dropped instead have the value `0` and vector elements that would be kept have
the value `1` with the left-hand-side metric names and label values.
### Logical/set binary operators
These logical/set binary operators are only defined between instant vectors:
* `and` (intersection)
* `or` (union)
* `unless` (complement)
`vector1 and vector2` results in a vector consisting of the elements of
`vector1` for which there are elements in `vector2` with exactly matching
label sets. Other elements are dropped. The metric name and values are carried
over from the left-hand-side vector.
`vector1 or vector2` results in a vector that contains all original elements
(label sets + values) of `vector1` and additionally all elements of `vector2`
which do not have matching label sets in `vector1`.
`vector1 unless vector2` results in a vector consisting of the elements of
`vector1` for which there are no elements in `vector2` with exactly matching
label sets. All matching elements in both vectors are dropped.
## Vector matching
Operations between vectors attempt to find a matching element in the right-hand-side
vector for each entry in the left-hand side. There are two basic types of
matching behavior:
**One-to-one** finds a unique pair of entries from each side of the operation.
In the default case, that is an operation following the format `vector1 <operator> vector2`.
Two entries match if they have the exact same set of labels and corresponding values.
The `ignoring` keyword allows ignoring certain labels when matching, while the
`on` keyword allows reducing the set of considered labels to a provided list:
<vector expr> <bin-op> ignoring(<label list>) <vector expr>
<vector expr> <bin-op> on(<label list>) <vector expr>
Example input:
method_code:http_errors:rate5m{method="get", code="500"} 24
method_code:http_errors:rate5m{method="get", code="404"} 30
method_code:http_errors:rate5m{method="put", code="501"} 3
method_code:http_errors:rate5m{method="post", code="500"} 6
method_code:http_errors:rate5m{method="post", code="404"} 21
method:http_requests:rate5m{method="get"} 600
method:http_requests:rate5m{method="del"} 34
method:http_requests:rate5m{method="post"} 120
Example query:
method_code:http_errors:rate5m{code="500"} / ignoring(code) method:http_requests:rate5m
This returns a result vector containing the fraction of HTTP requests with status code
of 500 for each method, as measured over the last 5 minutes. Without `ignoring(code)` there
would have been no match as the metrics do not share the same set of labels.
The entries with methods `put` and `del` have no match and will not show up in the result:
{method="get"} 0.04 // 24 / 600
{method="post"} 0.05 // 6 / 120
**Many-to-one** and **one-to-many** matchings refer to the case where each vector element on
the "one"-side can match with multiple elements on the "many"-side. This has to
be explicitly requested using the `group_left` or `group_right` modifier, where
left/right determines which vector has the higher cardinality.
<vector expr> <bin-op> ignoring(<label list>) group_left(<label list>) <vector expr>
<vector expr> <bin-op> ignoring(<label list>) group_right(<label list>) <vector expr>
<vector expr> <bin-op> on(<label list>) group_left(<label list>) <vector expr>
<vector expr> <bin-op> on(<label list>) group_right(<label list>) <vector expr>
The label list provided with the group modifier contains additional labels from
the "one"-side to be included in the result metrics. For `on` a label can only
appear in one of the lists. Every time series of the result vector must be
uniquely identifiable.
_Grouping modifiers can only be used for
[comparison](#comparison-binary-operators) and
[arithmetic](#arithmetic-binary-operators). Operations as `and`, `unless` and
`or` operations match with all possible entries in the right vector by
default._
Example query:
method_code:http_errors:rate5m / ignoring(code) group_left method:http_requests:rate5m
In this case the left vector contains more than one entry per `method` label
value. Thus, we indicate this using `group_left`. The elements from the right
side are now matched with multiple elements with the same `method` label on the
left:
{method="get", code="500"} 0.04 // 24 / 600
{method="get", code="404"} 0.05 // 30 / 600
{method="post", code="500"} 0.05 // 6 / 120
{method="post", code="404"} 0.175 // 21 / 120
_Many-to-one and one-to-many matching are advanced use cases that should be carefully considered.
Often a proper use of `ignoring(<labels>)` provides the desired outcome._
## Aggregation operators
Prometheus supports the following built-in aggregation operators that can be
used to aggregate the elements of a single instant vector, resulting in a new
vector of fewer elements with aggregated values:
* `sum` (calculate sum over dimensions)
* `min` (select minimum over dimensions)
* `max` (select maximum over dimensions)
* `avg` (calculate the average over dimensions)
* `stddev` (calculate population standard deviation over dimensions)
* `stdvar` (calculate population standard variance over dimensions)
* `count` (count number of elements in the vector)
* `count_values` (count number of elements with the same value)
* `bottomk` (smallest k elements by sample value)
* `topk` (largest k elements by sample value)
* `quantile` (calculate φ-quantile (0 ≤ φ ≤ 1) over dimensions)
These operators can either be used to aggregate over **all** label dimensions
or preserve distinct dimensions by including a `without` or `by` clause.
<aggr-op>([parameter,] <vector expression>) [without|by (<label list>)] [keep_common]
`parameter` is only required for `count_values`, `quantile`, `topk` and
`bottomk`. `without` removes the listed labels from the result vector, while
all other labels are preserved the output. `by` does the opposite and drops
labels that are not listed in the `by` clause, even if their label values are
identical between all elements of the vector. The `keep_common` clause allows
keeping those extra labels (labels that are identical between elements, but not
in the `by` clause).
`count_values` outputs one time series per unique sample value. Each series has
an additional label. The name of that label is given by the aggregation
parameter, and the label value is the unique sample value. The value of each
time series is the number of times that sample value was present.
`topk` and `bottomk` are different from other aggregators in that a subset of
the input samples, including the original labels, are returned in the result
vector. `by` and `without` are only used to bucket the input vector.
Example:
If the metric `http_requests_total` had time series that fan out by
`application`, `instance`, and `group` labels, we could calculate the total
number of seen HTTP requests per application and group over all instances via:
sum(http_requests_total) without (instance)
If we are just interested in the total of HTTP requests we have seen in **all**
applications, we could simply write:
sum(http_requests_total)
To count the number of binaries running each build version we could write:
count_values("version", build_version)
To get the 5 largest HTTP requests counts across all instances we could write:
topk(5, http_requests_total)
## Binary operator precedence
The following list shows the precedence of binary operators in Prometheus, from
highest to lowest.
1. `^`
2. `*`, `/`, `%`
3. `+`, `-`
4. `==`, `!=`, `<=`, `<`, `>=`, `>`
5. `and`, `unless`
6. `or`
Operators on the same precedence level are left-associative. For example,
`2 * 3 % 2` is equivalent to `(2 * 3) % 2`. However `^` is right associative,
so `2 ^ 3 ^ 2` is equivalent to `2 ^ (3 ^ 2)`.
---
title: Recording rules
sort_rank: 6
---
# Defining recording rules
## Configuring rules
Prometheus supports two types of rules which may be configured and then
evaluated at regular intervals: recording rules and [alerting
rules](../../alerting/rules). To include rules in Prometheus, create a file
containing the necessary rule statements and have Prometheus load the file via
the `rule_files` field in the [Prometheus configuration](/docs/operating/configuration).
The rule files can be reloaded at runtime by sending `SIGHUP` to the Prometheus
process. The changes are only applied if all rule files are well-formatted.
## Syntax-checking rules
To quickly check whether a rule file is syntactically correct without starting
a Prometheus server, install and run Prometheus's `promtool` command-line
utility tool:
```bash
go get github.com/prometheus/prometheus/cmd/promtool
promtool check-rules /path/to/example.rules
```
When the file is syntactically valid, the checker prints a textual
representation of the parsed rules to standard output and then exits with
a `0` return status.
If there are any syntax errors, it prints an error message to standard error
and exits with a `1` return status. On invalid input arguments the exit status
is `2`.
## Recording rules
Recording rules allow you to precompute frequently needed or computationally
expensive expressions and save their result as a new set of time series.
Querying the precomputed result will then often be much faster than executing
the original expression every time it is needed. This is especially useful for
dashboards, which need to query the same expression repeatedly every time they
refresh.
To add a new recording rule, add a line of the following syntax to your rule
file:
<new time series name>[{<label overrides>}] = <expression to record>
Some examples:
# Saving the per-job HTTP in-progress request count as a new set of time series:
job:http_inprogress_requests:sum = sum(http_inprogress_requests) by (job)
# Drop or rewrite labels in the result time series:
new_time_series{label_to_change="new_value",label_to_drop=""} = old_time_series
Recording rules are evaluated at the interval specified by the
`evaluation_interval` field in the Prometheus configuration. During each
evaluation cycle, the right-hand-side expression of the rule statement is
evaluated at the current instant in time and the resulting sample vector is
stored as a new set of time series with the current timestamp and a new metric
name (and perhaps an overridden set of labels).
...@@ -6,7 +6,7 @@ layout: jumbotron ...@@ -6,7 +6,7 @@ layout: jumbotron
<h1>From metrics to insight</h1> <h1>From metrics to insight</h1>
<p class="subtitle">Power your metrics and alerting with a leading<br>open-source monitoring solution.</p> <p class="subtitle">Power your metrics and alerting with a leading<br>open-source monitoring solution.</p>
<p> <p>
<a class="btn btn-default btn-lg" href="/docs/prometheus/1.8/getting_started/" role="button">Get Started</a> <a class="btn btn-default btn-lg" href="/docs/prometheus/latest/getting_started/" role="button">Get Started</a>
<a class="btn btn-default btn-lg" href="/download" role="button">Download</a> <a class="btn btn-default btn-lg" href="/download" role="button">Download</a>
</p> </p>
</div> </div>
...@@ -25,7 +25,7 @@ layout: jumbotron ...@@ -25,7 +25,7 @@ layout: jumbotron
</a> </a>
</div> </div>
<div class="col-md-3 col-sm-6 col-xs-12 feature-item"> <div class="col-md-3 col-sm-6 col-xs-12 feature-item">
<a href="/docs/querying/basics/"> <a href="/docs/prometheus/latest/querying/basics/">
<h2><i class="fa fa-search"></i> Powerful queries</h2> <h2><i class="fa fa-search"></i> Powerful queries</h2>
<p>A flexible query language allows slicing and dicing of collected time series data in order to generate ad-hoc graphs, tables, and alerts.</p> <p>A flexible query language allows slicing and dicing of collected time series data in order to generate ad-hoc graphs, tables, and alerts.</p>
</a> </a>
...@@ -46,7 +46,7 @@ layout: jumbotron ...@@ -46,7 +46,7 @@ layout: jumbotron
<div class="row"> <div class="row">
<div class="col-md-3 col-sm-6 col-xs-12 feature-item"> <div class="col-md-3 col-sm-6 col-xs-12 feature-item">
<a href="/docs/prometheus/1.8/configuration/"> <a href="/docs/prometheus/latest/configuration/">
<h2><i class="fa fa-cog"></i> Simple operation</h2> <h2><i class="fa fa-cog"></i> Simple operation</h2>
<p>Each server is independent for reliability, relying only on local storage. Written in Go, all binaries are statically linked and easy to deploy.</p> <p>Each server is independent for reliability, relying only on local storage. Written in Go, all binaries are statically linked and easy to deploy.</p>
</a> </a>
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment