Commit 1e182ed2 authored by Julius Volz

Merge pull request #43 from prometheus/beorn7/doc-improve

Document histograms and histogram_quantile.
parents eeb0170d c532c7f2
...@@ -24,7 +24,7 @@ tasks completed, errors occurred, etc. Counters should not be used to expose ...@@ -24,7 +24,7 @@ tasks completed, errors occurred, etc. Counters should not be used to expose
current counts of items whose number can also go down, e.g. the number of current counts of items whose number can also go down, e.g. the number of
currently running goroutines. Use gauges for this use case. currently running goroutines. Use gauges for this use case.
See the client library usage documentation for counters: Client library usage documentation for counters:
* [Go](http://godoc.org/github.com/prometheus/client_golang/prometheus#Counter) * [Go](http://godoc.org/github.com/prometheus/client_golang/prometheus#Counter)
* [Java](https://github.com/prometheus/client_java/blob/master/client/src/main/java/io/prometheus/client/metrics/Counter.java) * [Java](https://github.com/prometheus/client_java/blob/master/client/src/main/java/io/prometheus/client/metrics/Counter.java)
...@@ -41,7 +41,7 @@ Gauges are typically used for measured values like temperatures or current ...@@ -41,7 +41,7 @@ Gauges are typically used for measured values like temperatures or current
memory usage, but also "counts" that can go up and down, like the number of memory usage, but also "counts" that can go up and down, like the number of
running goroutines. running goroutines.
See the client library usage documentation for gauges: Client library usage documentation for gauges:
* [Go](http://godoc.org/github.com/prometheus/client_golang/prometheus#Gauge) * [Go](http://godoc.org/github.com/prometheus/client_golang/prometheus#Gauge)
* [Java](https://github.com/prometheus/client_java/blob/master/client/src/main/java/io/prometheus/client/metrics/Gauge.java) * [Java](https://github.com/prometheus/client_java/blob/master/client/src/main/java/io/prometheus/client/metrics/Gauge.java)
...@@ -49,26 +49,52 @@ See the client library usage documentation for gauges: ...@@ -49,26 +49,52 @@ See the client library usage documentation for gauges:
* [Ruby](https://github.com/prometheus/client_ruby#gauge) * [Ruby](https://github.com/prometheus/client_ruby#gauge)
* [Python](https://github.com/prometheus/client_python#gauge) * [Python](https://github.com/prometheus/client_python#gauge)
## Summaries ## Histogram
A _summary_ samples observations (usually things like request durations) over A _histogram_ samples observations (usually things like request durations or
sliding windows of time and provides instantaneous insight into their response sizes) and counts them in configurable buckets. It also provides a sum
distributions, frequencies, and sums. of all observed values.
A histogram with a base metric name of `<basename>` exposes multiple time series
during a scrape:
* cumulative counters for the observation buckets, exposed as `<basename>_bucket{le="<upper inclusive bound>"}`
* the **total sum** of all observed values, exposed as `<basename>_sum`
* the **count** of events that have been observed, exposed as `<basename>_count` (identical to `<basename>_bucket{le="+Inf"}` above)
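As a rough illustration of the series listed above, the following Python sketch builds the values a histogram would expose for a handful of observations. The function name `histogram_series`, the metric name, and the bucket bounds are made up for this example and are not part of any client library; real client libraries maintain these counters incrementally.

```python
import math


def histogram_series(basename, observations, upper_bounds):
    """Return a dict of the series a histogram would expose (illustrative only)."""
    # The +Inf bucket is always present and catches every observation.
    bounds = sorted(upper_bounds) + [math.inf]
    series = {}
    for bound in bounds:
        label = "+Inf" if math.isinf(bound) else repr(float(bound))
        # Buckets are cumulative: each one counts all observations that are
        # less than or equal to its (inclusive) upper bound.
        series[f'{basename}_bucket{{le="{label}"}}'] = sum(
            1 for value in observations if value <= bound
        )
    series[f"{basename}_sum"] = sum(observations)
    series[f"{basename}_count"] = len(observations)
    return series


# Hypothetical request durations in seconds and hypothetical bucket bounds.
example = histogram_series(
    "request_duration_seconds", [0.05, 0.2, 0.2, 0.7, 3.0], [0.1, 0.5, 1.0]
)
for name, value in example.items():
    print(name, value)
```

Note how the `le="+Inf"` bucket and `_count` always carry the same value, since both count every observation.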
Use the [`histogram_quantile()`
function](/docs/querying/functions/#histogram_quantile()) to calculate
quantiles from histograms or even aggregations of histograms. A
histogram is also suitable to calculate an [Apdex
score](http://en.wikipedia.org/wiki/Apdex). See [histograms and
summaries](/docs/practices/histograms) for details of histogram usage
and differences to [summaries](#summary).
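To make the Apdex remark concrete, here is a minimal Python sketch. It uses the standard Apdex definition (satisfied requests plus half of the tolerated requests, divided by all requests) and assumes the histogram happens to have buckets whose upper bounds equal the target threshold and the tolerated threshold (conventionally four times the target). The function name and the sample counts are hypothetical.

```python
def apdex_from_buckets(satisfied_le, tolerated_le, total):
    """Approximate an Apdex score from cumulative histogram bucket counts.

    satisfied_le: count in the bucket whose upper bound is the target threshold
    tolerated_le: count in the bucket whose upper bound is the tolerated threshold
    total:        count of all observations (the +Inf bucket, or _count)
    """
    if total == 0:
        return float("nan")  # no observations, so no meaningful score
    # Buckets are cumulative, so tolerated_le already includes satisfied_le:
    # (satisfied + (tolerated - satisfied) / 2) simplifies to the sum halved.
    return (satisfied_le + tolerated_le) / 2 / total


# Hypothetical counts: 80 requests within the target, 95 within the
# tolerated threshold, 100 requests overall.
print(apdex_from_buckets(80, 95, 100))
```

Because the buckets are cumulative, the satisfied requests are counted in both inputs, which is why a plain sum divided by two is enough.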
Client library usage documentation for histograms:
* [Go](http://godoc.org/github.com/prometheus/client_golang/prometheus#Histogram)
* [Java](https://github.com/prometheus/client_java/blob/master/simpleclient/src/main/java/io/prometheus/client/Histogram.java) (histograms are supported only by the simple client, not by the legacy client)
* [Python](https://github.com/prometheus/client_python#histogram)
## Summary
Similar to a _histogram_, a _summary_ samples observations (usually things like
request durations and response sizes). While it also provides a total count of
observations and a sum of all observed values, it calculates configurable
quantiles over a sliding time window.
A summary with a base metric name of `<basename>` exposes multiple time series A summary with a base metric name of `<basename>` exposes multiple time series
during a scrape: during a scrape:
* streaming **quantile values** of observed events, exposed as `<basename>{quantile="<quantile label>"}` * streaming **φ-quantiles** (0 ≤ φ ≤ 1) of observed events, exposed as `<basename>{quantile="<φ>"}`
* the **total sum** of all observed values, exposed as `<basename>_sum` * the **total sum** of all observed values, exposed as `<basename>_sum`
* the **count** of events that have been observed, exposed as `<basename>_count` * the **count** of events that have been observed, exposed as `<basename>_count`
This is quite convenient, for if you are interested in tracking latencies of an See [histograms and summaries](/docs/practices/histograms) for
operation in real time, you get three types of information reported for free detailed explanations of φ-quantiles, summary usage, and differences
with one metric. to [histograms](#histogram).
A typical use-case is the observation of request latencies or response sizes.
See the client library usage documentation for summaries: Client library usage documentation for summaries:
* [Go](http://godoc.org/github.com/prometheus/client_golang/prometheus#Summary) * [Go](http://godoc.org/github.com/prometheus/client_golang/prometheus#Summary)
* [Java](https://github.com/prometheus/client_java/blob/master/client/src/main/java/io/prometheus/client/metrics/Summary.java) * [Java](https://github.com/prometheus/client_java/blob/master/client/src/main/java/io/prometheus/client/metrics/Summary.java)
......
...@@ -23,17 +23,6 @@ detailed local views. ...@@ -23,17 +23,6 @@ detailed local views.
GitHub issue: [#9](https://github.com/prometheus/prometheus/issues/9) GitHub issue: [#9](https://github.com/prometheus/prometheus/issues/9)
**Aggregatable histograms**
The current client-side [summary
types](/docs/concepts/metric_types/#summaries) do not
support aggregation of quantiles. For example, it is [statistically
incorrect](http://latencytipoftheday.blogspot.de/2014/06/latencytipoftheday-you-cant-average.html)
to average over the 90th percentile latency of multiple monitored instances.
We plan to implement server-side histograms which will allow for this use case.
GitHub issue: [#480](https://github.com/prometheus/prometheus/issues/480)
**More flexible label matching in binary operations** **More flexible label matching in binary operations**
[Binary operations](/docs/querying/operators/) between time series vectors [Binary operations](/docs/querying/operators/) between time series vectors
......
--- ---
title: Alerting title: Alerting
sort_rank: 4 sort_rank: 5
--- ---
# Alerting # Alerting
......
...@@ -3,7 +3,7 @@ title: Consoles and dashboards ...@@ -3,7 +3,7 @@ title: Consoles and dashboards
sort_rank: 3 sort_rank: 3
--- ---
## Consoles and dashboards # Consoles and dashboards
It can be tempting to display as much data as possible on a dashboard, especially It can be tempting to display as much data as possible on a dashboard, especially
when a system like Prometheus offers the ability to have such rich when a system like Prometheus offers the ability to have such rich
......
...@@ -144,9 +144,9 @@ gauge for how long the collection took in seconds and another for the number of ...@@ -144,9 +144,9 @@ gauge for how long the collection took in seconds and another for the number of
errors encountered. errors encountered.
This is one of the two cases when it is okay to export a duration as a gauge This is one of the two cases when it is okay to export a duration as a gauge
rather than a summary, the other being batch job durations. This is because both rather than a summary or a histogram, the other being batch job durations. This
represent information about that particular push/scrape, rather than is because both represent information about that particular push/scrape, rather
tracking multiple durations over time. than tracking multiple durations over time.
## Things to watch out for ## Things to watch out for
...@@ -191,10 +191,14 @@ processing system. ...@@ -191,10 +191,14 @@ processing system.
If you are unsure, start with no labels and add more If you are unsure, start with no labels and add more
labels over time as concrete use cases arise. labels over time as concrete use cases arise.
### Counter vs. gauge vs. summary ### Counter vs. gauge, summary vs. histogram
It is important to know which of the three main metric types to use for a given It is important to know which of the four main metric types to use for
metric. There is a simple rule of thumb: if the value can go down, it's a gauge. a given metric.
To pick between counter and gauge, there is a simple rule of
thumb: if the value can go down, it is a gauge.
Counters can only go up (and reset, such as when a process restarts). They are Counters can only go up (and reset, such as when a process restarts). They are
useful for accumulating the number of events, or the amount of something at useful for accumulating the number of events, or the amount of something at
...@@ -206,11 +210,8 @@ Gauges can be set, go up, and go down. They are useful for snapshots of state, ...@@ -206,11 +210,8 @@ Gauges can be set, go up, and go down. They are useful for snapshots of state,
such as in-progress requests, free/total memory, or temperature. You should such as in-progress requests, free/total memory, or temperature. You should
never take a `rate()` of a gauge. never take a `rate()` of a gauge.
Summaries are similar to having two counters. They track the number of events Summaries and histograms are more complex metric types discussed in
*and* the amount of something for each event, allowing you to calculate the [their own section](/docs/practices/histograms/).
average amount per event (useful for latency, for example). In addition,
summaries can also export quantiles of the amounts, but note that [quantiles are not
aggregatable](http://latencytipoftheday.blogspot.de/2014/06/latencytipoftheday-you-cant-average.html).
### Timestamps, not time since ### Timestamps, not time since
...@@ -244,9 +245,11 @@ benchmarks are the best way to determine the impact of any given change. ...@@ -244,9 +245,11 @@ benchmarks are the best way to determine the impact of any given change.
### Avoid missing metrics ### Avoid missing metrics
Time series that are not present until something happens are difficult to deal with, Time series that are not present until something happens are difficult
as the usual simple operations are no longer sufficient to correctly handle to deal with, as the usual simple operations are no longer sufficient
them. To avoid this, export a `0` for any time series you know may exist in advance. to correctly handle them. To avoid this, export `0` (or `NaN`, if `0`
would be misleading) for any time series you know may exist in
advance.
Most Prometheus client libraries (including Go and Java Simpleclient) will Most Prometheus client libraries (including Go and Java Simpleclient) will
automatically export a `0` for you for metrics with no labels. automatically export a `0` for you for metrics with no labels.
--- ---
title: Recording rules title: Recording rules
sort_rank: 5 sort_rank: 6
--- ---
# Recording rules # Recording rules
......
...@@ -107,6 +107,31 @@ a `job` label set to `prometheus`: ...@@ -107,6 +107,31 @@ a `job` label set to `prometheus`:
http_requests_total{job="prometheus"}[5m] http_requests_total{job="prometheus"}[5m]
### Offset modifier
The `offset` modifier allows changing the time offset for individual
instant and range vectors in a query.
For example, the following expression returns the value of
`http_requests_total` 5 minutes in the past relative to the current
query evaluation time:
http_requests_total offset 5m
Note that the `offset` modifier always needs to follow the selector
immediately, i.e. the following would be correct:
sum(http_requests_total{method="GET"} offset 5m) // GOOD.
While the following would be *incorrect*:
sum(http_requests_total{method="GET"}) offset 5m // INVALID.
The same works for range vectors. This returns the 5-minute rate that
`http_requests_total` had a week ago:
rate(http_requests_total[5m] offset 1w)
## Operators ## Operators
Prometheus supports many binary and aggregation operators. These are described Prometheus supports many binary and aggregation operators. These are described
......
...@@ -28,6 +28,12 @@ the 1-element output vector from the input vector: ...@@ -28,6 +28,12 @@ the 1-element output vector from the input vector:
This is useful for alerting on when no time series This is useful for alerting on when no time series
exist for a given metric name and label combination. exist for a given metric name and label combination.
## `bottomk()`
`bottomk(k integer, v instant-vector)` returns the `k` smallest elements of `v`
by sample value.
## `ceil()` ## `ceil()`
`ceil(v instant-vector)` rounds the sample values of all elements in `v` up to `ceil(v instant-vector)` rounds the sample values of all elements in `v` up to
...@@ -80,6 +86,54 @@ and value across all series in the input vector. ...@@ -80,6 +86,54 @@ and value across all series in the input vector.
`floor(v instant-vector)` rounds the sample values of all elements in `v` down `floor(v instant-vector)` rounds the sample values of all elements in `v` down
to the nearest integer. to the nearest integer.
## `histogram_quantile()`
`histogram_quantile(φ float, b instant-vector)` calculates the
φ-quantile (0 ≤ φ ≤ 1) from the buckets `b` of a
[histogram](/docs/concepts/metric_types/#histogram). (See [histograms
and summaries](/docs/practices/histograms) for a detailed explanation
of φ-quantiles and the usage of the histogram metric type in general.)
The samples in `b` are the counts of observations in each bucket. Each
sample must have a label `le` where the label value denotes the
inclusive upper bound of the bucket. (Samples without such a label are
silently ignored.) The [histogram metric
type](/docs/concepts/metric_types/#histogram) automatically provides
time series with the `_bucket` suffix and the appropriate labels.
Use the `rate()` function to specify the time window for the quantile
calculation.
Example: A histogram metric is called `http_request_duration_seconds`. To
calculate the 90th percentile of request durations over the last 10m, use the
following expression:
histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[10m]))
The quantile is calculated for each label combination in
`http_request_duration_seconds`. To aggregate, use the `sum()` aggregator
around the `rate()` function. Since the `le` label is required by
`histogram_quantile()`, it has to be included in the `by` clause. The following
expression aggregates the 90th percentile by `job`:
histogram_quantile(0.9, sum(rate(http_request_duration_seconds_bucket[10m])) by (job, le))
To aggregate everything, specify only the `le` label:
histogram_quantile(0.9, sum(rate(http_request_duration_seconds_bucket[10m])) by (le))
The `histogram_quantile()` function interpolates quantile values by
assuming a linear distribution within a bucket. The highest bucket
must have an upper bound of `+Inf`. (Otherwise, `NaN` is returned.) If
a quantile is located in the highest bucket, the upper bound of the
second highest bucket is returned. The lower limit of the lowest bucket
is assumed to be 0 if the upper bound of that bucket is greater than
0. In that case, the usual linear interpolation is applied within that
bucket. Otherwise, the upper bound of the lowest bucket is returned
for quantiles located in the lowest bucket.
If `b` contains fewer than two buckets, `NaN` is returned. For φ < 0, `-Inf` is
returned. For φ > 1, `+Inf` is returned.
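The interpolation rules above can be sketched in Python as follows. This is a simplified model written from the description in this section, not the server's actual implementation: it takes plain `(upper_bound, cumulative_count)` pairs instead of labeled samples, and the handling of empty buckets is an assumption made to keep the sketch well-defined.

```python
import math


def histogram_quantile(phi, buckets):
    """Estimate the phi-quantile from (upper_bound, cumulative_count) pairs."""
    if phi < 0:
        return -math.inf
    if phi > 1:
        return math.inf
    buckets = sorted(buckets)
    # At least two buckets are needed, and the highest must be +Inf.
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # assumption: nothing observed, nothing to estimate
    rank = phi * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if index == len(buckets) - 1:
        # Quantile falls into the +Inf bucket: return the upper bound of
        # the second highest bucket.
        return buckets[-2][0]
    upper, count = buckets[index]
    if index == 0:
        if upper <= 0:
            return upper  # no sensible lower limit to interpolate from
        lower, below = 0.0, 0  # lowest bucket is assumed to start at 0
    else:
        lower, below = buckets[index - 1]
    if count == below:
        return lower  # assumption: empty bucket, avoid dividing by zero
    # Linear interpolation within the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Hypothetical cumulative bucket counts (for example, rates per bucket).
sample = [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]
print(histogram_quantile(0.5, sample))
print(histogram_quantile(0.95, sample))
```

In the sample data, the median falls into the `0.5` bucket and is interpolated between `0.1` and `0.5`, while the 95th percentile falls into the `+Inf` bucket, so the sketch returns the upper bound of the second highest bucket.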
## `rate()` ## `rate()`
`rate(v range-vector)` calculates the per-second average rate of increase of the `rate(v range-vector)` calculates the per-second average rate of increase of the
...@@ -123,6 +177,11 @@ Same as `sort`, but sorts in descending order. ...@@ -123,6 +177,11 @@ Same as `sort`, but sorts in descending order.
this does not actually return the current time, but the time at which the this does not actually return the current time, but the time at which the
expression is to be evaluated. expression is to be evaluated.
## `topk()`
`topk(k integer, v instant-vector)` returns the `k` largest elements of `v` by
sample value.
## `<aggregation>_over_time()`: Aggregating values over time: ## `<aggregation>_over_time()`: Aggregating values over time:
The following functions allow aggregating each series of a given range vector The following functions allow aggregating each series of a given range vector
...@@ -133,11 +192,3 @@ over time and return an instant vector with per-series aggregation results: ...@@ -133,11 +192,3 @@ over time and return an instant vector with per-series aggregation results:
* `max_over_time(range-vector)`: the maximum value of all points under the specified interval. * `max_over_time(range-vector)`: the maximum value of all points under the specified interval.
* `sum_over_time(range-vector)`: the sum of all values under the specified interval. * `sum_over_time(range-vector)`: the sum of all values under the specified interval.
* `count_over_time(range-vector)`: the count of all values under the specified interval. * `count_over_time(range-vector)`: the count of all values under the specified interval.
## `topk()` and `bottomk()`
`topk(k integer, v instant-vector)` returns the `k` largest elements of `v` by
sample value.
`bottomk(k integer, v instant-vector` returns the `k` smallest elements of `v`
by sample value.