Commit a072797f authored by Björn Rabenstein

Merge pull request #62 from prometheus/beorn7/doc-improve

Document histograms in the exposition format and new storage flags.
parents e84b8ff9 bb591bd4
......@@ -45,7 +45,7 @@ Prometheus).
| **Optional HTTP `Content-Encoding`** | `gzip` | `gzip` |
| **Advantages** | <ul><li>Cross-platform</li><li>Size</li><li>Encoding and decoding costs</li><li>Strict schema</li><li>Supports concatenation and theoretically streaming (only server-side behavior would need to change)</li></ul> | <ul><li>Human-readable</li><li>Easy to assemble, especially for minimalistic cases (no nesting required)</li><li>Readable line by line (with the exception of type hints and docstrings)</li></ul> |
| **Limitations** | <ul><li>Not human-readable</li></ul> | <ul><li>Verbose</li><li>Types and docstrings not integral part of the syntax, meaning little-to-nonexistent metric contract validation</li><li>Parsing cost</li></ul>|
| **Supported metric primitives** | <ul><li>Summary</li><li>Gauge</li><li>Counter</li><li>Untyped</li></ul> | <ul><li>Summary</li><li>Gauge</li><li>Counter</li><li>Untyped</li></ul> |
| **Supported metric primitives** | <ul><li>Counter</li><li>Gauge</li><li>Histogram</li><li>Summary</li><li>Untyped</li></ul> | <ul><li>Counter</li><li>Gauge</li><li>Histogram</li><li>Summary</li><li>Untyped</li></ul> |
| **Compatibility** | Version `0.0.3` protocol buffers are also valid version `0.0.4` protocol buffers. | none |
### Text format details
......@@ -67,12 +67,13 @@ characters have to be escaped as `\\` and `\n`, respectively. Only one `HELP`
line may exist for the same metric name.
If the token is `TYPE`, exactly two more tokens are expected. The first is the
metric name, and the second is either `counter`, `gauge`, `summary`, or
`untyped`, defining the type for the metric of that name. Only one `TYPE` line
may exist for the same metric name. The `TYPE` line for a metric name has to
appear before the first sample is reported for that metric name. If there is no
`TYPE` line for a metric name, the type is set to `untyped`. Remaining lines
describe samples, one per line, with the following syntax (EBNF):
metric name, and the second is either `counter`, `gauge`, `histogram`,
`summary`, or `untyped`, defining the type for the metric of that name. Only
one `TYPE` line may exist for the same metric name. The `TYPE` line for a
metric name has to appear before the first sample is reported for that metric
name. If there is no `TYPE` line for a metric name, the type is set to
`untyped`. Remaining lines describe samples, one per line, with the following
syntax (EBNF):
metric_name [
"{" label_name "=" `"` label_value `"` { "," label_name "=" `"` label_value `"` } [ "," ] "}"
......@@ -81,11 +82,15 @@ describe samples, one per line, with the following syntax (EBNF):
`metric_name` and `label_name` have the usual Prometheus expression language restrictions. `label_value` can be any sequence of UTF-8 characters, but the backslash, the double-quote, and the line-feed characters have to be escaped as `\\`, `\"`, and `\n`, respectively.
`value` is a float, and `timestamp` an `int64` (milliseconds since epoch, i.e. 1970-01-01 00:00:00 UTC, excluding leap seconds), represented as required by the [Go strconv package](http://golang.org/pkg/strconv/) (see functions `ParseInt` and `ParseFloat`). In particular, `NaN`, `+Inf`, and `-Inf` are valid values.
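The escaping rules for `label_value` can be illustrated with a minimal Python sketch. The helper names `escape_label_value` and `format_sample` are hypothetical and not part of any Prometheus client library. The value is assumed to be passed in as an already formatted string.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and line feed in a label value."""
    # The backslash is replaced first so that the backslashes introduced
    # by the two later replacements are not escaped a second time.
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def format_sample(name, labels, value, timestamp_ms=None):
    """Render one sample line (hypothetical helper, for illustration only).

    `value` is assumed to be a string that is already formatted,
    e.g. "12.47" or "+Inf".
    """
    label_part = ""
    if labels:
        pairs = ",".join(
            '{}="{}"'.format(key, escape_label_value(val))
            for key, val in labels.items()
        )
        label_part = "{" + pairs + "}"
    line = "{}{} {}".format(name, label_part, value)
    if timestamp_ms is not None:
        # The timestamp is an integer number of milliseconds since epoch.
        line += " {}".format(int(timestamp_ms))
    return line
```

Called as `format_sample("something_weird", {"problem": "division by zero"}, "+Inf", -3982045)`, this reproduces the shape of the sample line with labels, value, and timestamp used in the documentation's example.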
The `summary` type is difficult to represent in the text format. The following conventions apply:
The `histogram` and `summary` types are difficult to represent in the text
format. The following conventions apply:
* Each quantile `x` is given as a separate sample, each with a label `{quantile="x"}`.
* The sample sum for a summary named `x` is given as a separate sample named `x_sum`.
* The sample count for a summary named `x` is given as a separate sample named `x_count`.
* The sample sum for a summary or histogram named `x` is given as a separate sample named `x_sum`.
* The sample count for a summary or histogram named `x` is given as a separate sample named `x_count`.
* Each quantile of a summary named `x` is given as a separate sample line with the same name `x` and a label `{quantile="y"}`.
* Each bucket count of a histogram named `x` is given as a separate sample line with the name `x_bucket` and a label `{le="y"}` (where `y` is the upper bound of the bucket).
* A histogram _must_ have a bucket with `{le="+Inf"}`. Its value _must_ be identical to the value of `x_count`.
* The buckets of a histogram and the quantiles of a summary must appear in increasing numerical order of their label values (for the `le` or the `quantile` label, respectively).
See also the example below.
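The histogram conventions can be sketched in Python as follows. This is an illustration under stated assumptions, not the exposition code of any client library: `render_histogram` is a hypothetical name, and bucket counts are assumed to be cumulative (each bucket counts all observations less than or equal to its upper bound), which is what the rule that the `+Inf` bucket must equal `x_count` suggests.

```python
def render_histogram(name, upper_bounds, observations):
    """Render a histogram as text-format sample lines (hypothetical helper).

    Assumption: bucket counts are cumulative, so the {le="+Inf"} bucket
    holds the total number of observations.
    """
    lines = ["# TYPE {} histogram".format(name)]
    # Buckets must appear in increasing numerical order of the `le` label.
    for bound in sorted(upper_bounds):
        count = sum(1 for obs in observations if obs <= bound)
        # "g" formatting keeps short bounds compact (1.0 -> "1"); it is
        # limited to six significant digits, which is enough for a sketch.
        bound_text = format(bound, "g")
        lines.append('{}_bucket{{le="{}"}} {}'.format(name, bound_text, count))
    total = len(observations)
    # The mandatory +Inf bucket, identical to the _count sample.
    lines.append('{}_bucket{{le="+Inf"}} {}'.format(name, total))
    lines.append("{}_sum {}".format(name, format(sum(observations), "g")))
    lines.append("{}_count {}".format(name, total))
    return lines


if __name__ == "__main__":
    for line in render_histogram("demo_seconds", [1.0, 0.5], [0.25, 0.5, 0.75, 2.0]):
        print(line)
```

The metric name `demo_seconds` and the observation values are made up for the demonstration.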
......@@ -104,8 +109,20 @@ metric_without_timestamp_and_labels 12.47
# A weird metric from before the epoch:
something_weird{problem="division by zero"} +Inf -3982045
# Finally a summary, which has a pretty complex representation in the text format:
# HELP telemetry_requests_metrics_latency_microseconds A histogram of the response latency.
# A histogram, which has a pretty complex representation in the text format:
# HELP http_request_duration_seconds A histogram of the request duration.
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.05"} 24054
http_request_duration_seconds_bucket{le="0.1"} 33444
http_request_duration_seconds_bucket{le="0.2"} 100392
http_request_duration_seconds_bucket{le="0.5"} 129389
http_request_duration_seconds_bucket{le="1"} 133988
http_request_duration_seconds_bucket{le="+Inf"} 144320
http_request_duration_seconds_sum 53423
http_request_duration_seconds_count 144320
# Finally a summary, which has a complex representation, too:
# HELP telemetry_requests_metrics_latency_microseconds A summary of the response latency.
# TYPE telemetry_requests_metrics_latency_microseconds summary
telemetry_requests_metrics_latency_microseconds{quantile="0.01"} 3102
telemetry_requests_metrics_latency_microseconds{quantile="0.05"} 3272
......
......@@ -50,6 +50,25 @@ likely not what you want for actual operations. The flag
`storage.local.retention` allows you to configure the retention time
for samples. Adjust it to your needs and your available disk space.
## Settings for high numbers of time series
Prometheus can handle millions of time series. However, you have to
adjust the storage settings for that. Essentially, you want to allow a
certain number of chunks for each time series to be kept in RAM. The
default value for the `storage.local.memory-chunks` flag (discussed
above) is 1048576. Up to about 300,000 series, you still have three
chunks available per series on average. For more series, you should
increase the `storage.local.memory-chunks` value. Three times the
number of series is a good first approximation. But keep the
implication for memory usage (see above) in mind.
Even more important is raising the value for the
`storage.local.max-chunks-to-persist` flag at the same time. As a rule
of thumb, keep it somewhere between 50% and 100% of the
`storage.local.memory-chunks` value. The main drawback of a high value
is larger checkpoints. The consequences of a value too low are much
more serious.
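The two rules of thumb above amount to simple arithmetic, sketched here in Python. The function name `suggest_chunk_settings` and its parameters are hypothetical. Never going below the default for `storage.local.memory-chunks` is an assumption of this sketch, not something the text prescribes.

```python
DEFAULT_MEMORY_CHUNKS = 1048576  # default of storage.local.memory-chunks


def suggest_chunk_settings(num_series, chunks_per_series=3, persist_fraction=0.5):
    """Suggest starting values for the two flags (hypothetical helper).

    Applies "three times the number of series" for memory chunks and a
    fraction between 50% and 100% of that for the chunks to persist.
    """
    if not 0.5 <= persist_fraction <= 1.0:
        raise ValueError("persist_fraction should be between 0.5 and 1.0")
    # Assumption: keep at least the default, so small setups are unchanged.
    memory_chunks = max(DEFAULT_MEMORY_CHUNKS, num_series * chunks_per_series)
    max_chunks_to_persist = int(memory_chunks * persist_fraction)
    return {
        "storage.local.memory-chunks": memory_chunks,
        "storage.local.max-chunks-to-persist": max_chunks_to_persist,
    }


if __name__ == "__main__":
    for flag, value in suggest_chunk_settings(2000000).items():
        print(flag, value)
```

For two million series this suggests 6,000,000 memory chunks and, at the lower 50% bound, 3,000,000 chunks to persist. These are first approximations only; the memory implications mentioned above still have to be checked.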
## Crash recovery
Prometheus saves chunks to disk as soon as possible after they are
......