Commit 4f3658e9 authored by beorn7

Various documentation improvements.

- FAQ about timezone.
- FAQs about implementation.
- Fix swapped issue URLs.
- Add section about possible hash collisions.
- Mention `GOMAXPROCS`, especially important for Go beginners.
- Mention possible storage optimization for histograms.
parent a7166d49
@@ -89,6 +89,16 @@ Yes, with the experimental [Alertmanager](https://github.com/prometheus/alertman
Yes, with [PromDash](/docs/visualization/promdash/) and [Console
templates](/docs/visualization/consoles/).
### Can I change the timezone? Why is everything in UTC?
To avoid any kind of timezone confusion, especially when the so-called
daylight saving time is involved, we decided to exclusively use Unix
time internally and UTC for display purposes in all components of
Prometheus. A carefully done timezone selection could be introduced
into the UI. Contributions are welcome. See
[issue #500](https://github.com/prometheus/prometheus/issues/500)
for the current state of this effort.
## Instrumentation
### Which languages have instrumentation libraries?
@@ -153,3 +163,53 @@ should take less than a minute under normal circumstances. See [crash recovery](
You have run into a bug of ZFS on Linux. See [issue #484](https://github.com/prometheus/prometheus/issues/484)
for details. Upgrading to ZFS on Linux v0.6.4 should fix the issue.
## Implementation
### Why are all sample values 64-bit floats? I want integers.
We restrained ourselves to 64-bit floats to simplify the design. The
[IEEE 754 double-precision binary floating-point
format](http://en.wikipedia.org/wiki/Double-precision_floating-point_format)
supports integer precision for values up to 2<sup>53</sup>. Supporting
native 64-bit integers would (only) help if you need integer precision
above 2<sup>53</sup> but below 2<sup>63</sup>. In principle, support
for different sample value types (including some kind of big integer
supporting even more than 64 bits) could be implemented, but it is not
a priority right now. Note that a counter, even if incremented
one million times per second, will only run into precision issues
after over 285 years.
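For the curious, here is a minimal standalone Go sketch (illustrative only, not part of Prometheus) that reproduces the 285-year figure from the 2<sup>53</sup> integer-precision limit of a float64:

```language-go
package main

import "fmt"

func main() {
	// Largest integer a float64 can represent exactly: 2^53.
	const maxExactInt = 1 << 53

	// Hypothetical counter incremented one million times per second.
	const incrementsPerSecond = 1e6

	seconds := float64(maxExactInt) / incrementsPerSecond
	years := seconds / (60 * 60 * 24 * 365)

	fmt.Printf("integer precision is lost after %.1f years\n", years) // ~285.6
}
```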
### Why does Prometheus use a custom storage backend rather than [some other storage method]? Isn't the "one file per time series" approach killing performance?
Initially, Prometheus ran completely on LevelDB, but to achieve better
performance, we had to change the storage for bulk sample data. We
evaluated many storage backends that were available at the time,
without getting satisfactory results. So we implemented exactly the
parts we needed, while keeping LevelDB for indexes and making heavy
use of file system capabilities. Obviously, we could not evaluate
every single storage backend out there, and storage backends have
evolved in the meantime. However, the performance of the current
implementation is satisfactory for most use cases. Our most important
requirements are an acceptable query speed for common queries and a
sustainable ingestion rate of many thousands of samples per
second. The latter depends on the compressibility of the sample data
and on the number of time series the samples belong to, but to give
you an idea, here are some results from benchmarks:
* On an older 8-core machine with Intel Core i7 CPUs and two spinning
disks (Samsung HD753LJ) in a RAID-1 setup, Prometheus sustained an
ingestion rate of 20k samples per second, belonging to 450k time
series, scraped from 1500 targets.
* On a modern server with SSD, Prometheus sustained an ingestion rate
of more than 100k samples per second, belonging to millions of time
series, scraped from thousands of targets.
In both cases, the bottleneck was identified as insufficiently
parallelized hash calculation, which happens before samples even hit
the storage backend.
Running out of inodes is highly unlikely in a usual setup. There is a
possible downside: if you want to delete Prometheus's storage
directory, you will notice that some file systems are very slow when
deleting files.
@@ -93,6 +93,20 @@ about itself from its own HTTP metrics endpoint.
You can also verify that Prometheus is serving metrics about itself by
navigating to its metrics exposure endpoint: http://localhost:9090/metrics
By default, Prometheus will execute at most one OS thread at a
time. In production scenarios on multi-CPU machines, you will most
likely achieve better performance by setting the `GOMAXPROCS`
environment variable to a value similar to the number of available CPU
cores:
```language-bash
GOMAXPROCS=8 ./prometheus -config.file=prometheus.conf
```
Blindly setting `GOMAXPROCS` to a high value can be
counterproductive. See the relevant [Go
FAQs](http://golang.org/doc/faq#Why_no_multi_CPU).
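If you are new to Go and curious what this variable controls, the following standalone Go sketch (unrelated to Prometheus itself) prints the effective setting and the number of logical CPUs visible to the Go runtime:

```language-go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// GOMAXPROCS(0) reports the current setting without changing it.
	// Unless the GOMAXPROCS environment variable is set, Go defaults to
	// running at most one OS thread at a time, as described above.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))

	// NumCPU reports the number of logical CPUs usable by this process.
	fmt.Println("CPU cores: ", runtime.NumCPU())
}
```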
## Using the expression browser
Let's try looking at some data that Prometheus has collected about itself. To
@@ -21,7 +21,7 @@ of global Prometheus servers which collect and store only aggregated data from
those local servers. This allows you to have an aggregate global view and
detailed local views.
GitHub issue: [#480](https://github.com/prometheus/prometheus/issues/480)
GitHub issue: [#9](https://github.com/prometheus/prometheus/issues/9)
**Aggregatable histograms**
@@ -32,7 +32,7 @@ incorrect](http://latencytipoftheday.blogspot.de/2014/06/latencytipoftheday-you-
to average over the 90th percentile latency of multiple monitored instances.
We plan to implement server-side histograms which will allow for this use case.
GitHub issue: [#9](https://github.com/prometheus/prometheus/issues/9)
GitHub issue: [#480](https://github.com/prometheus/prometheus/issues/480)
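To see why averaging percentiles goes wrong, here is a small standalone Go sketch (illustrative data and a naive nearest-rank percentile, not Prometheus code) comparing the average of two per-instance 90th percentiles with the 90th percentile of the combined samples:

```language-go
package main

import (
	"fmt"
	"sort"
)

// p90 returns the 90th-percentile value of a sample slice
// (nearest-rank method, purely for illustration).
func p90(samples []float64) float64 {
	sort.Float64s(samples)
	return samples[(len(samples)*90+99)/100-1]
}

func main() {
	// Two instances with very different latency distributions (seconds).
	a := []float64{0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.2}
	b := []float64{0.1, 0.1, 0.1, 0.1, 0.1, 1.0, 2.0, 3.0, 4.0, 5.0}

	avgOfP90s := (p90(a) + p90(b)) / 2

	merged := append(append([]float64{}, a...), b...)
	trueP90 := p90(merged)

	fmt.Println("average of per-instance 90th percentiles:", avgOfP90s) // 2.05
	fmt.Println("90th percentile of all samples combined: ", trueP90)   // 3.0
}
```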
**More flexible label matching in binary operations**
@@ -86,12 +86,15 @@ GitHub issue: [#398](https://github.com/prometheus/prometheus/issues/398)
**Server-side metric metadata support**
At this time, metric types and other metadata are only used in the client
libraries and in the exposition format, but not persisted or utilized in the
Prometheus server. We plan on making use of this metadata in the future. For
example, we could suggest automatic rates over counters, warn users if they
take the rate of a gauge, or display metric documentation strings. The details
of this are still to be determined.
At this time, metric types and other metadata are only used in the
client libraries and in the exposition format, but not persisted or
utilized in the Prometheus server. We plan on making use of this
metadata in the future. For example, we could suggest automatic rates
over counters, warn users if they take the rate of a gauge, or display
metric documentation strings. Some metric types, like the upcoming
[server-side histograms](https://github.com/prometheus/prometheus/issues/480),
could also be stored and processed in a more efficient way. The
details of this are still to be determined.
**More client libraries and exporters**
@@ -86,3 +86,16 @@ storage directory:
1. Stop Prometheus.
1. `rm -r <storage path>/*`
1. Start Prometheus.
## Hash collisions
Prometheus currently uses 64-bit fingerprints to identify time
series. On a large server with several million time series, the chance
of a hash collision is about one in a million (assuming the FNV-1a
hash function works well). While that might appear safe enough, the
problem is that a hash collision will effectively lead to undetected
data corruption. Also, with more powerful hardware and future
improvements of the Prometheus code, much higher numbers of time
series might be handled by a single server, increasing the chance of a
collision. See [Prometheus issue
#509](https://github.com/prometheus/prometheus/issues/509) for efforts
to deal with the problem.
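To show roughly where that estimate comes from, here is a small Go sketch applying the standard birthday-problem approximation to 64-bit fingerprints, with an assumed, purely illustrative count of five million series:

```language-go
package main

import (
	"fmt"
	"math"
)

func main() {
	// Birthday-problem approximation: the probability that at least two of
	// n items share one of 2^64 possible fingerprints is roughly
	// n*(n-1) / (2 * 2^64), assuming the hash behaves like a uniform
	// random function.
	n := 5e6 // illustrative: several million time series
	space := math.Pow(2, 64)

	p := n * (n - 1) / (2 * space)
	fmt.Printf("collision probability: %.1e (about 1 in %.0f)\n", p, 1/p)
}
```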