Commit 7ea0ccf4 authored by James Turnbull, committed by Brian Brazil

Some updates to the introduction documents (#861)

* Some updates to the introduction documents

1. Some spelling and grammar updates, settled on US English as that
seemed consistent? Happy to be wrong...

2. Removed a dead link from the FAQ.

3. Tidied up some awkward sentences.

4. Fixed some missing or extra words.

5. Added a couple of links to external tools, etc.

6. Added NOTE formatting when things are an actual note.
parent c984fe6a
...@@ -5,9 +5,8 @@ sort_rank: 3

# Jobs and instances

In Prometheus terms, an endpoint you can scrape is called an _instance_,
usually corresponding to a single process. A collection of instances with the
same purpose, a process replicated for scalability or reliability for example,
is called a _job_.

For example, an API server job with four replicated instances:
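As an illustration (the addresses here are hypothetical, not from the docs), such a job could look like:

```
job: api-server
    instance 1: 1.2.3.4:5670
    instance 2: 1.2.3.4:5671
    instance 3: 5.6.7.8:5670
    instance 4: 5.6.7.8:5671
```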
......
...@@ -24,17 +24,17 @@ to find faults.

Graphite stores numeric samples for named time series, much like Prometheus
does. However, Prometheus's metadata model is richer: while Graphite metric
names consist of dot-separated components which implicitly encode dimensions,
Prometheus encodes dimensions explicitly as key-value pairs, called labels,
attached to a metric name. This allows easy filtering, grouping, and matching
by these labels via the query language.

Further, especially when Graphite is used in combination with
[StatsD](https://github.com/etsy/statsd/), it is common to store only
aggregated data over all monitored instances, rather than preserving the
instance as a dimension and being able to drill down into individual
problematic instances.

For example, storing the number of HTTP requests to API servers with the
response code `500` and the method `POST` to the `/tracks` endpoint would
commonly be encoded like this in Graphite/StatsD:
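To sketch the contrast, the same sample in both systems might look like this (the exact metric names are illustrative):

```
# Graphite/StatsD: dimensions implicit in the dotted name
stats.api-server.tracks.post.500 -> 93

# Prometheus: dimensions explicit as labels
api_server_http_requests_total{method="post", handler="/tracks", status_code="500"} -> 93
```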
...@@ -102,20 +102,18 @@ silencing functionality.

### Data model / storage

Like Prometheus, the InfluxDB data model has key-value pairs as labels, which
are called tags. In addition, InfluxDB has a second level of labels called
fields, which are more limited in use. InfluxDB supports timestamps with up to
nanosecond resolution, and float64, int64, bool, and string data types.
Prometheus, by contrast, supports the float64 data type with limited support
for strings, and millisecond resolution timestamps.
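For illustration only, a hypothetical sample in InfluxDB's line protocol, with tags before the first space and fields after it:

```
http_requests,method=post,handler=/tracks status_code=500i 1485915360000000000
```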
InfluxDB uses a variant of a [log-structured merge tree for storage with a write ahead log](https://docs.influxdata.com/influxdb/v1.2/concepts/storage_engine/),
sharded by time. This is much more suitable to event logging than Prometheus's
append-only file per time series approach.

[Logs and Metrics and Graphs, Oh My!](https://blog.raintank.io/logs-and-metrics-and-graphs-oh-my/)
describes the differences between event logging and metrics recording.
### Architecture
...@@ -123,7 +121,7 @@ Prometheus servers run independently of each other and only rely on their local
storage for their core functionality: scraping, rule processing, and alerting.
The open source version of InfluxDB is similar.

The commercial InfluxDB offering is, by design, a distributed storage cluster
with storage and queries being handled by many nodes at once.

This means that the commercial InfluxDB will be easier to scale horizontally,
...@@ -136,7 +134,7 @@ you better reliability and failure isolation.

Kapacitor currently has no [built-in distributed/redundant
options](https://github.com/influxdata/kapacitor/issues/277) for rules,
alerting, or notifications. Prometheus and the Alertmanager, by contrast, offer
a redundant option via running redundant replicas of Prometheus and using the
Alertmanager's [High
Availability](https://github.com/prometheus/alertmanager#high-availability)
...@@ -149,7 +147,7 @@ There are many similarities between the systems. Both have labels (called tags
in InfluxDB) to efficiently support multi-dimensional metrics. Both use
basically the same data compression algorithms. Both have extensive
integrations, including with each other. Both have hooks allowing you to extend
them further, such as analyzing data in statistical tools or performing
automated actions.

Where InfluxDB is better:
...@@ -183,12 +181,10 @@ The same scope differences as in the case of

### Data model

OpenTSDB's data model is almost identical to Prometheus's: time series are
identified by a set of arbitrary key-value pairs (OpenTSDB tags are
Prometheus labels). All data for a metric is [stored together](http://opentsdb.net/docs/build/html/user_guide/writing/index.html#time-series-cardinality),
limiting the cardinality of metrics. There are minor differences though:
Prometheus allows arbitrary characters in label values, while OpenTSDB is more
restrictive. OpenTSDB also lacks a full query language, only allowing simple
aggregation and math via its API.
### Storage
...@@ -204,15 +200,14 @@ once the capacity of a single node is exceeded.

### Summary

Prometheus offers a much richer query language, can handle higher cardinality
metrics, and forms part of a complete monitoring system. If you're already
running Hadoop and value long term storage over these benefits, OpenTSDB is a
good choice.

## Prometheus vs. Nagios

[Nagios](https://www.nagios.org/) is a monitoring system that originated in the
1990s as NetSaint.

### Scope
...@@ -220,14 +215,11 @@ Nagios is primarily about alerting based on the exit codes of scripts. These are

There is silencing of individual alerts, but no grouping, routing, or
deduplication.

There are a variety of plugins. For example, piping the few kilobytes of
perfData plugins are allowed to return [to a time series database such as
Graphite](https://github.com/shawn-sterling/graphios) or using NRPE to [run
checks on remote machines](https://exchange.nagios.org/directory/Addons/Monitoring-Agents/NRPE--2D-Nagios-Remote-Plugin-Executor/details).
### Data model

Nagios is host-based. Each host can have one or more services and each service
can perform one check.

There is no notion of labels or a query language.
...@@ -246,7 +238,7 @@ Nagios is suitable for basic monitoring of small and/or static systems where
blackbox probing is sufficient.

If you want to do whitebox monitoring, or have a dynamic or cloud based
environment, then Prometheus is a good choice.

## Prometheus vs. Sensu
...@@ -257,8 +249,7 @@ environment then Prometheus is a good choice.

The same general scope differences as in the case of
[Nagios](/docs/introduction/comparison/#prometheus-vs-nagios) apply here.
The primary difference is that Sensu clients [register themselves](https://sensuapp.org/docs/0.27/reference/clients.html#what-is-a-sensu-client),
and can determine the checks to run either from central or local configuration.

Sensu does not have a limit on the amount of perfData.
...@@ -275,9 +266,8 @@ silences. It also stores all the clients that have registered with it.

### Architecture

Sensu has a [number of components](https://sensuapp.org/docs/0.27/overview/architecture.html). It uses
RabbitMQ as a transport, Redis for current state, and a separate server for
processing.

Both RabbitMQ and Redis can be clustered. Multiple copies of the server can be
...@@ -285,8 +275,7 @@ run for scaling and redundancy.

### Summary

If you have an existing Nagios setup that you wish to scale as-is, or want to
take advantage of the registration feature of Sensu, then Sensu is a good
choice.

If you want to do whitebox monitoring, or have a very dynamic or cloud based
environment, then Prometheus is a good choice.
...@@ -9,6 +9,7 @@ toc: full-width

## General

### What is Prometheus?

Prometheus is an open-source systems monitoring and alerting toolkit
with an active ecosystem. See the [overview](/docs/introduction/overview/).
...@@ -49,7 +50,7 @@ version 1.0.0 broadly follow
increments of the major version. Exceptions are possible for experimental
components, which are clearly marked as such in announcements.

Even repositories that have not yet reached version 1.0.0 are, in general, quite
stable. We aim for a proper release process and an eventual 1.0.0 release for
each repository. In any case, breaking changes will be pointed out in release
notes (marked by `[CHANGE]`) or communicated clearly for components that do not
...@@ -63,17 +64,14 @@ Pulling over HTTP offers a number of advantages:

* You can more easily tell if a target is down.
* You can manually go to a target and inspect its health with a web browser.

Overall, we believe that pulling is slightly better than pushing, but it should
not be considered a major point when considering a monitoring system.

For cases where you must push, we offer the [Pushgateway](/docs/instrumenting/pushing/).
### How to feed logs into Prometheus?

Short answer: Don't! Use something like the [ELK stack](https://www.elastic.co/products) instead.

Longer answer: Prometheus is a system to collect and process metrics, not an
event logging system. The Raintank blog post
...@@ -104,7 +102,7 @@ that the correct plural of 'Prometheus' is 'Prometheis'.

### Can I reload Prometheus's configuration?

Yes, sending `SIGHUP` to the Prometheus process or an HTTP POST request to the
`/-/reload` endpoint will reload and apply the configuration file. The
various components attempt to handle failing changes gracefully.
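For example, either of the following would trigger a reload (this sketch assumes a locally running server on the default port, and the `pidof` invocation assumes a single Prometheus process):

```language-bash
# Reload by signal:
kill -HUP $(pidof prometheus)

# Or reload via the HTTP endpoint:
curl -X POST http://localhost:9090/-/reload
```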
...@@ -152,7 +150,7 @@ the [exposition formats](/docs/instrumenting/exposition_formats/).

Yes, the [Node Exporter](https://github.com/prometheus/node_exporter) exposes
an extensive set of machine-level metrics on Linux and other Unix systems such
as CPU usage, memory, disk utilization, filesystem fullness, and network
bandwidth.

### Can I monitor network devices?
...@@ -172,8 +170,7 @@ See [the list of exporters and integrations](/docs/instrumenting/exporters/).

### Can I monitor JVM applications via JMX?

Yes, for applications that you cannot instrument directly with the Java client,
you can use the [JMX Exporter](https://github.com/prometheus/jmx_exporter)
either standalone or as a Java Agent.

### What is the performance impact of instrumentation?
...@@ -219,9 +216,8 @@ native 64 bit integers would (only) help if you need integer precision
above 2<sup>53</sup> but below 2<sup>63</sup>. In principle, support
for different sample value types (including some kind of big integer,
supporting even more than 64 bit) could be implemented, but it is not
a priority right now. A counter, even if incremented one million times per
second, will only run into precision issues after over 285 years.
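The arithmetic behind the 285-year figure: a float64 represents integers exactly up to 2<sup>53</sup>, so at one million increments per second:

```
2^53 increments ÷ 10^6 increments/second ≈ 9.0 × 10^9 seconds ≈ 285 years
```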
### Why does Prometheus use a custom storage backend rather than [some other storage method]? Isn't the "one file per time series" approach killing performance?
...@@ -239,8 +235,7 @@ latter depends on many parameters, like the compressibility of the sample data,
the number of time series the samples belong to, the retention policy, and even
more subtle aspects like how full your SSD is. If you want to know all the
details, read
[this document with detailed benchmark results](https://docs.google.com/document/d/1lRKBaz9oXI5nwFZfvSbPhpwzUbUr3-9qryQGG1C6ULk/edit?usp=sharing). The highlights:

* On a typical bare-metal server with 64GiB RAM, 32 CPU cores, and SSD,
  Prometheus sustained an ingestion rate of 900k samples per second, belonging
...@@ -266,10 +261,9 @@ monitoring system possible rather than supporting fully generic TLS and
authentication solutions in every server component.

If you need TLS or authentication, we recommend putting a reverse proxy in
front of Prometheus. See, for example, [Adding Basic Auth to Prometheus with
Nginx](https://www.robustperception.io/adding-basic-auth-to-prometheus-with-nginx/).

This applies only to inbound connections. Prometheus does support
[scraping TLS- and auth-enabled targets](/docs/operating/configuration/#%3Cscrape_config%3E), and other
Prometheus components that create outbound connections have similar support.
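As a minimal sketch only (not taken from these docs; the hostname, certificate paths, and htpasswd file are all assumptions), such an nginx reverse proxy with basic auth in front of Prometheus might look like:

```
server {
    listen 443 ssl;
    server_name prometheus.example.com;       # assumed hostname
    ssl_certificate     /etc/nginx/prom.crt;  # assumed certificate paths
    ssl_certificate_key /etc/nginx/prom.key;

    location / {
        auth_basic           "Prometheus";
        auth_basic_user_file /etc/nginx/.htpasswd;  # created with htpasswd
        proxy_pass           http://localhost:9090/;
    }
}
```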
...@@ -16,7 +16,7 @@ series data.

[Download the latest release](/download) of Prometheus for your platform, then
extract and run it:

```language-bash
tar xvfz prometheus-*.tar.gz
cd prometheus-*
```
...@@ -33,7 +33,7 @@ While a Prometheus server that collects only data about itself is not very
useful in practice, it is a good starting example. Save the following basic
Prometheus configuration as a file named `prometheus.yml`:

```language-yaml
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
...@@ -58,11 +58,9 @@ scrape_configs:

For a complete specification of configuration options, see the
[configuration documentation](/docs/operating/configuration).

## Starting Prometheus

To start Prometheus with your newly created configuration file, change to the
directory containing the Prometheus binary and run:

```language-bash
# Start Prometheus.
...@@ -70,9 +68,7 @@ Prometheus build directory and run:
./prometheus -config.file=prometheus.yml
```

Prometheus should start up. You should also be able to browse to a status page
about itself at http://localhost:9090. Give it a couple of seconds to collect
data about itself from its own HTTP metrics endpoint.

You can also verify that Prometheus is serving metrics about itself by
navigating to its metrics endpoint: http://localhost:9090/metrics
...@@ -81,11 +77,9 @@ The number of OS threads executed by Prometheus is controlled by the
`GOMAXPROCS` environment variable. As of Go 1.5 the default value is
the number of cores available.

Blindly setting `GOMAXPROCS` to a high value can be counterproductive. See the
relevant [Go FAQs](http://golang.org/doc/faq#Why_no_multi_CPU).

Prometheus by default uses around 3GB in memory. If you have a
smaller machine, you can tune Prometheus to use less memory. For details,
see the [memory usage documentation](/docs/operating/storage/#memory-usage).
...@@ -105,7 +99,7 @@ target scrapes). Go ahead and enter this into the expression console:
prometheus_target_interval_length_seconds
```

This should return a number of different time series (along with the latest
value recorded for each), all with the metric name
`prometheus_target_interval_length_seconds`, but with different labels. These
labels designate different latency percentiles and target group intervals.
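To drill into a single series, you can filter by one of those labels; for example, the 99th-percentile series (assuming the default configuration):

```
prometheus_target_interval_length_seconds{quantile="0.99"}
```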
...@@ -155,7 +149,7 @@ correct `GOPATH`) set up.

Download the Go client library for Prometheus and run three of these example
processes:

```language-bash
# Fetch the client library code and compile example.
git clone https://github.com/prometheus/client_golang.git
cd client_golang/examples/random
...@@ -231,10 +225,10 @@ job_service:rpc_durations_seconds_count:avg_rate5m = avg(rate(rpc_durations_seco
```

To make Prometheus pick up this new rule, add a `rule_files` statement to the
`global` configuration section in your `prometheus.yml`. The config should now
look like this:

```language-yaml
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.
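# Hypothetical continuation of this file: the rule_files statement itself
# might look like this (the filename 'prometheus.rules' is an assumption):
rule_files:
  - 'prometheus.rules'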
......
...@@ -20,8 +20,7 @@ notifications to email, Pagerduty, Slack etc.

### Bridge

A bridge is a component that takes samples from a client library and
exposes them to a non-Prometheus monitoring system. For example, the Python,
Go, and Java clients can export metrics to Graphite.

### Client library
...@@ -32,28 +31,37 @@ pull metrics from other systems and expose the metrics to Prometheus.

### Collector

A collector is a part of an exporter that represents a set of metrics. It may
be a single metric if it is part of direct instrumentation, or many metrics if
it is pulling metrics from another system.

### Direct instrumentation

Direct instrumentation is instrumentation added inline as part of the source
code of a program.

### Endpoint

A source of metrics that can be scraped, usually corresponding to a single
process.

### Exporter

An exporter is a binary that exposes Prometheus metrics, commonly by converting
metrics that are exposed in a non-Prometheus format into a format Prometheus
supports.

### Instance

An instance is a label that uniquely identifies a target in a job.

### Job

A collection of targets with the same purpose, for example monitoring a group
of like processes replicated for scalability or reliability, is called a job.

### Notification

A notification represents a group of one or more alerts, and is sent by the
Alertmanager to email, Pagerduty, Slack etc.

### Promdash

Promdash was a native dashboard builder for Prometheus. It has been deprecated
and replaced by [Grafana](../../visualization/grafana/).
### Prometheus ### Prometheus
...@@ -102,9 +110,9 @@ A remote write endpoint is what Prometheus talks to when doing a remote write. ...@@ -102,9 +110,9 @@ A remote write endpoint is what Prometheus talks to when doing a remote write.
### Silence
A silence in the Alertmanager prevents alerts with labels matching the silence
from being included in notifications.
### Target
A target is the definition of an object to scrape, for example what labels to
apply, any authentication required to connect, or other information that defines
how the scrape will occur.
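As a hedged sketch of such a definition in `prometheus.yml`, using hypothetical hostnames and credentials:

```yaml
scrape_configs:
  - job_name: 'node'             # targets below belong to this job
    basic_auth:                  # authentication required to connect
      username: prometheus
      password: example-secret   # hypothetical credential
    static_configs:
      - targets: ['db1.example.com:9100']  # hypothetical host
        labels:
          env: production        # extra label applied to the target
```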
### Volumes & bind-mount
Bind-mount your `prometheus.yml` from the host by running:
```
docker run -p 9090:9090 -v /tmp/prometheus.yml:/etc/prometheus/prometheus.yml \
       prom/prometheus
```
configuration itself is rather static and the same across all environments.
For this, create a new directory with a Prometheus configuration and a
`Dockerfile` like this:
```
FROM prom/prometheus
ADD prometheus.yml /etc/prometheus/
```

Then build and run it:

```
docker build -t my-prometheus .
docker run -p 9090:9090 my-prometheus
```
A more advanced option is to render the configuration dynamically on start
with some tooling or even have a daemon update it periodically.
## Using configuration management systems
If you prefer using configuration management systems, you might be interested in
the following third-party contributions:
### Ansible
* [griggheo/ansible-prometheus](https://github.com/griggheo/ansible-prometheus) * [griggheo/ansible-prometheus](https://github.com/griggheo/ansible-prometheus)
* [William-Yeh/ansible-prometheus](https://github.com/William-Yeh/ansible-prometheus) * [William-Yeh/ansible-prometheus](https://github.com/William-Yeh/ansible-prometheus)
### Chef
* [rayrod2030/chef-prometheus](https://github.com/rayrod2030/chef-prometheus) * [rayrod2030/chef-prometheus](https://github.com/rayrod2030/chef-prometheus)
### Puppet
* [puppet/prometheus](https://forge.puppet.com/puppet/prometheus) * [puppet/prometheus](https://forge.puppet.com/puppet/prometheus)
### SaltStack
* [bechtoldt/saltstack-prometheus-formula](https://github.com/bechtoldt/saltstack-prometheus-formula) * [bechtoldt/saltstack-prometheus-formula](https://github.com/bechtoldt/saltstack-prometheus-formula)
Prometheus is an open-source systems monitoring and alerting toolkit originally built at
[SoundCloud](http://soundcloud.com). Since its inception in 2012, many
companies and organizations have adopted Prometheus, and the project has a very
active developer and user [community](/community). It is now a standalone open source project
and maintained independently of any company. To emphasize this, and to clarify
the project's governance structure, Prometheus joined the
[Cloud Native Computing Foundation](https://cncf.io/) in 2016
as the second hosted project, after [Kubernetes](http://kubernetes.io/).

For more elaborate overviews of Prometheus, see the resources linked from the
[media](/docs/introduction/media/) section.
### Features

Prometheus's main features are:
* a multi-dimensional [data model](/docs/concepts/data_model/) with time series data identified by metric name and key/value pairs
* a [flexible query language](/docs/querying/basics/)
  to leverage this dimensionality
* no reliance on distributed storage; single server nodes are autonomous
The Prometheus ecosystem consists of multiple components, many of which are
optional:
* the main [Prometheus server](https://github.com/prometheus/prometheus) which scrapes and stores time series data
* [client libraries](/docs/instrumenting/clientlibs/) for instrumenting application code
* a [push gateway](https://github.com/prometheus/pushgateway) for supporting short-lived jobs
* special-purpose [exporters](/docs/instrumenting/exporters/) for services like HAProxy, StatsD, Graphite, etc.
* an [alertmanager](https://github.com/prometheus/alertmanager) to handle alerts
* various support tools

Most Prometheus components are written in [Go](https://golang.org/), making
them easy to build and deploy as static binaries.
### Architecture
This diagram illustrates the architecture of Prometheus and some of
its ecosystem components:

![Prometheus architecture](/assets/architecture.svg)
Prometheus scrapes metrics from instrumented jobs, either directly or via an
intermediary push gateway for short-lived jobs. It stores all scraped samples
locally and runs rules over this data to either aggregate and record new time
series from existing data or generate alerts. [Grafana](https://grafana.com/)
or other API consumers can be used to visualize the collected data.
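As one illustration of recording new time series from existing data, a recording rule could precompute a per-job request rate; a sketch in the YAML rule format, assuming a hypothetical `http_requests_total` metric:

```yaml
groups:
  - name: example-rules
    rules:
      # Store the 5-minute request rate per job as a new time series.
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
```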
## When does it fit?
Prometheus is designed for reliability, to be the system you go to
during an outage to allow you to quickly diagnose problems. Each Prometheus
server is standalone, not depending on network storage or other remote services.
You can rely on it when other parts of your infrastructure are broken, and
you do not need to set up extensive infrastructure to use it.
## When does it not fit?
Prometheus values reliability. You can always view what statistics are
available about your system, even under failure conditions. If you need 100%
accuracy, such as for per-request billing, Prometheus is not a good choice as
the collected data will likely not be detailed and complete enough. In such a
case you would be best off using some other system to collect and analyze the
data for billing, and Prometheus for the rest of your monitoring.