Commit db530b9b authored by Julius Volz

Bring getting-started tutorial up-to-date.

parent fc7df310
# Getting started

This guide is a "Hello World"-style tutorial which shows how to install, configure, and use Prometheus in a simple example setup. You will build and run Prometheus locally, configure it to scrape itself and an example application, and then work with queries, rules, and graphs to make use of the collected time series data.

## Getting Prometheus
[...]

endpoints on these targets. Since Prometheus also exposes data in the same manner about itself, it may also be used to scrape and monitor its own health.

While a Prometheus server which collects only data about itself is not very useful in practice, it is a good starting example. Save the following basic Prometheus configuration as a file named `prometheus.conf`:

```
job: {
  # Override the global default and scrape targets from this job every 5 seconds.
  scrape_interval: "5s"

  # Let's define a group of static targets to scrape for this job. In this
  # case, only one.
  target_group: {
    # These endpoints are scraped via HTTP.
    target: "http://localhost:9090/metrics"
  }
}
```

[...]
navigating to its metrics exposure endpoint: http://localhost:9090/metrics

Let's try looking at some data that Prometheus has collected about itself. To use Prometheus's built-in expression browser, navigate to http://localhost:9090/graph and choose the "Tabular" view within the "Graph" tab.
As you can gather from http://localhost:9090/metrics, one metric that Prometheus exports about itself is called `prometheus_target_interval_length_seconds` (the actual amount of time between target scrapes). Go ahead and enter this into the expression console:

```
prometheus_target_interval_length_seconds
```
This should return a lot of different time series (along with the latest value recorded for each), all with the metric name `prometheus_target_interval_length_seconds`, but with different labels. These labels designate different latency percentiles and target group intervals.
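To make the label structure concrete, here is a small Python sketch (not part of the tutorial) that parses a few lines of Prometheus's text exposition format into metric name, labels, and value. The sample lines and their values are invented for illustration.

```python
import re

def parse_sample(line):
    """Parse one text-format sample line into (name, labels, value)."""
    m = re.match(r'(\w+)(?:\{(.*)\})?\s+(\S+)', line)
    name, label_str, value = m.group(1), m.group(2) or "", float(m.group(3))
    # Labels are comma-separated key="value" pairs inside the braces.
    labels = dict(re.findall(r'(\w+)="([^"]*)"', label_str))
    return name, labels, value

# Hypothetical exposition lines, shaped like the tutorial's metric:
lines = [
    'prometheus_target_interval_length_seconds{interval="15s",quantile="0.5"} 15.0004',
    'prometheus_target_interval_length_seconds{interval="15s",quantile="0.99"} 15.0023',
]

for line in lines:
    name, labels, value = parse_sample(line)
    print(name, labels["quantile"], value)
```

Each line carries the same metric name but a different label set, which is exactly why the query above returns multiple series.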
If we were only interested in the 99th percentile latencies, we could use this query to retrieve that information:

```
prometheus_target_interval_length_seconds{quantile="0.99"}
```
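The `{quantile="0.99"}` selector keeps only the series whose labels match. As a rough Python analogy of that filtering step (the series data below is made up, not real Prometheus output):

```python
def select(series, **matchers):
    """Keep series whose labels equal all of the given matcher values."""
    return [s for s in series
            if all(s["labels"].get(k) == v for k, v in matchers.items())]

# Invented series mimicking the tutorial's metric:
series = [
    {"labels": {"quantile": "0.5",  "interval": "15s"}, "value": 15.0004},
    {"labels": {"quantile": "0.9",  "interval": "15s"}, "value": 15.0010},
    {"labels": {"quantile": "0.99", "interval": "15s"}, "value": 15.0023},
]

print(select(series, quantile="0.99"))  # only the 0.99 series remains
```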
To count the number of returned time series, you could write:

```
count(prometheus_target_interval_length_seconds)
```
For further details about the expression language, see the expression language documentation.

[...]

## Using the graphing interface

To graph expressions, navigate to http://localhost:9090/graph and use the "Graph" tab.
For example, enter the following expression to graph the per-second rate of all storage chunk operations happening in the self-scraped Prometheus:

```
rate(prometheus_local_storage_chunk_ops_total[1m])
```

Experiment with the graph range parameters and other settings.
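`rate()` looks at how much a counter increased over the given window and converts that to a per-second value. A much-simplified Python illustration of the idea, using two invented samples of a counter (the real function also handles counter resets and extrapolation, which this sketch ignores):

```python
def simple_rate(samples):
    """Per-second increase between the first and last (timestamp, value) sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# A counter observed at two points 60 seconds apart:
samples = [(0, 1000.0), (60, 1300.0)]
print(simple_rate(samples))  # 5.0 operations per second
```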
[...]

Let's make this more interesting and start some example targets for Prometheus to scrape.

The Go client library includes an example which exports fictional RPC latencies for three services with different latency distributions. Download the Go client library for Prometheus and run three of these example processes:
```bash
# Fetch the client library code.
git clone git@github.com:/prometheus/client_golang

# Change to the random RPC example.
cd client_golang/examples/random

# Assuming a working Go setup, fetch necessary dependencies.
go get -d

# Start 3 example targets in separate terminals or screen sessions:
go run main.go -listen-address=:8080
go run main.go -listen-address=:8081
go run main.go -listen-address=:8082
```
You should now have example targets listening on http://localhost:8080/metrics, http://localhost:8081/metrics, and http://localhost:8082/metrics.
## Configuring Prometheus to monitor the sample targets

Now we'll configure Prometheus to scrape these new targets. Let's group all
[...]

restart your Prometheus instance:
```
job: {
  name: "random-example"
  scrape_interval: "5s"

  # The "production" targets for this job.
  target_group: {
[...]
```
Go to the expression browser and verify that Prometheus now has information about time series that these example endpoints expose, such as the `rpc_durations_microseconds` metric.
## Configure rules for aggregating scraped data into new time series

Though not a problem in our example, queries that aggregate over thousands of time series can get slow when computed ad-hoc. To make this more efficient, Prometheus allows you to prerecord expressions into completely new persisted time series via configured recording rules. Let's say we are interested in recording the per-second rate of example RPCs (`rpc_durations_microseconds_count`) averaged over all instances (but preserving the `job` and `service` dimensions) as measured over a window of 5 minutes. We could write this as:
```
avg(rate(rpc_durations_microseconds_count[5m])) by (job, service)
```
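`avg(...) by (job, service)` groups the input series on those two labels and averages the values within each group. A small Python sketch of that aggregation step, over invented per-instance RPC rates (the label values are hypothetical):

```python
from collections import defaultdict

def avg_by(series, keys):
    """Average series values grouped by the given label names."""
    groups = defaultdict(list)
    for labels, value in series:
        # Series sharing the same values for `keys` fall into one group.
        groups[tuple(labels[k] for k in keys)].append(value)
    return {group: sum(vs) / len(vs) for group, vs in groups.items()}

# Hypothetical per-instance rates for two services in one job:
series = [
    ({"job": "random-example", "service": "foo", "instance": "8080"}, 12.0),
    ({"job": "random-example", "service": "foo", "instance": "8081"}, 14.0),
    ({"job": "random-example", "service": "bar", "instance": "8082"}, 8.0),
]

print(avg_by(series, ("job", "service")))
```

Note that the `instance` label disappears in the output: only the labels named in the `by` clause survive the aggregation.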
Try graphing this expression.

To record the time series resulting from this expression into a new metric called `job_service:rpc_durations_microseconds_count:avg_rate5m`, create a file with the following recording rule and save it as `prometheus.rules`:

```
job_service:rpc_durations_microseconds_count:avg_rate5m = avg(rate(rpc_durations_microseconds_count[5m])) by (job, service)
```
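Conceptually, a recording rule just evaluates its right-hand expression on every evaluation interval and stores the result under the new metric name. A toy Python rendering of that loop body, with invented names and a plain dict standing in for the storage layer:

```python
def evaluate_rule(rule_name, evaluate_expr, storage):
    """Store the expression's current result under the rule's metric name."""
    for group_labels, value in evaluate_expr().items():
        storage.setdefault(rule_name, {})[group_labels] = value

storage = {}
# Stand-in for evaluating avg(rate(...)) by (job, service) at one instant:
expr = lambda: {("random-example", "foo"): 13.0}
evaluate_rule("job_service:rpc_durations_microseconds_count:avg_rate5m",
              expr, storage)
print(storage)
```

The payoff is that later queries read the precomputed series directly instead of re-running the expensive aggregation.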
To make Prometheus pick up this new rule, add a `rule_files` statement to the global configuration section in your `prometheus.conf`:

```
global: {
  # Load and evaluate rules in this file every 'evaluation_interval' seconds. This field may be repeated.
  rule_file: "prometheus.rules"
}

[...]
```
Restart Prometheus with the new configuration and verify that a new time series with the metric name `job_service:rpc_durations_microseconds_count:avg_rate5m` is now available by querying it through the expression browser or graphing it.