Reorganize and improve documentation section.

Commit e1cbc218 authored Dec 22, 2014 by Julius Volz
26 changed files
--- a/content/community.html
+++ b/content/community.html
-<div class="row">
-  <div class="col-md-4">
-    TODO: Add some text about the community here.
-  </div>
+<div class="col-md-8 col-md-offset-2 doc-content">
+  <h1>Community</h1>
+  <p>
+    Prometheus is developed in the open and has a growing community outside of
+    SoundCloud. Here are some of the channels we use to communicate and
+    contribute:
+  </p>
+  <p>
+    <strong>Mailing list:</strong>
+    <a href="https://groups.google.com/forum/#!forum/prometheus-developers">prometheus-developers</a> Google Group
+  </p>
+  <p>
+    <strong>IRC:</strong> <code>#prometheus</code> on <a href="http://freenode.net/">irc.freenode.net</a>
+  </p>
+  <p>
+      <strong>Issue tracker:</strong> We use the GitHub issue tracker for the various <a href="http://github.com/prometheus">Prometheus repositories</a>
+  </p>
+  <h1>Contributing</h1>
+  <p>
+    We welcome community contributions! Please see the
+    <code>CONTRIBUTING.md</code> file in the respective Prometheus repository
+    for instructions on how to submit changes. If you are planning on making
+    more elaborate or controversial changes, please discuss them on the mailing
+    list before sending a pull request.
+  </p>
+  <h1>Sponsorship</h1>
+  <p>
+    Prometheus was initially started privately by
+    <a href="https://github.com/matttproud">Matt Proud</a> and
+    <a href="https://github.com/juliusv">Julius Volz</a>. The majority of its
+    development has been sponsored by <a href="https://soundcloud.com">SoundCloud</a>.
+  </p>
 </div>
--- a/content/docs/concepts/automatic.md
+++ b/content/docs/concepts/automatic.md
 ---
 title: Automatic labels and synthetic metrics
-sort_rank: 2
+sort_rank: 3
 ---

-# Automatic labels and synthetic metrics
+# Automatic labels and metrics

 ## Automatically attached labels

 When Prometheus scrapes a target, it attaches some labels automatically to the
 scraped metrics timeseries which serve to identify the scraped target:

-* `job`: The Prometheus job name from which the timeseries was scraped.
-* `instance`: The specific instance/endpoint of the job which was scraped.
+* `job`: The configured Prometheus job name for which the target was scraped.
+* `instance`: The specific URL of the instance's endpoint that was scraped.

 If either of these labels is already present in the scraped data, it is not
 replaced. Instead, Prometheus adds its own labels with `exporter_` prepended to
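
The labeling rule described in this hunk can be sketched in a few lines of Python. This is only an illustration, not Prometheus's implementation: the function name `attach_target_labels` is hypothetical, and the exact form of the prefixed names (`exporter_job`, `exporter_instance`) is an assumption, since the sentence above is cut off by the diff context.

```python
def attach_target_labels(scraped, job, instance):
    """Return a copy of the scraped labels with the target labels attached.

    Labels already present in the scraped data are never replaced.
    """
    labels = dict(scraped)
    for name, value in (("job", job), ("instance", instance)):
        if name in labels:
            # The scraped value is kept; the target's own value is stored
            # under an "exporter_"-prefixed name (assumed naming).
            labels["exporter_" + name] = value
        else:
            labels[name] = value
    return labels
```

For a scraped series that already carries its own `job` label, the scraped value stays under `job` and the configured job name ends up under the prefixed label.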

--- a/content/docs/concepts/index.md
+++ b/content/docs/concepts/index.md
 ---
 title: Concepts
-sort_rank: 4
+sort_rank: 2
 nav_icon: flask
 ---
--- a/content/docs/concepts/metric_types.md
+++ b/content/docs/concepts/metric_types.md
 ---
 title: Metric types
-sort_rank: 1
+sort_rank: 2
 ---

 # Metric Types
@@ -16,8 +16,8 @@ right one for the right job.

 Metric types are currently only differentiated in the client libraries (to
 enable APIs tailored to the usage of the specific types) and in the wire
-protocol. The Prometheus server does not yet persist and make use of the type
-information after ingesting samples. This may change in the future, however.
+protocol. The Prometheus server does not yet make use of the type information
+after ingesting samples as timeseries. This may change in the future.

 ## Counter


--- a/content/docs/concepts/timeseries.md
+++ b/content/docs/concepts/timeseries.md
+---
+title: Timeseries
+sort_rank: 1
+---
+
+# Timeseries
+
+TODO: explain how timeseries are identified and stored.
--- a/content/docs/using/instrumenting.md
+++ b/content/docs/using/instrumenting.md
 ---
-title: Instrumenting your code
-sort_rank: 2
+title: Start
+sort_rank: 1
 ---

 # Instrumenting your code
@@ -10,7 +10,7 @@ instrumentation, you will need to instrument your application's code via one of
 the Prometheus client libraries.

 First, familiarize yourself with the Prometheus-supported
-[metrics types](/concepts/metric_types/). To use these types programmatically, see
+[metric types](/docs/concepts/metric_types/). To use these types programmatically, see
 your specific client library's documentation.

 Choose a Prometheus client library that matches the language in which your

--- a/content/docs/instrumenting/index.md
+++ b/content/docs/instrumenting/index.md
+---
+title: Instrumenting
+sort_rank: 4
+nav_icon: code
+---
+
--- a/content/docs/using/codelab.md
+++ b/content/docs/using/codelab.md
 ---
 title: Starter codelab
-sort_rank: 1
+sort_rank: 3
 ---

-# Intro Codelab
+# Intro codelab

 This guide is a "Hello World"-style codelab which shows how to install,
 configure, and use Prometheus in a simple example setup. You'll build and run
@@ -29,7 +29,7 @@ cd prometheus
 make build
 ```

-## Configuring Prometheus to Monitor Itself
+## Configuring Prometheus to monitor itself

 Prometheus collects metrics from monitored targets by scraping metrics HTTP
 endpoints on these targets. Since Prometheus also exposes data in the same
@@ -69,11 +69,10 @@ job: {
 }
 ```

-As you might have noticed, Prometheus configuration is supplied in an ASCII
-form of
-[protocol buffers](https://developers.google.com/protocol-buffers/docs/overview).
-The protocol buffer schema definition has a [complete documentation of all
-available configuration options](https://github.com/prometheus/prometheus/blob/master/config/config.proto).
+Prometheus configuration is supplied in an ASCII form of [protocol
+buffers](https://developers.google.com/protocol-buffers/docs/overview). The
+[schema definition](https://github.com/prometheus/prometheus/blob/master/config/config.proto)
+has complete documentation of all available configuration options.

 ## Starting Prometheus

@@ -91,62 +90,60 @@ http://localhost:9090. Give it a couple of seconds to start collecting data
 about itself from its own HTTP metrics endpoint.

 You can also verify that Prometheus is serving metrics about itself by
-navigating to its metrics exposure endpoint: [[http://localhost:9090/metrics]]
+navigating to its metrics exposure endpoint: http://localhost:9090/metrics

-## Using the Expression Browser
+## Using the expression browser

 Let's try looking at some data that Prometheus has collected about itself. To
-use Prometheus' built-in expression browser, navigate to
-[[http://localhost:9090/]] and choose the "Tabular" from the "Graph & Console"
+use Prometheus's built-in expression browser, navigate to
+http://localhost:9090/ and choose the "Tabular" view within the "Graph"
 tab.

-As you can gather from [[http://localhost:9090/metrics]], one metric that
+As you can gather from http://localhost:9090/metrics, one metric that
 Prometheus exports about itself is called
-`prometheus_metric_disk_latency_microseconds`. Go ahead and enter this into the
+`prometheus_target_operation_latency_milliseconds`. Go ahead and enter this into the
 expression console:

 ```
-prometheus_metric_disk_latency_microseconds
+prometheus_target_operation_latency_milliseconds
 ```

 This should return a lot of different timeseries (along with the latest value
 recorded for each), all with the metric name
-`prometheus_metric_disk_latency_microseconds`, but with different labels. These
-labels designate different latency percentiles, operation types, and operation
-results (success, failure).
+`prometheus_target_operation_latency_milliseconds`, but with different labels. These
+labels designate different latency percentiles and operation outcomes.

 To count the number of returned timeseries, you could write:

 ```
-count(prometheus_metric_disk_latency_microseconds)
+count(prometheus_target_operation_latency_milliseconds)
 ```

-If we were only interested in the 99th percentile latencies for e.g.
-`get_value_at_time` operations, we could use this query to retrieve that
-information:
+If we were only interested in the 99th percentile latencies for scraping
+Prometheus itself, we could use this query to retrieve that information:

 ```
-prometheus_metric_disk_latency_microseconds{operation="get_value_at_time", percentile="0.990000"}
+prometheus_target_operation_latency_milliseconds{instance="http://localhost:9090/metrics", quantile="0.99"}
 ```
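
As a rough mental model, a selector with label matchers keeps only those timeseries whose labels equal every given value, and `count()` reports how many remain. The Python sketch below illustrates that idea on hypothetical in-memory data; `SERIES` and `select` are made up for this example and are not part of any Prometheus API.

```python
NAME = "prometheus_target_operation_latency_milliseconds"
SELF = "http://localhost:9090/metrics"

# Hypothetical in-memory data: one (metric name, labels) pair per timeseries.
SERIES = [
    (NAME, {"instance": SELF, "quantile": "0.5"}),
    (NAME, {"instance": SELF, "quantile": "0.99"}),
    (NAME, {"instance": "http://localhost:8080/metrics", "quantile": "0.99"}),
]


def select(series, name, matchers):
    """Keep the series with the given metric name whose labels equal every matcher."""
    return [
        (metric, labels)
        for metric, labels in series
        if metric == name
        and all(labels.get(key) == value for key, value in matchers.items())
    ]


if __name__ == "__main__":
    # No matchers: every series with this metric name is returned.
    print(len(select(SERIES, NAME, {})))  # prints 3
    # Both matchers must hold, so only one series remains.
    print(len(select(SERIES, NAME, {"instance": SELF, "quantile": "0.99"})))  # prints 1
```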

-For further details about the expression language, see the [[Expression Language]]
-documentation.
+For further details about the expression language, see the
+[expression language documentation](/docs/querying/basics).

-## Using the Graphing Interface
+## Using the graphing interface

-To graph expressions, navigate to [[http://localhost:9090/]] and use the
-"Graph" tab.
+To graph expressions, navigate to http://localhost:9090/ and use the "Graph"
+tab.

 For example, enter the following expression to graph all latency percentiles
-for `get_value_at_time` operations in Prometheus:
+for operations that scrape Prometheus itself:

 ```
-prometheus_metric_disk_latency_microseconds{operation="get_value_at_time"}
+prometheus_target_operation_latency_milliseconds{instance="http://localhost:9090/metrics"}
 ```

 Experiment with the graph range parameters and other settings.

-## Starting Up Some Sample Targets
+## Starting up some sample targets

 Let's make this more interesting and start some example targets for Prometheus
 to scrape.
@@ -168,14 +165,15 @@ go run main.go -listeningAddress=:8081
 go run main.go -listeningAddress=:8082
 ```

-You should now have example targets listening on
-[[http://localhost:8080/metrics]], [[http://localhost:8081/metrics]], and
-[[http://localhost:8082/metrics]].
+You should now have example targets listening on http://localhost:8080/metrics,
+http://localhost:8081/metrics, and http://localhost:8082/metrics.
+
+TODO: These examples don't exist anymore. Provide alternatives.

-## Configuring Prometheus to Monitor the Sample Targets
+## Configuring Prometheus to monitor the sample targets

-Now we'll configure Prometheus to scrape these new targets. Let's group these
-three endpoints into a job we call `random-example`. However, imagine that the
+Now we'll configure Prometheus to scrape these new targets. Let's group all
+three endpoints into one job called `random-example`. However, imagine that the
 first two endpoints are production targets, while the third one represents a
 canary instance. To model this in Prometheus, we can add several groups of
 endpoints to a single job, adding extra labels to each group of targets. In
@@ -217,11 +215,11 @@ Go to the expression browser and verify that Prometheus now has information
 about timeseries that these example endpoints expose, e.g. the
 `rpc_calls_total` metric.

-## Configure Rules For Aggregating Scraped Data into New Timeseries
+## Configure rules for aggregating scraped data into new timeseries

-Manually entering expressions every you time you need them can get cumbersome
-and might also be slow to compute in some cases. Prometheus allows you to
-periodically record expressions into completely new timeseries via configured
+Queries that aggregate over thousands of timeseries can get slow when computed
+ad-hoc. To make this more efficient, Prometheus allows you to prerecord
+expressions into completely new persisted timeseries via configured recording
 rules. Let's say we're interested in recording the per-second rate of
 `rpc_calls_total` averaged over all instances as measured over the last 5
 minutes. We could write this as:

--- a/content/docs/introduction/install.md
+++ b/content/docs/introduction/install.md
 ---
-title: Download and install
+title: Installing
 sort_rank: 2
 ---

-# Download and Install Prometheus
+# Installing

-## Downloading
+## Using precompiled binaries

-## Installing
+We plan on providing precompiled binaries for various platforms and even
+packages for common Linux distributions soon. Once those are offered, they
+will be the recommended way of installing Prometheus.
+
+## From source
+
+For building Prometheus from source, see the relevant [`README.md` section](https://github.com/prometheus/prometheus/blob/master/README.md#use-make).
+
+## Using Docker
+
+TODO: Add docker instructions.
--- a/content/docs/introduction/overview.md
+++ b/content/docs/introduction/overview.md
@@ -3,16 +3,18 @@ title: Overview
 sort_rank: 1
 ---

+# Overview
+
 ## What is Prometheus?

 [Prometheus](https://github.com/prometheus) is an open-source systems
 monitoring and alerting toolkit built at [SoundCloud](http://soundcloud.com).
 Since its inception in 2012, it has become the standard for instrumenting new
-services at SoundCloud. Prometheus' main distinguishing features as compared to
-other monitoring systems are:
+services at SoundCloud and has seen growing external usage and contributions.
+Prometheus's main distinguishing features are:

- a **multi-dimensional** data model (via key/value pairs attached to timeseries)
- a [**flexible query language**](http://localhost:3000/using/querying/basics/)
+- a **multi-dimensional** data model (timeseries identified by metric name and key/value pairs)
+- a [**flexible query language**](/docs/using/querying/basics/)
  to leverage this dimensionality
 - no reliance on distributed storage; **single server nodes are autonomous**
 - timeseries collection happens via a **pull model** over HTTP

--- a/content/docs/operating/index.md
+++ b/content/docs/operating/index.md
 ---
 title: Operating
-sort_rank: 3
+sort_rank: 5
 nav_icon: cog
 ---
--- a/content/docs/using/rules.md
+++ b/content/docs/using/rules.md
--- a/content/docs/practices/index.md
+++ b/content/docs/practices/index.md
 ---
 title: Best practices
-sort_rank: 5
+sort_rank: 6
 nav_icon: thumbs-o-up
 ---
--- a/content/docs/practices/naming.md
+++ b/content/docs/practices/naming.md
+---
+title: Metric and label naming
+sort_rank: 1
+---
+
+# Metric and label naming
+
+The metric and label conventions presented in this document are not required
+for using Prometheus, but can serve as both a style guide and a collection of
+best practices. Individual organizations might want to approach e.g. naming
+conventions differently.
+
+## Metric names
+
+A metric name:
+
+* should have a (single-word) application prefix relevant to the containing Prometheus domain
+ * `prometheus_notifications_total`
+ * `indexer_requests_latencies_milliseconds`
+ * `processor_requests_total`
+* must have a single unit (i.e. don't mix seconds with milliseconds)
+* should have a units suffix
+ * `api_http_request_latency_milliseconds`
+ * `node_memory_usage_bytes`
+ * `api_http_requests_total` (for an accumulating count)
+* should represent the same logical thing-being-measured
+ * request duration
+ * bytes of data transfer
+ * instantaneous resource usage as a percentage
+
+As a rule of thumb, if you `sum()` or `avg()` over all dimensions of a given
+metric, the result should be meaningful (though not necessarily useful). If it
+isn't meaningful, split the data up into multiple metrics. For example, having
+the capacity of various queues in one metric is good, while mixing the capacity
+of a queue with the number of elements in it is not.
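
Two of the naming conventions listed above (an application prefix and a units suffix) are mechanical enough to check automatically. The following Python sketch shows one way to do that. It is an illustration only: `check_metric_name` and the `UNIT_SUFFIXES` list are hypothetical, and the suffix list is an assumption drawn from the example names in this document rather than an official set.

```python
# Assumed suffix list, taken from the example metric names in this document.
UNIT_SUFFIXES = (
    "_total",
    "_bytes",
    "_seconds",
    "_milliseconds",
    "_microseconds",
    "_nanoseconds",
)


def check_metric_name(name):
    """Return a list of convention problems; an empty list means none were found."""
    problems = []
    if "_" not in name.strip("_"):
        # Without an underscore there is no room for a separate prefix.
        problems.append("missing an application prefix")
    if not name.endswith(UNIT_SUFFIXES):
        problems.append("missing a units suffix")
    return problems
```

For instance, `check_metric_name("api_http_requests_total")` returns an empty list, while `check_metric_name("api_http_requests")` reports the missing units suffix. The check cannot tell whether a metric mixes units or logical quantities; that still needs human review.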
+
+## Labels
+
+Use labels to differentiate
+
+* class of thing-being-measured
+ * `api_http_requests_total` - differentiate request types: `type={create,update,delete}`
+ * `api_request_duration_nanoseconds` - differentiate request stages: `stage={extract,transform,load}`
+
+Remember that every unique (label, value) pair represents a new axis of
+cardinality for the associated metric, which can dramatically increase the
+amount of data stored.
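
To get a feel for how quickly label values multiply, the upper bound on the number of timeseries for one metric is the product of the number of distinct values of each label. A minimal Python sketch, where `max_series` is a hypothetical helper and the label values are taken from the examples above:

```python
from math import prod


def max_series(label_values):
    """Upper bound on timeseries for one metric: product of distinct values per label."""
    return prod(len(set(values)) for values in label_values.values())


if __name__ == "__main__":
    labels = {
        "type": ["create", "update", "delete"],
        "stage": ["extract", "transform", "load"],
    }
    print(max_series(labels))  # prints 9
```

Adding a third label with 100 distinct values (for example, one per instance) would raise that bound from 9 to 900.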
+
+
--- a/content/docs/using/pushing.md
+++ b/content/docs/using/pushing.md
 ---
 title: Pushing data
-sort_rank: 6
+sort_rank: 2
 ---

 # Pushing Data

--- a/content/docs/using/querying/basics.md
+++ b/content/docs/using/querying/basics.md
 ---
-title: The basics
+title: Basics
 sort_rank: 1
 ---

@@ -15,7 +15,7 @@ consumed and further processed by external systems via the HTTP API.
 ## Examples

 This document is meant as a reference. For learning, it might be easier to
-start with a couple of examples. See the [Expression Language Examples](/using/querying/examples).
+start with a couple of [examples](/docs/using/querying/examples).

 ## Basic Concepts


--- a/content/docs/using/querying/examples.md
+++ b/content/docs/using/querying/examples.md
--- a/content/docs/using/querying/functions.md
+++ b/content/docs/using/querying/functions.md
--- a/content/docs/using/querying/index.md
+++ b/content/docs/using/querying/index.md
 ---
 title: Query language
-sort_rank: 3
+sort_rank: 2
+nav_icon: search
 ---
--- a/content/docs/using/querying/operators.md
+++ b/content/docs/using/querying/operators.md
--- a/content/docs/using/index.md
+++ b/content/docs/using/index.md
---
-title: Using
-sort_rank: 2
-nav_icon: line-chart
---
--- a/content/docs/visualization/browser.md
+++ b/content/docs/visualization/browser.md
+---
+title: Expression browser
+sort_rank: 1
+---
+
+# Expression browser
+
+TODO: Add content.
--- a/content/docs/visualization/consoles.md
+++ b/content/docs/visualization/consoles.md
+---
+title: Console templates
+sort_rank: 3
+---
+
+# Console templates
+
+TODO: Add content.
--- a/content/docs/using/graphing/index.md
+++ b/content/docs/using/graphing/index.md
 ---
-title: Graphing and dashboards
+title: Visualization
 sort_rank: 3
+nav_icon: search
 ---
+
--- a/content/docs/visualization/promdash.md
+++ b/content/docs/visualization/promdash.md
+---
+title: PromDash
+sort_rank: 2
+---
+
+# PromDash
+
+TODO: Add content.
--- a/content/index.html
+++ b/content/index.html
@@ -22,7 +22,7 @@ title: Home
  <div class="row">
    <div class="col-md-4">
      <h2><i class="fa fa-cloud-download"></i> Pull Model</h2>
-      <p class="desc">Prometheus collects timeseries data by scraping instrumented services via HTTP. Short-lived jobs are supported via a push gateway.</p>
+      <p class="desc">Prometheus collects timeseries data by scraping instrumented services. This allows Prometheus to detect when targets are down.</p>
      <p><a class="btn btn-default" href="#" role="button">View details &raquo;</a></p>
    </div>
    <div class="col-md-4">
@@ -32,7 +32,7 @@ title: Home
    </div>
    <div class="col-md-4">
      <h2><i class="fa fa-cog"></i> Operation</h2>
-      <p class="desc">Prometheus servers are autonomous, with no dependency on distributed storage. Written in Go, all binaries are statically linked and easy to deploy.</p>
+      <p class="desc">Each server is independent for reliability, relying only on local storage. Written in Go, all binaries are statically linked and easy to deploy.</p>
      <p><a class="btn btn-default" href="#" role="button">View details &raquo;</a></p>
    </div>
  </div>