Merge pull request #4 from prometheus/doc-fixups

Followup fixups for PR #2.

Commit 895fe7b2 authored Jan 06, 2015 by juliusv
7 changed files
--- a/content/docs/introduction/overview.md
+++ b/content/docs/introduction/overview.md
@@ -29,7 +29,7 @@ optional:
 - client libraries for instrumenting application code
 - a [push gateway](https://github.com/prometheus/pushgateway) for supporting short-lived jobs
 - a [GUI-based dashboard builder](PromDash) based on Rails/SQL
-- special-purpose exporters (for HAProxy, StatsD, Ganglia, etc.) 
+- special-purpose exporters (for HAProxy, StatsD, Ganglia, etc.)
 - an (experimental) [alert manager](https://github.com/prometheus/alertmanager)
 - a [command-line querying tool](https://github.com/prometheus/prometheus_cli)
 - various support tools
@@ -50,13 +50,13 @@ Prometheus is designed for reliability, to be the system you go to
 during an outage to allow you to quickly diagnose problems. Each Prometheus
 server is standalone, not depending on network storage or other remote services.
 You can rely it when other parts of your infrastructure are broken, and
-you don't have to setup complex infrastructure to use it.
+you don't have to set up complex infrastructure to use it.
 ## When doesn't it fit?
 Prometheus values reliability. You can always view what statistics are
 available about your system, even under failure conditions. If you need 100%
 accuracy, such as for per-request billing, Prometheus is not a good choice as
-we keep things simple and easy to understand. In such a case you would be best
+the collected data will likely not be detailed and complete enough. In such a
-using some other system to collect and analyse the data for billing, and
+case you would be best off using some other system to collect and analyse the
-Prometheus for the rest of your monitoring.
+data for billing, and Prometheus for the rest of your monitoring.
--- a/content/docs/practices/alerting.md
+++ b/content/docs/practices/alerting.md
@@ -5,7 +5,8 @@ sort_rank: 4
 # Alerting
-We recommend that you read [My Philosophy on Alerting](https://docs.google.com/a/boxever.com/document/d/199PqyG3UsyXlwieHaqbGiWVa8eMWi8zzAn0YfcApr8Q/edit) based on Rob Ewaschuk's observations at Google.
+We recommend that you read [My Philosophy on Alerting](https://docs.google.com/a/boxever.com/document/d/199PqyG3UsyXlwieHaqbGiWVa8eMWi8zzAn0YfcApr8Q/edit)
+based on Rob Ewaschuk's observations at Google.
 To summarize, keep alerting simple, alert on symptoms, have good consoles
 to allow pinpointing causes and avoid having pages where there is nothing to
@@ -15,24 +16,24 @@ do.
 Aim to have as few alerts as possible, by alerting on symptoms that are
 associated with end-user pain rather than trying to catch every possible way
-that pain could be caused. Alerts should link to relevant consoles,
+that pain could be caused. Alerts should link to relevant consoles
 and make it easy to figure out which component is at fault.
-Allow slack in alerting to accommodate small blips.
+Allow for slack in alerting to accommodate small blips.
 ### Online serving systems
 Typically alert on high latency and error rates as high up in the stack as possible.
-Only page on latency at one point in a stack, if a lower component is slower
+Only page on latency at one point in a stack. If a lower-level component is
-than it should be but the overall user latency is fine then there is no need to
+slower than it should be, but the overall user latency is fine, then there is
-page.
+no need to page.
-For error rates, page on errors to the user. If there are errors further down
+For error rates, page on user-visible errors. If there are errors further down
 the stack that will cause such a failure, there is no need to page on them
-separately. However if some failures do not cause a to the user-visible
+separately. However, if some failures are not user-visible, but are otherwise
-failure but are otherwise severe enough to require human involvment (for
+severe enough to require human involvment (for example, you're losing a lot of
-example, you're losing a lot of money), add pages to be sent on those.
+money), add pages to be sent on those.
 You may need alerts for different types of request if they have different
 characteristics, or problems in a low-traffic type of request would be drowned
@@ -40,7 +41,7 @@ out by high-traffic requests.
 ### Offline processing
-For offline processing systems the key metric is how long data takes to get
+For offline processing systems, the key metric is how long data takes to get
 through the system, so page if that gets high enough to cause user impact.
 ### Batch jobs
@@ -51,7 +52,7 @@ recently enough, and this will cause user-visible problems.
 This should generally be at least enough time for 2 full runs of the batch job.
 For a job that runs every 4 hours and takes an hour, 10 hours would be a
 reasonable threshold. If you cannot withstand a single run failing, run the
-job more often as a single failure should not require human intervention.
+job more frequently, as a single failure should not require human intervention.
 ### Capacity
@@ -61,14 +62,14 @@ often requires human intervention to avoid an outage in the near future.
 ### Metamonitoring
 It is important to have confidence that monitoring is working. Accordingly, have
-alerts to ensure Prometheus servers, Alertmanagers, PushGateways and
+alerts to ensure that Prometheus servers, Alertmanagers, PushGateways, and
 other monitoring infrastructure are available and running correctly.
-As always, if it is possible to alert on symptoms rather than causes,this helps
+As always, if it is possible to alert on symptoms rather than causes, this helps
 to reduce noise. For example, a blackbox test that alerts are getting from
 PushGateway to Prometheus to Alertmanager to email is better than individual
 alerts on each.
 Supplementing the whitebox monitoring of Prometheus with external blackbox
 monitoring can catch problems that are otherwise invisible, and also serves as
-a fallback in-case internal systems completely fail.
+a fallback in case internal systems completely fail.
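A note on the batch-job guidance in the `alerting.md` hunks above: the 10-hour figure for a job that runs every 4 hours and takes an hour is consistent with counting one full run as one interval plus one runtime, and allowing two of them. The Python sketch below encodes that reading. The function names and the formula are assumptions made for illustration; the documentation itself only states the rule of thumb and the one example.

```python
from datetime import timedelta


def staleness_threshold(interval: timedelta, runtime: timedelta, runs: int = 2) -> timedelta:
    """Hypothetical helper: time allowed since the last successful run.

    Assumes one "full run" costs one scheduling interval plus one runtime,
    and that `runs` such cycles may pass before a human should be paged.
    """
    return runs * (interval + runtime)


def should_page(since_last_success: timedelta, interval: timedelta, runtime: timedelta) -> bool:
    """Page only when the job has not succeeded within the threshold."""
    return since_last_success > staleness_threshold(interval, runtime)


# The example from the text: runs every 4 hours, takes 1 hour.
threshold = staleness_threshold(timedelta(hours=4), timedelta(hours=1))
```

With these inputs `threshold` comes out at 10 hours, so a single failed run (roughly 5 hours without success) would not page, which matches the intent that one failure should not require human intervention.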
--- a/content/docs/practices/consoles.md
+++ b/content/docs/practices/consoles.md
@@ -9,32 +9,31 @@ It can be tempting to display as much data as possible on a dashboard, especiall
 when a system like Prometheus offers the ability to have such rich
 instrumentation of your applications. This can lead to consoles that are
 impenetrable due to having too much information, that even an expert in the
-system would have difficulty drawing meaning from. Hundreds of graphs on a
+system would have difficulty drawing meaning from.
-single page isn't unheard of, nor is a hundred plots on a single graph
 Instead of trying to represent every piece of data you have, for operational
-consoles think of what are the most likely failure modes and how you'd use the
+consoles think of what are the most likely failure modes and how you would use the
 consoles to differentiate them. Take advantage of the structure of your
-services. For example if you've a big tree of services in an online serving
+services. For example, if you have a big tree of services in an online-serving
-system, latency in some lower service is a typical problem. You could have one
+system, latency in some lower service is a typical problem. Rather than showing
-big page with every service's information, a better approach is one page per
+every service's information on a single large dashboard, build separate dashboards
-service that includes the latency and errors it sees for each service it talks
+for each service that include the latency and errors for each service they talk
-to. You can then start at the top and work your way down to the problem
+to. You can then start at the top and work your way down to the problematic
 service.
-We've found the following guidelines very effective:
+We have found the following guidelines very effective:
 * Have no more than 5 graphs on a console.
 * Have no more than 5 plots (lines) on each graph. You can get away with more if it's a stacked/area graph.
-* If using console templates, try to avoid more than 20-30 entries on the table on the right
+* When using the provided console template examples, avoid more than 20-30 entries in the right-hand-side table.
-If you find yourself exceeding these then you should demote the visibility of
+If you find yourself exceeding these, it could make sense to demote the visibility of
 less important information, possibly splitting out some subsystems to a new console.
-For example you could graph aggregated rather than broken-down data, move
+For example, you could graph aggregated rather than broken-down data, move
-things to the right hand table or even remove it completely if it's rarely
+it to the right-hand-side table, or even remove data completely if it is rarely
 useful - you can always look at it in the [expression browser](../../visualization/browser/)!
 Finally, it is difficult for a set of consoles to serve more than one master.
 What you want to know when oncall (what's broken?) tends to be very different
 from what you want when developing features (how many people hit corner
-case X?). In such cases, two seperate sets of consoles can be useful.
+case X?). In such cases, two separate sets of consoles can be useful.
--- a/content/docs/practices/instrumentation.md
+++ b/content/docs/practices/instrumentation.md
--- a/content/docs/practices/naming.md
+++ b/content/docs/practices/naming.md
@@ -7,7 +7,7 @@ sort_rank: 1
 The metric and label conventions presented in this document are not required
 for using Prometheus, but can serve as both a style-guide and collection of
-best practices. Individual organizations might want to approach e.g. naming
+best practices. Individual organizations may want to approach e.g. naming
 conventions differently.
 ## Metric names
@@ -18,7 +18,7 @@ A metric name:
 * <code><b>prometheus</b>\_notifications\_total</code>
 * <code><b>indexer</b>\_requests\_latencies\_milliseconds</code>
 * <code><b>processor</b>\_requests\_total</code>
-* must have a single unit (i.e. don't mix seconds with milliseconds)
+* must have a single unit (i.e. do not mix seconds with milliseconds)
 * should have a units suffix
 * <code>api\_http\_request\_latency\_<b>milliseconds</b></code>
 * <code>node\_memory\_usage\_<b>bytes</b></code>
@@ -29,7 +29,7 @@ A metric name:
 * instantaneous resource usage as a percentage
 As a rule of thumb, either the `sum()` or the `avg()` over all dimensions of a
-given metric should be meaningful (though not necessarily useful). If it isn't
+given metric should be meaningful (though not necessarily useful). If it is not
 meaningful, split the data up into multiple metrics. For example, having the
 capacity of various queues in the metric is good, mixing the capacity of a
 queue with the current number of elements in the queue is not.
@@ -41,11 +41,11 @@ Use labels to differentiate the characteristics of the thing that is being measu
 * `api_http_requests_total` - differentiate request types: `type="create|update|delete"`
 * `api_request_duration_nanoseconds` - differentiate request stages: `stage="extract|transform|load"`
-Don't put the label names in the metric name, as that's redundant and
+Do not put the label names in the metric name, as this introduces redundancy
-will cause confusion if it's aggregated away.
+and will cause confusion if the respective labels are aggregated away.
 CAUTION: <b>CAUTION:</b> Remember that every unique key-value label pair
 represents a new time series, which can dramatically increase the amount of
-data stored. Don't use labels to store dimensions with high cardinality (many
+data stored. Do not use labels to store dimensions with high cardinality (many
 different label values), such as user IDs, email addresses, or other unbounded
 sets of values.
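A note on the cardinality caution in the `naming.md` hunk above: since each unique combination of label values is its own time series, the worst-case series count for one metric is the product of the number of distinct values per label. The Python sketch below illustrates that arithmetic. The helper name, the `user_id` label, and the value counts are hypothetical; only the `type` label with its three values comes from the text.

```python
from math import prod
from typing import Mapping


def max_series(distinct_values_per_label: Mapping[str, int]) -> int:
    """Hypothetical helper: upper bound on time series for one metric.

    Each label multiplies the bound by its number of distinct values.
    A metric with no labels has exactly one series.
    """
    return prod(distinct_values_per_label.values())


# `type="create|update|delete"` from the text gives three series.
bounded = max_series({"type": 3})

# Adding an assumed `user_id` label with 100,000 distinct users
# multiplies the bound accordingly.
unbounded = max_series({"type": 3, "user_id": 100_000})
```

The second call shows why unbounded sets such as user IDs are a poor fit for labels: one extra label turns three series into a bound of 300,000.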
--- a/content/docs/visualization/browser.md
+++ b/content/docs/visualization/browser.md
@@ -5,7 +5,6 @@ sort_rank: 1
 # Expression browser
-The expression browser is available at `/graph` on the Prometheus server, allowing you to enter any expression and see it's result either in a table or graphed over time.
+The expression browser is available at `/graph` on the Prometheus server, allowing you to enter any expression and see its result either in a table or graphed over time.
-This is primarily useful for ad-hoc queries and debugging, for consoles you should use [PromDash](../promdash/) or [Console templates](../consoles/). 
+This is primarily useful for ad-hoc queries and debugging. For consoles, use [PromDash](../promdash/) or [Console templates](../consoles/).
--- a/content/docs/visualization/promdash.md
+++ b/content/docs/visualization/promdash.md
@@ -5,7 +5,7 @@ sort_rank: 2
 # PromDash
-PromDash is a simple, easy and quick way to create consoles from your browser.
+PromDash is a simple, easy, and quick way to create consoles from your browser.
 See the [documentation](https://github.com/prometheus/promdash/blob/master/README.md) for more information.