Commit b3e57b67 authored Dec 26, 2014 by Julius Volz
Start comparisons to alternative systems.
parent 17b600e1
Showing 1 changed file with 137 additions and 0 deletions
content/docs/introduction/comparison.md 0 → 100644
---
title: Comparison to alternatives
sort_rank: 4
---
# Comparison to alternatives
## Graphite
**Scope**
[Graphite](http://graphite.readthedocs.org/en/latest/)
TODO: explain scope of Graphite.
**Data model**
Graphite stores numeric samples for named time series, much like Prometheus
does. However, Prometheus's metadata model is richer: while Graphite metric
names consist of dot-separated components which implicitly encode dimensions,
Prometheus encodes dimensions explicitly as key-value pairs (labels) attached
to a metric name. This allows easy filtering and grouping by these labels via
its query language.
Further, especially when Graphite is used in combination with StatsD, it is
common to store only aggregated data over all monitored instances, rather than
preserving the instance as a dimension and being able to drill down into
individual problematic ones.
As an example, storing the number of HTTP requests to API servers with the
response code `500` and the method `POST` to the `/tracks` controller would
commonly be encoded like this in Graphite/StatsD:
```
stats.api-server.tracks.post.500 -> 93
```
In Prometheus the same data could be encoded like this (assuming three api-server instances):
```
api_server_http_requests_total{method="POST",handler="/tracks",status="500",instance="<sample1>"} -> 34
api_server_http_requests_total{method="POST",handler="/tracks",status="500",instance="<sample2>"} -> 28
api_server_http_requests_total{method="POST",handler="/tracks",status="500",instance="<sample3>"} -> 31
```
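
To illustrate why explicit labels make filtering and grouping easy, here is a minimal Python sketch that operates on the three samples above. The helper names `select` and `sum_by` are hypothetical and only mimic the idea of label matching and aggregation; they are not part of Prometheus or its query language.

```python
from collections import defaultdict

# The three samples from the example above: (metric name, labels, value).
SAMPLES = [
    ("api_server_http_requests_total",
     {"method": "POST", "handler": "/tracks", "status": "500", "instance": "<sample1>"}, 34),
    ("api_server_http_requests_total",
     {"method": "POST", "handler": "/tracks", "status": "500", "instance": "<sample2>"}, 28),
    ("api_server_http_requests_total",
     {"method": "POST", "handler": "/tracks", "status": "500", "instance": "<sample3>"}, 31),
]


def select(samples, **matchers):
    """Keep only samples whose labels equal every given matcher."""
    return [s for s in samples
            if all(s[1].get(key) == want for key, want in matchers.items())]


def sum_by(samples, *keys):
    """Sum sample values, grouped by the given label names.

    Labels that are not listed in `keys` are aggregated away.
    """
    totals = defaultdict(int)
    for _name, labels, value in samples:
        totals[tuple(labels.get(key) for key in keys)] += value
    return dict(totals)


# Aggregating over all instances yields the single Graphite/StatsD value.
aggregated = sum_by(select(SAMPLES, status="500"), "method", "handler")

# Keeping the instance dimension allows drilling down to individual servers.
per_instance = sum_by(SAMPLES, "instance")
```

Summing over the `instance` label reproduces the aggregated value `93` from the Graphite/StatsD example, while `per_instance` keeps the per-server breakdown that the aggregated form has lost.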
**Storage**
Graphite's storage format expects samples to arrive at regular intervals, while
Prometheus accepts samples at arbitrary intervals and stores them as they arrive.
TODO: Explain more about how time series data is stored in Prometheus vs.
Graphite's Whisper.
**Sample ingestion**
TODO: Explain StatsD vs. Prometheus data ingestion.
## InfluxDB
[InfluxDB](http://influxdb.com/) is a very promising new open-source time
series database. It didn't exist when Prometheus development began, so we were
unable to consider it as an alternative at the time. Still, there are
significant differences between Prometheus and InfluxDB, and both systems are
geared towards slightly different use cases.
The comparisons below attempt to help you choose the right system for your use
case and taste:
**Scope**
InfluxDB focuses on being a passive time series database with a query
language. Any other concerns are addressed by external components.
Prometheus is a full monitoring and trending system that includes built-in and
active scraping, storing, querying, and alerting based on time series data. It
has knowledge about what the world should look like (which endpoints should
exist, what time series patterns mean trouble, etc.), and actively tries to find
faults.
**Architecture**
Prometheus servers run independently of each other and only rely on their local
storage for their core functionality: scraping, rule processing, and alerting.
InfluxDB is by design a distributed storage cluster with storage and queries
being handled by many nodes at once.
This means that InfluxDB will be easier to scale horizontally, but it also means
that you have to manage the complexity of a distributed storage system from the
get-go. Prometheus will be simpler to run, but at some point you will need to
shard servers explicitly along scalability boundaries like products, services,
datacenters, or similar. Independent servers (which can be run redundantly in
parallel) may also give you better reliability and failure isolation, though
that is debatable, since InfluxDB can also tolerate node outages thanks to data
replication.
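
As a rough illustration of what sharding explicitly along a boundary means, the following Python sketch groups scrape targets by service, or by service and datacenter, so that each group could be assigned to its own independent server. The target list, its field names, and the `shard_targets` helper are all hypothetical and do not correspond to any Prometheus configuration format.

```python
from collections import defaultdict

# Hypothetical scrape targets; the field names are assumptions for this sketch.
TARGETS = [
    {"service": "api-server", "datacenter": "dc1", "address": "10.0.0.1:8080"},
    {"service": "api-server", "datacenter": "dc2", "address": "10.1.0.1:8080"},
    {"service": "billing", "datacenter": "dc1", "address": "10.0.0.2:8080"},
    {"service": "billing", "datacenter": "dc1", "address": "10.0.0.3:8080"},
]


def shard_targets(targets, *boundary):
    """Group target addresses by the chosen boundary fields.

    Each resulting group would be scraped by one independent server.
    """
    shards = defaultdict(list)
    for target in targets:
        key = tuple(target[field] for field in boundary)
        shards[key].append(target["address"])
    return dict(shards)


by_service = shard_targets(TARGETS, "service")
by_service_and_dc = shard_targets(TARGETS, "service", "datacenter")
```

Choosing a finer boundary produces more, smaller shards: here `by_service` has two groups and `by_service_and_dc` has three.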
**Data model / storage**
*Summary:* InfluxDB stores rows of events with full metadata for each event;
Prometheus only stores numeric samples for existing time series.
While InfluxDB's data model also allows annotation of data with arbitrary
key-value pairs, it differs significantly from Prometheus in the way this data
is modeled and stored. InfluxDB stores timestamped events with full metadata
(key-value pairs) attached to each event / row. Prometheus stores only numeric
time series and stores metadata for each time series exactly once, and then
continues to simply append timestamped samples for that existing metadata
entry. In a
[test from March 2014](https://docs.google.com/document/d/1OgnI7YBCT_Ub9Em39dEfx9BuiqRNS3oA62i8fJbwwQ8/edit?usp=sharing),
storing typical Prometheus time series data in InfluxDB led to an
**11x disk storage size increase** due to this metadata redundancy.
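
The effect of metadata redundancy can be sketched with a toy size model in Python. This is not the on-disk format of either system: the JSON encoding of the labels, the assumed 16 bytes per sample (an 8-byte timestamp plus an 8-byte value), and both function names are assumptions made for illustration, and the sketch does not attempt to reproduce the measured 11x figure.

```python
import json

# Assumed fixed cost of one sample: 8-byte timestamp + 8-byte float value.
SAMPLE_BYTES = 16


def metadata_bytes(labels):
    """Size of the key-value metadata in a simple JSON encoding (an assumption)."""
    return len(json.dumps(labels, sort_keys=True).encode("utf-8"))


def row_model_bytes(labels, num_samples):
    """Every sample is a row that repeats the full metadata."""
    return num_samples * (metadata_bytes(labels) + SAMPLE_BYTES)


def series_model_bytes(labels, num_samples):
    """Metadata is stored once per series; samples are appended to it."""
    return metadata_bytes(labels) + num_samples * SAMPLE_BYTES


labels = {
    "__name__": "api_server_http_requests_total",
    "method": "POST",
    "handler": "/tracks",
    "status": "500",
    "instance": "<sample1>",
}

rows = row_model_bytes(labels, 1000)
series = series_model_bytes(labels, 1000)
ratio = rows / series
```

In this model the two layouts cost the same for a single sample, and the gap widens as more samples are appended to the same series, because the row layout pays for the metadata again on every sample.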
If you are only interested in tracking the development of existing named
time series (for example, the cumulative count of HTTP requests with the method
`POST` and the handler `/api/tracks` on the instance
`http://1.2.3.4:12345/metrics`), Prometheus will require much less storage
space than InfluxDB. Further, Prometheus indexes all time series dimensions for
efficient filtering, while InfluxDB currently only indexes tables by row
timestamps (issue to track adding column indexes:
https://github.com/influxdb/influxdb/issues/582). Thus, we would expect
Prometheus to be more efficient at filtering data.
Still, InfluxDB is better geared towards the following use cases:
* storing all **individual** events, not just time series of values
  * e.g. storing every HTTP request with full metadata *vs.* storing the
    cumulative count of HTTP requests for certain dimensions
* storing time series with completely unbounded dimensionality
  * e.g. storing user IDs or email addresses in the key-value metadata *vs.*
    storing bounded dimensionality like the HTTP method, HTTP handler and
    instance ID
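
The difference between bounded and unbounded dimensionality can be made concrete with a small Python sketch that counts how many distinct time series a set of labels can produce. The label values and the `series_count` helper are hypothetical; the count is an upper bound that assumes every combination of label values actually occurs.

```python
from math import prod


def series_count(label_values):
    """Upper bound on distinct series: the product of distinct values per label."""
    return prod(len(values) for values in label_values.values())


# Bounded dimensions: each label has a small, fixed set of values.
bounded = {
    "method": ["GET", "POST", "PUT", "DELETE"],
    "handler": ["/tracks", "/users"],
    "instance": ["<sample1>", "<sample2>", "<sample3>"],
}

# Adding a user ID label multiplies the series count by the number of users.
unbounded = dict(bounded, user_id=range(100_000))

bounded_series = series_count(bounded)
unbounded_series = series_count(unbounded)
```

With these assumed values, the bounded labels allow at most 24 series, while adding a `user_id` label with 100,000 values raises the bound to 2,400,000, which is why such data fits an event store better than a store that keeps one metadata entry per series.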
There are other storage features, such as downsampling, which InfluxDB supports
and Prometheus doesn't yet.
## OpenTSDB
TODO: compare Prometheus to OpenTSDB.