Commit f2c66f96, authored Mar 17, 2018 by Brian Brazil, committed by GitHub Mar 17, 2018

Add user interview from Datawire. (#989)

parent 5042d52c
Showing 2 changed files with 82 additions and 0 deletions

- content/blog/2018-03-16-interview-with-datawire.md (+82 -0)
- static/blog/2018-03-16/dashboard.png (+0 -0)
content/blog/2018-03-16-interview-with-datawire.md (0 → 100644, view file @ f2c66f96)
---
title: Interview with Datawire
created_at: 2018-03-16
kind: article
author_name: Brian Brazil
---
*Continuing our series of interviews with users of Prometheus, Richard Li from
Datawire talks about how they transitioned to Prometheus.*
## Can you tell us about yourself and what Datawire does?
At Datawire, we make open source tools that help developers code faster on
Kubernetes. Our projects include [Telepresence](https://www.telepresence.io/),
for local development of Kubernetes services;
[Ambassador](https://www.getambassador.io/), a Kubernetes-native API Gateway
built on the [Envoy Proxy](https://www.envoyproxy.io/); and
[Forge](https://forge.sh/), a build/deployment system.

We run a number of mission critical cloud services in Kubernetes in AWS to
support our open source efforts. These services support use cases such as
dynamically provisioning dozens of Kubernetes clusters a day, which are then
used by our automated test infrastructure.
## What was your pre-Prometheus monitoring experience?
We used AWS CloudWatch. This was easy to set up, but we found that as we
adopted a more distributed development model (microservices), we wanted more
flexibility and control. For example, we wanted each team to be able to
customize their monitoring on an as-needed basis, without requiring operational
help.
## Why did you decide to look at Prometheus?
We had two main requirements. The first was that we wanted every engineer here
to be able to have operational control and visibility into their service(s).
Our development model is highly decentralized by design, and we try to avoid
situations where an engineer needs to wait on a different engineer in order to
get something done. For monitoring, we wanted our engineers to be able to have
a lot of flexibility and control over their metrics infrastructure. Our second
requirement was a strong ecosystem. A strong ecosystem generally means
established (and documented) best practices, continued development, and lots of
people who can help if you get stuck.
Prometheus, and in particular, the [Prometheus
Operator](https://github.com/coreos/prometheus-operator), fit our requirements.
With the Prometheus Operator, each developer can create their own Prometheus
instance as needed, without help from operations (no bottleneck!). We are also
members of the [CNCF](https://www.cncf.io/) with a lot of experience with the
Kubernetes and Envoy communities, so looking at another CNCF community in
Prometheus was a natural fit.

## How did you transition?
We knew we wanted to start by integrating Prometheus with our API Gateway. Our
API Gateway uses Envoy for proxying, and Envoy automatically emits metrics
using the statsd protocol. We installed the Prometheus Operator (some detailed
notes [here](https://www.cncf.io/)) and configured it to start collecting stats
from Envoy. We also set up a Grafana dashboard [based on some
work](https://www.cncf.io/) from another Ambassador contributor.
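
For readers unfamiliar with the statsd wire format mentioned above, here is a
minimal sketch of how a single statsd line can be parsed and its name mapped to
a Prometheus-style metric name. It is an illustration only, not Datawire's
setup: the helper names `parse_statsd_line` and `prometheus_name` are
hypothetical, the sample metric name is made up for the example, and in
practice a dedicated statsd-to-Prometheus bridge would typically do this work.

```python
import re
from typing import NamedTuple


class StatsdSample(NamedTuple):
    """One parsed statsd line of the form name:value|type[|@rate]."""
    name: str
    value: float
    kind: str           # e.g. "c" (counter), "g" (gauge), "ms" (timer)
    sample_rate: float  # 1.0 when no "@rate" field is present


def parse_statsd_line(line: str) -> StatsdSample:
    """Parse a single statsd line (hypothetical helper for illustration)."""
    name, _, rest = line.strip().partition(":")
    if not name or not rest:
        raise ValueError(f"not a statsd line: {line!r}")
    fields = rest.split("|")
    if len(fields) < 2:
        raise ValueError(f"missing metric type: {line!r}")
    sample_rate = 1.0
    for extra in fields[2:]:
        if extra.startswith("@"):
            sample_rate = float(extra[1:])
    return StatsdSample(name, float(fields[0]), fields[1], sample_rate)


def prometheus_name(statsd_name: str) -> str:
    """Replace characters that are not valid in a Prometheus metric name."""
    cleaned = re.sub(r"[^a-zA-Z0-9_:]", "_", statsd_name)
    if cleaned and cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


if __name__ == "__main__":
    # Illustrative metric name, assumed for the example.
    sample = parse_statsd_line("envoy.http.downstream_rq_total:3|c")
    print(prometheus_name(sample.name), sample.value, sample.kind)
```

The sketch only covers the basic `name:value|type` shape plus an optional
sample rate; real deployments also need to decide how dotted statsd names map
onto Prometheus labels, which is outside what this example attempts.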
## What improvements have you seen since switching?
Our engineers now have visibility into L7 traffic. We also are able to use
Prometheus to compare latency and throughput for our canary deployments to give
us more confidence that new versions of our services don’t cause performance
regressions.
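
As a rough illustration of the kind of canary check described above, the
following sketch compares a latency percentile between a stable version and a
canary. The function names, the 95th percentile, and the 10% tolerance are all
assumptions made for the example; the interview does not say which queries or
thresholds Datawire uses, and in practice the samples would come from
Prometheus rather than from in-memory lists. Throughput could be compared in a
similar way but is not shown here.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], q: float) -> float:
    """Nearest-rank percentile of samples, for 0 < q <= 100."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[rank - 1]


def canary_regressed(
    baseline_ms: Sequence[float],
    canary_ms: Sequence[float],
    q: float = 95.0,          # assumed percentile, not from the interview
    tolerance: float = 0.10,  # assumed allowed slowdown (10%)
) -> bool:
    """Return True if the canary percentile exceeds baseline by > tolerance."""
    base = percentile(baseline_ms, q)
    cand = percentile(canary_ms, q)
    return cand > base * (1 + tolerance)


if __name__ == "__main__":
    stable = [100.0] * 10
    print(canary_regressed(stable, [105.0] * 10))  # within tolerance
    print(canary_regressed(stable, [120.0] * 10))  # beyond tolerance
```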
## What do you think the future holds for Datawire and Prometheus?
Using the Prometheus Operator is still a bit complicated. We need to figure out
operational best practices for our service teams (when do you deploy a
Prometheus?). We’ll then need to educate our engineers on these best practices
and train them on how to configure the Operator to meet their needs. We expect
this will be an area of some experimentation as we figure out what works and
what doesn’t work.
static/blog/2018-03-16/dashboard.png (0 → 100644, view file @ f2c66f96, 188 KB)