Skip to content
Projects
Groups
Snippets
Help
Loading...
Help
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
D
docs
Project
Project
Details
Activity
Releases
Cycle Analytics
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Charts
Issues
0
Issues
0
List
Boards
Labels
Milestones
Merge Requests
0
Merge Requests
0
CI / CD
CI / CD
Pipelines
Jobs
Schedules
Charts
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Charts
Create a new issue
Jobs
Commits
Issue Boards
Open sidebar
Administrator
docs
Commits
728527a1
Commit
728527a1
authored
Mar 23, 2016
by
Brian Brazil
Browse files
Options
Browse Files
Download
Plain Diff
Merge pull request #356 from brian-brazil/life360
Add user interview with Life360
parents
17708cb5
406a193b
Changes
3
Hide whitespace changes
Inline
Side-by-side
Showing
3 changed files
with
107 additions
and
0 deletions
+107
-0
2016-03-23-interview-with-life360.md
content/blog/2016-03-23-interview-with-life360.md
+107
-0
cx_client.png
static/blog/2016-03-23/cx_client.png
+0
-0
nsq_overview.png
static/blog/2016-03-23/nsq_overview.png
+0
-0
No files found.
content/blog/2016-03-23-interview-with-life360.md
0 → 100644
View file @
728527a1
---
title
:
Interview with Life360
created_at
:
2016-03-23
kind
:
article
author_name
:
Brian Brazil
---
*
This is the first in a series of interviews with users of Prometheus, allowing
them to share their experiences of evaluating and using Prometheus. Our first
interview is with Daniel from Life360.
*
## Can you tell us about yourself and what Life360 does?
I’m Daniel Ben Yosef, a.k.a, dby, and I’m an Infrastructure Engineer for
[
Life360
](
https://www.life360.com/
)
, and before that, I’ve held systems
engineering roles for the past 9 years.
Life360 creates technology that helps families stay connected, we’re the Family
Network app for families. We’re quite busy handling these families - at peak
we serve 700k requests per minute for 70 million registered families.
We manage around 20 services in production, mostly handling location requests
from mobile clients (Android, iOS, and Windows Phone), spanning over 150+
instances at peak. Redundancy and high-availability are our goals and we strive
to maintain 100% uptime whenever possible because families trust us to be
available.
We hold user data in both our MySQL multi-master cluster and in our 12-node
Cassandra ring which holds around 4TB of data at any given time. We have
services written in Go, Python, PHP, as well as plans to introduce Java to our
stack. We use Consul for service discovery, and of course our Prometheus setup
is integrated with it.
## What was your pre-Prometheus monitoring experience?
Our monitoring setup, before we switched to Prometheus, included many
components such as:
*
Copperegg (now Idera)
*
Graphite + Statsd + Grafana
*
Sensu
*
AWS Cloudwatch
We primarily use MySQL, NSQ and HAProxy and we found that all of the monitoring
solutions mentioned above were very partial, and required a lot of
customization to actually get all working together.
## Why did you decide to look at Prometheus?
We had a few reasons for switching to Prometheus, one of which is that we
simply needed better monitoring.
Prometheus has been known to us for a while, and we have been tracking it and
reading about the active development, and at a point (a few months back) we
decided to start evaluating it for production use.
The PoC results were incredible. The monitoring coverage of MySQL was amazing,
and we also loved the JMX monitoring for Cassandra, which had been sorely
lacking in the past.
[
![Cassandra Client Dashboard
](
/assets/blog/2016-03-23/cx_client.png
)
](/assets/blog/2016-03-23/cx_client.png)
## How did you transition?
We started with a relatively small box (4GB of memory) as an initial point. It
was effective for a small number of services, but not for our full monitoring
needs.
We also initially deployed with Docker, but slowly transitioned to its own box
on an r3.2xl instance (60GB ram), and that holds all of our service monitoring
needs with 30 days of in-memory data.
We slowly started introducing all of our hosts with the Node Exporter and built
Grafana graphs, up to the point where we had total service coverage.
We were also currently looking at InfluxDB for long term storage, but due to
[
recent developments
](
https://influxdata.com/blog/update-on-influxdb-clustering-high-availability-and-monetization/
)
,
this may no longer be a viable option.
We then added exporters for MySQL, Node, Cloudwatch, HAProxy, JMX, NSQ (with a
bit of our own code), Redis and Blackbox (with our own contribution to add
authentication headers).
[
![NSQ Overview Dashboard
](
/assets/blog/2016-03-23/nsq_overview.png
)
](/assets/blog/2016-03-23/nsq_overview.png)
## What improvements have you seen since switching?
The visibility and instrumentation gain was the first thing we saw. Right
before switching, we started experiencing Graphite’s scalability issues, and
having an in-place replacement for Graphite so stakeholders can continue to use
Grafana as a monitoring tool was extremely valuable to us. Nowadays, we are
focusing on taking all that data and use it to detect anomalies, which will
eventually become alerts in the Alert Manager.
## What do you think the future holds for Life360 and Prometheus?
We currently have one of our projects instrumented directly with a Prometheus
client, a Python-based service. As we build out new services, Prometheus is
becoming our go-to for instrumentation, and will help us gain extremely
meaningful alerts and stats about our infrastructure.
We look forward to growing with the project and keep contributing.
*Thank you Daniel! The source for Life360's dashboards is shared on [Github](https://github.com/life360/prometheus-grafana-dashboards).*
static/blog/2016-03-23/cx_client.png
0 → 100644
View file @
728527a1
428 KB
static/blog/2016-03-23/nsq_overview.png
0 → 100644
View file @
728527a1
809 KB
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment