Google Cloud Platform Blog
Adventures in SRE-land: Welcome to Google Mission Control
Monday, July 11, 2016
Posted by Paul Newson, Mission Controller
Wait. That’s not Google. That’s Houston.
We do have a Mission Control at Google, named in honor of NASA’s Christopher C. Kraft Jr. Mission Control Center, pictured here. But at Google, Mission Control is not a place. It’s a six-month rotation program in which engineers working on product development experience what it’s like to be a Site Reliability Engineer (SRE). The goal is to increase the number of engineers who understand the challenges of building and operating a high-reliability service at Google's scale.
The Mission Control inspiration goes further; SREs at Google are issued jackets that bear a flight patch inspired by the one Gene Kranz had commissioned for the Mission Controllers in Houston [1]. It bears the “Kranz Dictum” of “Tough and Competent” in Latin: “Duri et Periti”. If you see someone wearing a leather jacket with this flight patch, you’re looking at a Google SRE.
But what is an SRE? According to Google Vice President of Engineering Ben Treynor Sloss, who coined the term SRE, “SRE is what happens when you ask a software engineer to design an operations function.” In 2003, Ben was asked to lead Google’s existing “Production Team,” which at the time consisted of seven software engineers. The team started as a software engineering team, and since Ben is also a software engineer, he continued to grow a team that he, as a software engineer, would still want to work on. Thirteen years later, Ben leads a team of roughly 2,000 SREs, and it is still a team that software engineers want to work on. About half of the engineers who do a Mission Control rotation choose to remain SREs after their rotation is complete.
Google has been putting the word out about SRE for the past couple of years. Ben gave a talk at SREcon14 where he shared the principles of SRE learned over 11 years of building the team at Google. Melissa Binde gave a talk at GCP Next 2016 where she provided some pointers on how to apply some of the techniques we use at Google to your workloads running in our cloud. And if you really want to dig deep, the Site Reliability Engineering book is now available, and highly recommended reading.
Over the next six months, I will be on the uncomfortably exciting adventure of my own Mission Control rotation with the SRE team in Seattle that looks after Google Compute Engine. I will also be sharing some of the things I learn along the way with everyone here on this blog. So, if you want to learn more about being an SRE and how Site Reliability Engineering impacts our cloud services, keep watching this space.
[1] http://genedorr.com/patches/Ground.html