



Last year, we published a guide for our customers who had familiarity and expertise with AWS but wanted to learn how it compares to Google Cloud Platform. The guide had a really positive reception, helping customers understand things like how Cloud Platform delivers Infrastructure as a Service with Google Compute Engine and how our VPN works.



Today, we're happy to announce a major expansion to the Cloud Platform for AWS Professionals guide, with new sections covering Big Data services, Storage services and Containers as a Service (Google Container Engine).




Amazon ECS vs. Google Container Engine at a glance




How Amazon Elastic MapReduce compares to Google Cloud Dataproc and Cloud Dataflow

As we said last year, this guide is a work-in-progress. We have some ideas about what topics we’d like to tackle next (services like Databases and Development tools) but we’d also love to hear what you think we should cover.



We hope you find this information useful and that it makes learning about Cloud Platform enjoyable. Please tell us what you think, and be sure to sign up for a free trial!









Here at Google Cloud Platform, we pride ourselves on efficiency and reducing waste, from our datacenters all the way to individual virtual machine instances. We want to make sure that our users are getting good value for their money. To that end, VM Rightsizing Recommendations is now in beta.



Knowing which VM machine type to choose before you start running your workloads is challenging, especially given that your load may change over time. Select a machine that’s too large and you overpay. Select a machine that’s too small and your service is starved for resources. And while it’s possible to split some workloads amongst identical machines and use Managed Instance Group Autoscaler to balance resources, that doesn’t work for all workloads. Some workloads such as databases or file servers can’t be easily distributed.



VM Rightsizing Recommendations can help you see at a glance if your machines are the right size for the work that you assigned them. It monitors your CPU and RAM usage over time and makes recommendations about the size of your VMs. When applicable, it estimates how much you could save or whether your instances are overloaded and displays that information right on the VM Instances page — just look for the light bulb!




The VM instances page shows instances where you can save money or increase performance as well as estimated savings per instance in total (click to enlarge)



When you’re ready, you can resize your VMs with a single click. And if your workload changes and your machine type is no longer a fit, we’ll let you know.
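If you'd rather script the change than click through the console, the same resize can be made against the Compute Engine API. Below is a minimal sketch using the google-api-python-client library; the project, zone, instance and machine-type names are placeholders, and for brevity it skips polling the asynchronous operations between calls. Note that an instance must be stopped before its machine type can change.

    from googleapiclient import discovery

    # Placeholder values for illustration.
    PROJECT, ZONE, INSTANCE = 'my-project', 'us-central1-b', 'my-oversized-vm'
    NEW_TYPE = 'zones/%s/machineTypes/n1-standard-2' % ZONE

    compute = discovery.build('compute', 'v1')

    # A machine type can only be changed while the instance is stopped.
    compute.instances().stop(
        project=PROJECT, zone=ZONE, instance=INSTANCE).execute()

    # Apply the recommended machine type, then bring the instance back up.
    # (Each call returns an async operation; production code should wait
    # for it to finish before issuing the next call.)
    compute.instances().setMachineType(
        project=PROJECT, zone=ZONE, instance=INSTANCE,
        body={'machineType': NEW_TYPE}).execute()
    compute.instances().start(
        project=PROJECT, zone=ZONE, instance=INSTANCE).execute()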



For more information about the VM Rightsizing Recommendations beta, including how we come up with the recommendations, click here.









Is multi-cloud a pipe dream? I think not!



From startups to enterprises, despite material increases in efficiency and the price to performance ratio of the compute, network and storage resources we all use, infrastructure continues to come at substantial cost. It can also be a real risk driver; each implementation choice affects future scalability, service level and flexibility of the services being built. It’s fair to say that “future-proofing” should be the primary concern of every system architect.



Providers of infrastructure aren’t disinterested actors either; there are huge incentives for any vendor to increase lock-in through contractual, fiscal and technical constrictions. In many cases, interest in cloud infrastructure, particularly among existing consumers of infrastructure, has been driven by a huge urge to break free of existing enterprise vendor relationships for which the lock-in costs are higher than the value provided. Once they have some kind of lock-in working, infrastructure companies know that they can charge higher rents without necessarily earning them.



So, how can you swing the power dynamic around so that you, as the consumer of infrastructure, get the most value out of your providers at the lowest cost?



A good first step is to actively resist lock-in mechanisms. Most consumers have figured out that long-term contractual commitments can be dangerous, and that pre-paid arrangements distort decision-making. Technical lock-in remains one of the most difficult forms to avoid. Many providers wrap valuable differentiated services in proprietary APIs so that applications eventually get molded around their design. These “sticky services” or “loss leaders” create substantial incentives for tech shops to take the shorter path to value and accept a bit of lock-in risk. This is a prevalent form of technical debt, especially when new vendors release even more powerful and differentiated tools in the same space, or when superior solutions rise out of OSS communities.



In the past, some companies tried to help users get out from under this debt by building abstraction layers on top of the proprietary APIs from each provider, so that users could use one tool to broker across multiple clouds. This approach has proven messy and fragile, and tends to compromise down to the lowest common denominator across clouds. It also invites strategic disruption from cloud providers seeking to preserve customer lock-in.




Open architectures


Thankfully, this isn’t the only way technology works. It’s entirely possible to build scaled, high-performance, cost-efficient systems without accepting unnecessary technical lock-in risk or tolerating the lowest common denominator. You can even still consume proprietary infrastructure products, as long as you can prove to yourself that, because those products expose open APIs, you can move when you want to. This is not to say that this isn’t complex, advanced work. It is. But the amount of time and effort required is shrinking radically every day. This gives users leverage; as your freedom goes up, it becomes easier and easier to treat providers like the commodities they ought to be.



We understand the value of proprietary engineering. We’ve created a purpose-built cloud stack, highly tuned for scale, performance, security and flexibility. We extract real value from this investment, through our advertising and applications businesses as well as our cloud business. But GCP, along with some other providers and members of the broader technology community, recognizes that when users have power, they can do powerful things. We’ve worked hard to deliver services that are differentiated by their performance, stability and cost, but not by proprietary, closed APIs. We know this means that you can stop using us when you want to; we think that gives you the power to use us at lower risk. Some awesome folks have started calling this approach GIFEE, or “Google Infrastructure For Everyone Else.” But given the overwhelming participation and source code contributions — including those for Kubernetes — from individuals and companies of all sizes to the OSS projects involved, it’s probably more accurate to call it Everyone’s Infrastructure, For Every Cloud — unfortunately that’s a terrible acronym.



A few salient examples:







Applications can run in containers on Kubernetes, the OSS container orchestrator that Google helped create, either managed and hosted by us via GKE, or on any provider, or both at the same time.




Kubernetes ensures that your containers aren’t locked in.
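To make the portability concrete: the same client code can inspect a GKE cluster and a cluster running anywhere else simply by switching kubeconfig contexts. Here's a minimal sketch with the official Kubernetes Python client; the context names are hypothetical.

    from kubernetes import client, config

    # Hypothetical kubeconfig contexts: one GKE cluster, one self-hosted.
    for context in ['gke_my-project_us-central1-b_prod', 'on-prem-cluster']:
        config.load_kube_config(context=context)
        pods = client.CoreV1Api().list_pod_for_all_namespaces()
        print('%s -> %d pods' % (context, len(pods.items)))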



Web apps can run in a PaaS environment like AppScale, the OSS application management framework, either managed and hosted by us via Google App Engine, or on any provider, or both at the same time. Importantly, this includes the NoSQL transactional stores that apps require, either powered by AppScale, which uses Cassandra as a storage layer and vends the Google App Engine Datastore API to applications, or native in App Engine.




AppScale ensures that your apps aren’t locked in.



NoSQL key-value stores can run on Apache HBase, the OSS NoSQL engine inspired by our Bigtable whitepaper, either managed and hosted by us via Cloud Bigtable, or on any other provider, or both at the same time.




HBase ensures that your NoSQL isn’t locked in.
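As one illustration of that open API, the happybase Python library talks to any HBase cluster through its Thrift gateway with the same handful of calls (the host and table names below are hypothetical); Cloud Bigtable exposes the same column-family data model through its HBase-compatible client.

    import happybase

    # Hypothetical Thrift gateway; any self-managed HBase cluster works.
    connection = happybase.Connection('hbase-thrift.example.com')
    table = connection.table('user_events')

    # Reads and writes use the column-family model from the Bigtable paper.
    table.put(b'user#1234', {b'events:last_login': b'2016-07-20T12:00:00Z'})
    print(table.row(b'user#1234')[b'events:last_login'])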



OLAP systems can be run using Druid or Drill, two OSS OLAP engines inspired by Google’s Dremel system. Both are very similar to BigQuery and can run on any infrastructure.






Druid and Drill ensure that your OLAP system isn’t locked in.



Advanced RDBMS deployments can be built on Vitess, the OSS MySQL toolkit we helped create, either hosted by us inside Google Container Engine, or on any provider via Kubernetes, or both at the same time. You can also run MySQL fully managed on GCP via Cloud SQL.




Vitess ensures that your relational database isn’t locked in.



Data pipelines can be built on Apache Beam, the OSS data processing framework we helped create, either managed and hosted by us via Cloud Dataflow, or run on any provider, or both at the same time.




Beam ensures that your data ETL isn’t locked in.
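The portability is visible right in the code: a Beam pipeline is runner-agnostic, so the identical Python program can execute locally or on Cloud Dataflow just by changing the --runner option. A minimal sketch:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Swap DirectRunner for DataflowRunner (plus project/staging options)
    # to run this exact pipeline on Cloud Dataflow instead of locally.
    options = PipelineOptions(['--runner=DirectRunner'])

    with beam.Pipeline(options=options) as p:
        (p
         | 'read' >> beam.Create(['error: disk full', 'ok', 'error: timeout'])
         | 'errors' >> beam.Filter(lambda line: line.startswith('error'))
         | 'print' >> beam.Map(print))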



Machine learning models can be built with TensorFlow, the OSS ML toolkit we helped create, either managed and hosted by us via Cloud ML, or on any provider, or both at the same time.




TensorFlow ensures that your ML isn’t locked in.
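The same holds for models: a TensorFlow program is just code plus checkpoints, and runs unchanged on a laptop, your own cluster or a managed service. A toy sketch of that portable core, fitting y = 2x by gradient descent:

    import tensorflow as tf

    # Toy data for y = 2x; this script runs anywhere TensorFlow installs.
    xs = tf.constant([1.0, 2.0, 3.0, 4.0])
    ys = tf.constant([2.0, 4.0, 6.0, 8.0])

    w = tf.Variable(0.0)
    opt = tf.keras.optimizers.SGD(learning_rate=0.01)

    for _ in range(200):
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean((w * xs - ys) ** 2)
        opt.apply_gradients([(tape.gradient(loss, w), w)])

    print(w.numpy())  # converges to ~2.0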



Object storage can be built on Minio, the OSS server that vends the S3 API, either managed and hosted by us via GCS, which also speaks the S3 API, or on any provider, or both at the same time.




Minio ensures that your object store isn’t locked in.
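Because Minio and GCS both speak the S3 dialect, a generic S3 client can target either by changing one endpoint URL. A hedged sketch with boto3; the endpoint, bucket and credentials are placeholders (GCS's S3-compatible mode uses interoperability HMAC keys):

    import boto3

    # Point the same client at a self-hosted Minio server or at GCS's
    # S3-compatible endpoint; only the URL and credentials change.
    s3 = boto3.client(
        's3',
        endpoint_url='https://storage.googleapis.com',  # or your Minio host
        aws_access_key_id='HMAC_ACCESS_KEY',
        aws_secret_access_key='HMAC_SECRET')

    s3.put_object(Bucket='my-portable-bucket', Key='hello.txt', Body=b'no lock-in')
    print(s3.get_object(Bucket='my-portable-bucket', Key='hello.txt')['Body'].read())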



Continuous deployment tooling can be delivered using Spinnaker, a project started by the Netflix OSS team, either hosted by us via GKE, or on other providers, or both at the same time.




Spinnaker ensures that your CD tooling isn’t locked in.



What’s still proprietary, but probably OK?



CDN, DNS, Load Balancing


Because the interfaces to these kinds of services are network configurations rather than code, they have so far remained proprietary across providers. NGINX and Varnish make excellent OSS load balancers and front-end caches, but because switching is low-friction and low-risk, there’s no real need to avoid the DNS or load-balancing services on public clouds.




File Systems


These are still pretty hard for cloud providers to deliver as managed services at scale; GlusterFS, Avere, ZFS and others are really useful for delivering your own POSIX layers irrespective of environment. If you’re building inside Kubernetes, take a look at CoreOS’s Torus project.



It’s not just software, it’s the data

Lock-in risk comes in many forms, one of the most powerful being data gravity, or data inertia. Even if all of your software can move between infrastructures with ease, those systems are connected by a limited, throughput-constrained internet, and once you have a petabyte written down, it can be a pain in the neck to move. What good is software you can move in a minute if it takes a month to move the bytes?



There are lots of tools that help, both native from GCP, and from our growing partner ecosystem.


  • If your data is in an object store, look no further than the Google Storage Transfer Service, an easy automated tool for moving your bits from A to G.

  • If you have data on tape or disk, take a look at the Offline Media Import/Export service.

  • If you need to regularly move data to and from our cloud, take a look at Google Cloud Interconnect for leveraging carriers or public peering points to connect reliably with us.

  • If you have VM images you’d like to move to cloud quickly we recommend Cloud Endure to move and transform your images for running on Google Compute Engine.

  • If you have a database you need replicated, take a look at Attunity CloudBeam. If you’re trying to migrate bulk data, try FDT from CERN.

  • If you’re doing data imports, perhaps Embulk.



Conclusion


We hope the above helps you choose open APIs and technologies designed to help you grow without locking you in. That said, remember that the real proof you have the freedom to move is to actually move; try it! Customers have told us about their new-found power at the negotiating table when they can demonstrably run their application across multiple providers.



All of the above-mentioned tools, in combination with strong private networking between providers, allow your applications to span providers with a minimum of provider-specific implementation detail.



If you have questions about how to implement the above, about other parts of the stack this kind of thinking applies to, or about how to get started, don’t hesitate to reach out to us at Google Cloud Platform; we’re eager to help.










Here at Google Cloud Platform, we're working tirelessly toward making Google Compute Engine the first place you turn when you need a scalable, manageable and reliable cloud on which to run your virtual machines.



A Managed Instance Group is a group of VM instances that are all built from the same instance template, and it’s the foundation for easily building scalable, reliable systems. It lets you auto-heal VM instances by monitoring the state of machines and applications and recreating them when needed. It also lets you autoscale and load-balance so you can serve requests no matter how traffic grows.



Managed Instance Groups just got even better! For applications that require high availability, we're announcing the beta of Regional Managed Instance Groups. Now when you select a multi-zone configuration, Compute Engine automatically spreads the VMs across three zones in the same region equally, so that even in the rare case of a zone-level outage, two thirds of the instances continue to serve. When Autoscaler is enabled, it automatically adds instances to handle the increased traffic in other zones. You can further protect your system by overprovisioning your Regional Managed Instance Group.
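If you prefer the API to the console, here's a minimal sketch of creating a regional managed instance group with the google-api-python-client library. The project, template and group names are placeholders, and it assumes the compute v1 surface, which carries regionInstanceGroupManagers today (the feature sat behind the beta API surface when it launched).

    from googleapiclient import discovery

    compute = discovery.build('compute', 'v1')

    # Placeholder names; the instance template must already exist.
    compute.regionInstanceGroupManagers().insert(
        project='my-project',
        region='us-central1',
        body={
            'name': 'web-rmig',
            'baseInstanceName': 'web',
            'instanceTemplate': 'global/instanceTemplates/web-template',
            # Compute Engine spreads these across three zones in the region.
            'targetSize': 6,
        }).execute()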



This feature has been in the hands of select alpha partners for several months. We’re now ready to open it up to a wider range of Compute Engine users, across all supported Compute Engine regions. Click here for more information about regional managed instance groups.





"So let me get this straight. You want to build an external version of the Borg task scheduler. One of our most important competitive advantages. The one we don’t even talk about externally. And, on top of that, you want to open source it?"



"So let me get this straight. You want to build an external version of the Borg task scheduler. One of our most important competitive advantages. The one we don’t even talk about externally. And, on top of that, you want to open source it?"

The story of how Kubernetes came to be starts here. It was Summer 2013, and we were in a room with Urs Holzle, head of technical infrastructure and chief architect of many of Google’s most important network innovations. We were pitching him our idea to build an open source container management system. But it wasn’t going well. Or so we thought.



To really understand how we ended up in that meeting, you have to back up a bit. For years, Google had been quietly building some of the best network infrastructure to power intensive online services like Google Search, Gmail and YouTube. We built everything from scratch because we had to, and in the early days, we were on a tight budget. In order to wring every possible ounce of performance out of our servers, we had started experimenting with containers over a decade ago. We built a cluster management system called Borg, which runs hundreds of thousands of jobs and makes computing much more efficient — allowing us to run our data centers at high utilization.



Later, we used this same infrastructure to deliver Google Cloud Platform, so anyone could use it for their computing needs. However, with the launch of our Infrastructure-as-a-Service platform Google Compute Engine, we noticed an interesting problem: customers were paying for a lot of CPUs, but their utilization rates were extremely low because they were running VMs. We knew we had an internal solution for this. And what’s more, we knew that containers were the future of computing — they’re scalable, portable and more efficient. The container system Docker was already up and running, and we thought it was great. But the trick, which we knew through years of trial and error within Google, was a great container management system. That’s what we wanted to build.



Even though we had been rejected before, we didn’t give up. Good ideas usually win out at Google, and we were convinced this was a good idea. We met with anyone who would listen to us to pitch the idea. A turning point was a fateful shuttle ride where I found myself sitting next to Eric Brewer, VP of Cloud, and one of Urs’s key strategists. I had an uninterrupted chunk of time to explain the idea to Eric, and he was convinced. Soon after, we got the green light from Urs.



In keeping with the Borg theme, we named it Project Seven of Nine. (Side note: in an homage to the original name, this is also why the Kubernetes logo has seven sides.) We wanted to build something that incorporated everything we had learned about container management at Google through the design and deployment of Borg and its successor, Omega — all combined with an elegant, simple and easy-to-use UI. In three months, we had a prototype that was ready to share.



We always believed that open-sourcing Kubernetes was the right way to go, bringing many benefits to the project. For one, feedback loops were essentially instantaneous — if there was a problem or something didn’t work quite right, we knew about it immediately. But most importantly, we were able to work with lots of great engineers, many of whom really understood the needs of businesses who would benefit from deploying containers (have a look at the Kubernetes blog for perspectives from some of the early contributors). It was a virtuous cycle: the work of talented engineers led to more interest in the project, which further increased the rate of improvement and usage.



What started as an internal summer conversation has evolved into a global movement. Kubernetes is now deployed in thousands of organizations (e.g., Box), supported by over 830 contributors who have collectively put in 237 person-years of coding effort to date. That’s velocity even our wildest goals didn’t anticipate. To our contributing peers and community advocates, a sincere thank you for making Kubernetes so welcoming and transparent. And to you, Kubernetes, a very happy birthday!



If you haven’t tried Kubernetes, it’s easy to get started using Google Container Engine; begin your 60-day free trial here. And to learn more about the Kubernetes story, check out the Kubernetes Origins podcast on Software Engineering Daily.









In case you hadn’t heard, we here at Google Cloud Platform released the Cloud Natural Language API this week, and an open beta of the Speech API.



Both the Natural Language and Speech APIs are just the latest examples of the Cloud Machine Learning technologies that we’ve made available to the public, following on the heels of the Vision API and Translate API. But what exactly do these latest APIs allow you to do?



Natural Language is all about parsing written text — you know, the kind that you’re looking at right now. By way of introduction, check out Google Developer Advocate Sara Robinson’s post on how she used the Natural Language API to analyze stories in The New York Times, while introducing us to the NL concepts of “sentiment” and “entities.”



Google Developer Advocate Guillaume Laforge dives deeper into sentiment by color-coding tweets as strongly positive, strongly negative — or somewhere in between — according to the polarity and magnitude unearthed by Natural Language. Turns out that @googlecloud tweets are all over the map, sentiment-wise, judging by this many-colored chart.




Positive tweets are green, negative tweets are red and neutral tweets are yellow

Others may choose to sample much less colorful text streams, such as Theresa May’s inaugural speech as British Prime Minister. In a blog post, Javier Ramirez, a Google Developer Expert at Teowaki, uses the Speech API to convert the audio to text, then feeds it to Natural Language to analyze its entities and sentiment. “I never suspected Brexit could be this fun,” he writes.



But how reliable are these latest machine learning offerings? Make no mistake, it’s early days, and natural language processing is an imperfect science. Over on Hacker News, some people reported mixed results with Natural Language. Check out the conversation with Google Natural Language Product Manager Dave Orr, who explains why a sentence that is so easy for a human “wetware” brain to understand can still trip up a computer. “It's the curse of [natural language processing], really,” he says. “All the easy things are hard. (And the hard things are nigh impossible.)”



We hope you’ll be the judge. Scroll down to the bottom of the Cloud Natural Language API page and enter a snippet of text to try the API for yourself.









Following our announcements from GCP NEXT in March, we’re excited to share updates about Cloud Platform expansion and machine learning. Today we’re launching two new Machine Learning APIs into open beta and expanding our footprint in the United States.




Cloud Machine Learning APIs enter open beta


Google Cloud Platform unlocks the capability for enterprises to process unstructured data through machine learning. Today, we’re announcing two new Cloud Machine Learning products that are entering beta: Cloud Natural Language and Cloud Speech APIs.



We spend a lot of time thinking about how computer systems can read and process human language in intelligent ways. For example, we recently open-sourced SyntaxNet (which includes Parsey McParseface), a natural language model that analyzes the grammatical structure of text with best-in-class accuracy, speed and scale.



The new Google Cloud Natural Language API in open beta is based on our natural language understanding research. Cloud Natural Language lets you easily reveal the structure and meaning of your text in a variety of languages, with initial support for English, Spanish and Japanese. It includes:


  • Sentiment Analysis: Understand the overall sentiment of a block of text

  • Entity Recognition: Identify the most relevant entities for a block of text and label them with types such as person, organization, location, events, products and media

  • Syntax Analysis: Identify parts of speech and create dependency parse trees for each sentence to reveal the structure and meaning of text


The new API is optimized to meet the scale and performance needs of developers and enterprises in a broad range of industries. For example, digital marketers can analyze online product reviews or service centers can determine sentiment from transcribed customer calls. We’ve also seen great results from our Alpha customers, including British online marketplace Ocado Technology.
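To give a feel for the API shape, here's a hedged sketch that scores the sentiment of a product review over REST. It targets the current v1 surface (the beta announced here shipped as v1beta1, where the score field was called polarity); the API key is a placeholder.

    import requests

    API_KEY = 'YOUR_API_KEY'  # placeholder
    URL = ('https://language.googleapis.com/v1/'
           'documents:analyzeSentiment?key=' + API_KEY)

    review = {'document': {'type': 'PLAIN_TEXT',
                           'content': 'Delivery was fast and the food was great!'}}

    sentiment = requests.post(URL, json=review).json()['documentSentiment']
    # score runs from -1 (negative) to +1 (positive); magnitude is strength.
    print(sentiment['score'], sentiment['magnitude'])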



To see Cloud Natural Language API in action, check out our demo that uses Cloud Natural Language to analyze top stories from The New York Times.


Google’s Cloud Natural Language API has shown it can accelerate our offering in the natural language understanding area and is a viable alternative to a custom model we had built for our initial use case.

- Dan Nelson, Head of Data, Ocado Technology

Cloud Speech API will also enter open beta today. Enterprises and developers now have access to speech-to-text conversion in over 80 languages, for both apps and IoT devices. Cloud Speech API uses the voice recognition technology that has been powering your favorite products such as Google Search and Google Now.



More than 5,000 companies signed up for Speech API alpha, including:


  • HyperConnect, a video chat app with over 50 million downloads in over 200 countries, uses a combination of our Cloud Speech and Translate APIs to automatically transcribe and translate conversations between people who speak different languages.

  • VoiceBase, a leader in speech analytics as a service, uses Speech API to let developers surface insights and predict outcomes from call recordings.


This beta version adds new features based on customer feedback from alpha, such as:


  • Word hints: context-specific words or phrases can be added to API calls to improve recognition. This is useful both for command scenarios (e.g., a smart TV listening for “rewind” and “fast-forward” when watching a movie) and for adding new words to the dictionary (e.g., recognizing names that may not be common in a specific language); see the sketch after this list

  • Asynchronous calling: the API has been substantially simplified with new asynchronous calls that make developing voice-enabled apps easier and faster
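Here's a hedged sketch of a synchronous recognition request that supplies word hints. It targets the current v1 REST surface (the beta announced here shipped as v1beta1 with a syncrecognize method); the API key and audio file are placeholders.

    import base64
    import requests

    API_KEY = 'YOUR_API_KEY'  # placeholder
    URL = 'https://speech.googleapis.com/v1/speech:recognize?key=' + API_KEY

    # Placeholder audio: 16 kHz, 16-bit linear PCM.
    with open('command.wav', 'rb') as f:
        audio = base64.b64encode(f.read()).decode('ascii')

    body = {
        'config': {
            'encoding': 'LINEAR16',
            'sampleRateHertz': 16000,
            'languageCode': 'en-US',
            # Word hints: bias recognition toward expected commands.
            'speechContexts': [{'phrases': ['rewind', 'fast-forward']}],
        },
        'audio': {'content': audio},
    }

    result = requests.post(URL, json=body).json()
    print(result['results'][0]['alternatives'][0]['transcript'])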


To use our newest machine learning APIs in open beta and see more details about pricing, check out Cloud Natural Language and Cloud Speech on our website.




Google Cloud Platform expands on the North American West Coast


For Cloud Platform customers on the west coast of North America and Canada, we’re pleased to announce our Oregon Cloud Region (us-west1) is now open for business. This region initially includes three of our core offerings: Google Compute Engine, Google Cloud Storage and Google Container Engine, and features two Compute Engine zones to support high availability applications.



Our initial testing shows that users in cities such as Vancouver, Seattle, Portland, San Francisco and Los Angeles can expect to see a 30-80% reduction in latency for applications served from us-west1, compared to us-central1.



One industry where latency is critical is gaming. Players of today’s premium games expect twitch-fast networks that enable immersive, real-time gaming experiences. Multiplay is a unique video game hosting specialist behind many of today’s top AAA games. Hosting Multiplay’s games out of the new us-west1 region ensures that players in the western part of North America have a consistent, fast user experience on top of Google Cloud Platform.


Regional latency is a major factor in the gaming experience. Google Cloud Platform’s network is one of the best we’ve worked with, from a tech perspective but also in terms of the one-on-one support we’ve received from the team.

- Paul Manuel, Director of Multiplay Game Services



And as we announced in March, Tokyo will be coming online later this year and we will announce more than 10 additional regions in 2017. For a current list of GCP regions, please have a look at the Cloud Locations page.












The combination of Google App Engine and Stackdriver Error Reporting is a powerful one. App Engine allows you to focus on your application without having to worry about the underlying infrastructure. Error Reporting, meanwhile, provides visibility into your application's health so that you can focus on relevant errors, detect new problems early and ultimately increase the quality of your application.



After a short beta period during which we fine-tuned the grouping algorithms and made sure the service scaled for our biggest customers, Stackdriver Error Reporting is now generally available on the Google App Engine standard environment. No setup is required: it just works out of the box for Java, Python, Go and PHP applications in the App Engine standard environment.




See your top application errors on the App Engine dashboard (click to enlarge)

Stackdriver Error Reporting works by automatically analyzing App Engine error logs, extracting more than one thousand stack traces per second and grouping and counting them. You can opt to receive email notifications whenever an error that has never been seen before appears. The report page lists which services and versions are affected, when the error was first seen and the relevant exception stack frames.
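Capture on App Engine is automatic, but errors can also be reported explicitly from any code path using the client library. A minimal sketch with the google-cloud-error-reporting package; risky_operation is a stand-in for your own application code.

    from google.cloud import error_reporting

    client = error_reporting.Client()

    def risky_operation():
        raise ValueError('boom')  # stand-in for real application logic

    try:
        risky_operation()
    except Exception:
        # Sends the current exception, stack trace included, to
        # Stackdriver Error Reporting for grouping and counting.
        client.report_exception()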




Stackdriver Error Reporting shows affected versions, stack traces and a link to the request log (click to enlarge)

Early adopters of Stackdriver Error Reporting have given very positive feedback.




We've been in search of a solution like this and were about to try wiring together various vendor solutions. Stackdriver Error Reporting really hits the mark and has immediately uncovered issues that had been unnoticed in production. I credit this tool with raising the quality and user experience of our product to a huge degree.

- John Koehl, CTO at Batterii



Visit http://console.cloud.google.com/errors to see any errors in your App Engine project.










It’s been a couple of weeks since GitHub announced that it was making 3+TB of its open source library available on BigQuery, and the Google Cloud Platform community has been busy ever since.



Google Developer Advocate Felipe Hoffa showed the world how it was done in “GitHub on BigQuery: Analyze all the open source code,” and fellow DA Francesc Campoy followed suit with a post analyzing GitHub Go packages. Along the way, he discovers that he can create even more nuanced queries with BigQuery User Defined Functions.



Then one of Google’s newest DAs, Guillaume Laforge, informs us that there are 743,070 Groovy files on GitHub with 16,464,376 lines of code, while CloudFlare’s Filippo Valsorda (the “Heartbleed guy”) analyzes how the Go ecosystem “does vendoring.”
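Counts like these are one-liners once the data is in BigQuery. Here's a hedged sketch of the Groovy file count using the google-cloud-bigquery client; the public dataset name is real, though the exact query Guillaume ran may differ.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Count Groovy source files in the public GitHub dataset.
    query = """
        SELECT COUNT(*) AS groovy_files
        FROM `bigquery-public-data.github_repos.files`
        WHERE path LIKE '%.groovy'
    """
    for row in client.query(query).result():
        print(row.groovy_files)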



Meanwhile, over on Medium, Google program manager for big data and machine learning Lak Lakshmanan uses BigQuery to discover which popular Java projects need the most help by searching for tagged comments such as FIXME and TODO. The post also shows how to use Google Cloud Dataflow to build a pipeline starting from BigQuery to Java in order to process the data in steps.



Or check out Robert Kozikowski’s blog for a treasure trove of GitHub data analysis: posts on visualizing relationships between Python packages; top pandas, NumPy and SciPy functions; Emacs packages; and Angular directives.



And if that’s still not enough BigQuery on GitHub for you, here’s a Changelog podcast on the topic for your drive home!












Wait. That’s not Google. That’s Houston.



We do have a Mission Control at Google, named in honor of NASA’s Christopher C. Kraft Jr. Mission Control Center, pictured here. But at Google, Mission Control is not a place. It’s a six month rotation program for engineers working on product development to experience what it’s like to be a Site Reliability Engineer (SRE). The goal is to increase the number of engineers who understand the challenges of building and operating a high reliability service at Google's scale.



The Mission Control inspiration goes further; SREs at Google are issued jackets that bear a flight patch inspired by the one Gene Kranz had commissioned for the Mission Controllers in Houston [1]. It bears the “Kranz Dictum” of “Tough and Competent” in Latin: “Duri et Periti”. If you see someone wearing a leather jacket with this flight patch, you’re looking at a Google SRE.



But what is an SRE? According to Google Vice President of Engineering Ben Treynor Sloss, who coined the term SRE, “SRE is what happens when you ask a software engineer to design an operations function.” In 2003, Ben was asked to lead Google’s existing “Production Team” which at the time consisted of seven software engineers. The team started as a software engineering team, and since Ben is also a software engineer, he continued to grow a team that he, as a software engineer, would still want to work on. Thirteen years later, Ben leads a team of roughly 2,000 SREs, and it is still a team that software engineers want to work on. About half of the engineers who do a Mission Control rotation choose to remain an SRE after their rotation is complete.



Google has been putting the word out about SRE for the past couple of years. Ben gave a talk at SREcon14 where he shared the principles of SRE learned over 11 years of building the team at Google. Melissa Binde gave a talk at GCP Next 2016 where she provided some pointers on how to apply some of the techniques we use at Google to your workloads running in our cloud. And if you really want to dig deep, the Site Reliability Engineering book is now available, and highly recommended reading.



Over the next six months, I will be on the uncomfortably exciting adventure of my own Mission Control rotation with the SRE team in Seattle that looks after Google Compute Engine. I will also be sharing some of the things I learn along the way with everyone here on this blog. So, if you want to learn more about being an SRE and how Site Reliability Engineering impacts our cloud services, keep watching this space.







[1] http://genedorr.com/patches/Ground.html











Kubernetes has hit an important milestone, version 1.3, and Google Container Engine, our managed version thereof, is moving along with it.



What does this latest version mean for Kubernetes users and GKE shops? Google Developer Advocate Carter Morgan takes a stab at laying it all out in this deck.



Meanwhile, the Kubernetes community is busy building out a collection of resources that show users how to use Kubernetes effectively. Just getting started? Arun Gupta of Couchbase has a tutorial for you. You might also want to take a step back and read the paper on Kubernetes design patterns that Google’s Brendan Burns presented at Usenix last month.



For shops that are already all-in with Kubernetes, Google’s Kelsey Hightower presents on using Kubernetes to manage Redis, while Java developer Eduard Kaiser digs deep into the Kubernetes Ingress Controller.



With this kind of momentum, it’s no surprise that the number of companies running on top of Kubernetes is starting to pile up. Check out the conversation on HackerNews about the good stuff that Kubernetes does for IT operations. Or The New Stack’s write-up about WePay, a PCI-certified credit card processing provider that has adopted containers and Kubernetes as it moves to microservices. And online gaming provider Rayark, whose smash-hit VOEZ runs almost entirely out of GKE (its Redis database runs in VMs on Google Compute Engine).



But we’re not done yet. Close your eyes and imagine a world where Kubernetes is running on Microsoft Azure. Now, open them and check out Cole Mickens' demo of Kubernetes 1.4 running on Microsoft Azure. And be sure to sign up for our upcoming GKE usability study! Why stand by idly and watch, when you can shape the future directly?









Today, we're excited to announce that Anvato is joining the Google Cloud Platform team. Anvato provides a software platform that fully automates the encoding, editing, publishing and secure distribution of video content across multiple platforms.



Anvato’s Media Content Platform, which counts many large media companies as customers, will complement our efforts to enable scalable media processing and workflows in the cloud.



The cloud is transforming the way video content is created and distributed to an array of connected devices, as well as the way users engage with this content. In recent years, over-the-top (OTT) technology has emerged as a critical platform for delivering rich audio, video and other media via the Internet.



With OTT adoption rapidly accelerating, the Cloud Platform and Anvato teams will work together to deliver cloud solutions that help businesses in the media and entertainment industry scale their video infrastructure efforts and deliver high-quality, live video and on-demand content to consumers on any device — be it their smartphone, tablet or connected television.



The Google Cloud Platform team is committed to helping our customers in the media and entertainment industry manage their infrastructure more efficiently, provision servers and networks at rapid scale and remove unnecessary overhead. We’re thrilled to have the Anvato team join us in our mission.



We’ll have more details to share in the coming months — stay tuned!









Rayark, a mobile game developer founded in 2011, has published several award-winning games that have garnered millions of downloads. Earlier this month, it launched VOEZ, which is running in production on Google Container Engine (GKE).



Rayark chose to build VOEZ with containers because it wanted to make the game portable across clouds. In the gaming business, it’s common for local game publishers to host a game on separate infrastructure, both to improve network latency and to satisfy regional go-to-market demands. Containers’ portable architecture plus Kubernetes will make it easy for Rayark to replicate VOEZ for publishers that don’t share its infrastructure. Containers also make more efficient use of underlying resources. Moreover, because a container’s underlying OS is already running, containers can scale to handle burst demands much faster than virtual machines.



Two weeks after launch, VOEZ had already reached 2 million downloads, but for Rayark CTO Alvin Chung, that success was anything but assured. After all, one study showed that 86% of users deleted or uninstalled apps due to performance issues. Leading up to the launch, Chung wondered whether going with Container Engine was the right choice, whether the backend infrastructure would scale, whether the container technology would behave as expected, and whether Google Cloud Platform would provide the after-sales service that matched its marketing and pre-sales talk.



The answers to these questions started to unfold during beta testing, when 10,000 fans were invited to download VOEZ. The front-end HTTPS Load Balancer scaled seamlessly without any warm-up, but when Rayark conducted large-scale load testing, it started to see some potential bottlenecks. Cloud Platform’s solution architects team advised scaling Container Engine with DNS pods, sharding Redis and tracking down HTTPS connection resets that were related to Python. Once these issues were addressed, the application scaled smoothly. Front-end load balancing sees higher and higher peaks every night after 9 p.m., while backend infrastructure stays under 35% CPU utilization throughout.



Looking back, the success of the VOEZ launch was built on a number of high-level principles:




  1. Take your time to produce quality software

  2. Put together a strong developer team

  3. Explore ways to handle load even if it means using relatively new technology

  4. Think far, think big. Using open-source technology like containers ensures portability and helps avoid vendor lock-in

  5. Perform large-scale testing. Your goal is for surprises to crop up as early as possible

  6. Take advantage of on-site support and monitoring of the production infrastructure from Google partner and pre-sales teams




Hosting VOEZ on Container Engine has been a big success for Rayark. Here’s to the next two million downloads!