A new year has come and gone, and we’ve wasted no time in putting our New Year’s resolutions to work. Take a look at what the Google Cloud Platform team has been up to in the month of January.




Digging into containers


Containers continue to be a hot topic across the software development universe, including here at Google. In fact, two of Google’s open source projects – Kubernetes and cAdvisor – center on containers and how they are run in clusters; and both projects were named Open Source Rookies of the Year by Black Duck Software this month.



To keep you up to speed, we launched a blog series diving into the technology and explaining the new paradigms. For a primer, start with “An introduction to containers, Kubernetes, and the trajectory of modern cloud computing.” In a nutshell, the post explains what a container provides that a VM does not. Read the piece to see why this matters, and see the relationship between single-instance containers, Docker (an open platform for distributed applications), Kubernetes (clusters of intelligently managed containers), and Google Container Engine (containers-as-a-service hosted on Google).



If you’re looking to dive deeper into the world of containers and Google’s rationale for creating Kubernetes, read “What makes a container cluster,” which talks about the ingredients of a great container cluster manager and the benefits of running containers in large-scale clusters, and “Everything you wanted to know about Kubernetes but were too afraid to ask.”



And we weren’t just talking about containers this month. Following on the launch of the alpha version of Google Container Engine in November at Google Cloud Platform Live, we announced the beta release of Google Container Registry, a new service that’s designed to provide secure and private Docker image storage on Google Cloud Platform.




Demystifying cloud pricing


Speaking of hot topics, this month we also unpacked a topic that tops the list: pricing. Pricing is a critical consideration for users trying to make the best decision about infrastructure systems design, but it’s also complex and sometimes cloudy (pun intended). Learn what exactly you get for your money through an analysis of Google Cloud Platform pricing compared to Amazon Web Services.




Tech tips on tap: Dataflow Big Data pipelines, verifying MongoDB backups, diagnosing bottlenecks...


A few tech tips and other tidbits you may have missed in the past month:







From genomics to website design: how to’s with our customers


The true benefit of this quickly evolving cloud technology really shines through in our customer stories. This month, we heard from customers spanning industries and geographies, including:




  • Alacris Theranostics, a Berlin-based spin-off of the Max Planck Institute for Molecular Genetics, is using Google Cloud Platform to better match cancer patients with the most promising drug therapies.

  • Aucor, based in Finland, transitioned customer websites onto Google Cloud Platform, providing them the capacity to scale with their expanding customer base and focus on what they do best: design awesome websites.

  • Shine Technologies, a digital consultancy based in Australia, uses Google BigQuery to help businesses make sense of the billions of ad clicks, ad impressions and other data that guide business decisions.

  • Aerospike, an open-source NoSQL database based here in Mountain View, pushes the limits of Local SSD technology to offer blazing performance: fully 95% of local SSD reads complete in under 1 ms. In fact, benchmarks show that Aerospike delivers a 15x price advantage in storage costs with Local SSD compared with RAM.





New year, new series: Introducing the Learn with Google Cloud Platform Webinar Series


We’ve kicked off a monthly webinar series featuring use cases and real-time Twitter and Google+ Q&A sessions to pull back the curtain on solving complex business challenges in the cloud and nurturing business growth. This month’s webinar discussed how high-growth online retailer zulily leveraged big data to offer a uniquely tailored product and customer experience to a mass market around the clock.



It’s been an exciting month, and February promises to bring more discussion of container clusters and more tips, news and stories from the cloud. Stay tuned, and Happy Friday!



-Posted by Charlene Lee, Product Marketing Manager

In the previous weeks, Miles Ward, Google Cloud Platform’s Global Head of Solutions, kicked off the Kubernetes blog series with a post about the overarching concepts around containers, Docker, and Kubernetes, and Joe Beda, Senior Staff Engineer and Kubernetes co-founder, articulated the key components of a container cluster management tool based on Google’s ten years’ experience running its entire business on containers. This week, Martin Buhr, Product Manager for the Kubernetes open source project, answers many of your burning questions about Kubernetes and our support for containers on Google Cloud Platform.




Everything you wanted to know about Kubernetes but were afraid to ask


When we announced the Kubernetes open source project in June of 2014, we were thrilled with the large community of customers and partners it quickly created. Red Hat, VMware, CoreOS, and others are helping to grow and mature Kubernetes at a remarkable pace. There is also a growing community of users who are not only utilizing Kubernetes to manage their container clusters but in many cases also contributing to the project itself.



I’ve been fortunate to be able to engage with many in our community, and we consistently hear many of the same questions:




  • Given that Google already has its own mature, robust cluster management systems (which handle around two billion new containers a week), why did you create Kubernetes?

  • How does Kubernetes relate to Docker? How does it differ from Docker Swarm?

  • What ensures that Google is committed to the Kubernetes open source project over the long run?

  • How does Kubernetes fit in with and augment your overarching strategy for Google Cloud Platform?

  • What incentive does Google have to make Kubernetes great outside of Google Cloud Platform for deployment on-premises or on other public clouds?

  • What is the relationship between Kubernetes and Google Container Engine, now and in the future?




This post will answer these questions, and we’d love to field others we may have missed via the Kubernetes G+ page.




Why Kubernetes?


Given that Google already has its own mature, robust cluster management systems, many wonder why we created Kubernetes. There are actually two reasons for this.



First, there is the altruistic motive. We have enjoyed amazing benefits by moving to the model embodied by Kubernetes over the past ten years. It enabled us to dramatically scale developer productivity and the number of services we were able to offer without investing in a corresponding increase in operational overhead. It also gave us fantastic workload portability, enabling us to quickly “drain” applications from one resource pool and move to another. As with many other technologies and concepts that we’ve shared with the community over the years, we think Kubernetes will help make the world a better place and help others enjoy similar benefits. Other examples include Android, Chromium, and many of the technologies that underpin the rising popularity of Linux containers (including memcg, the Go programming language in which Docker is written, cgroups, and cAdvisor).



Second, there is the practical reason grounded in our desire to make Google Cloud Platform the best platform on the web for customers to build and host their applications. As Urs Hölzle, Senior Vice President for Technical Infrastructure at Google noted last March, we’re unifying Google’s core infrastructure and Google Cloud Platform and see a significant business opportunity for Google in Google Cloud Platform. By enabling customers to start using the same patterns and best practices Google has developed for its own container based workloads, we make it easy for customers to move those workloads around to where they make the most sense based on factors like latency, cost, and adjacent services. We think over time that our deep, comprehensive support for containers on Google Cloud Platform will create a gravity well in the market for container based apps and that a significant percentage of them will end up with us.




How does Kubernetes relate to Docker? How does it differ from Docker Swarm?


When referring to “Docker,” we’re specifically talking about using the Docker container image format and Docker Engine to run Docker images (as opposed to Docker Inc., the company that has popularized these concepts). These Docker containers are then managed by Kubernetes.



Imagine individual Docker containers as packing boxes. The boxes that need to stay together because they need to go to the same location or have an affinity to each other are loaded into shipping containers. In this analogy, the packing boxes are Docker containers, and the shipping containers are Kubernetes pods.







Ultimately, all these pods make up your application.
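
To make the analogy concrete, here is a minimal sketch of a pod that packs two containers together. The names, labels, and images are illustrative only, and the manifest uses today's API version rather than the v1beta schema that was current when this post was written:

# Two containers loaded into one "shipping container" (pod).
# Names, labels, and images below are placeholders for illustration.
kubectl create -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: example-app
  labels:
    app: example
spec:
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80
  - name: log-collector
    image: busybox
    command: ["sh", "-c", "tail -f /dev/null"]
EOF

# Both containers are scheduled onto the same node and share the pod's network
# namespace, which is exactly the "same shipping container" property described above.
kubectl get pod example-app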



You don’t want this ship adrift on the stormy seas of the Internet. Kubernetes acts as ship captain – adeptly steering the ship along a smooth path, and ensuring that the applications under its supervision are effectively managed and stay healthy.



Once you move beyond working with a handful of containers, and especially when your application grows beyond more than one physical host, we strongly advise that you use Kubernetes (for reasons we’ve highlighted recently).



In terms of how Kubernetes differs from other container management systems out there, such as Swarm, Kubernetes is the third iteration of cluster managers that Google has developed. It incorporates the cumulative learnings of over a decade of experience in production container management. It embodies the cluster centric model, which we’ve found works best for developing, deploying, and managing container based applications. Swarm and similar systems embody the single node model and may work well for some use cases, but there are several critical architectural patterns missing that customers will ultimately need as they move to production use cases (these were highlighted in Joe’s post last week).




Is Google committed to Kubernetes?


Both customers and partners are asking variations of the following question: “Given that I’m considering betting the future of my project/app/business on the long term viability of Kubernetes, what assurance do I have that Google will not lose interest over the long term, causing the project to wither?”



First, as outlined above, we view Kubernetes as core to our cloud strategy, and we’re internally committed to making Google Cloud Platform a significant part of Google’s overall business. Our deep experience in running containerized workloads is a big competitive advantage for Google Cloud Platform, so it makes sense for us to continue to invest in making Kubernetes robust and mature. As an expression of this, we have some of our most experienced engineering talent working on the project, including Googlers with years of experience developing and refining our internal cluster management systems and processes.



Second, we’ve been very fortunate to have a vibrant, experienced community of contributors form around Kubernetes. Many of them have incorporated Kubernetes into their own products, resulting in a vested interest in the health and sustainability of Kubernetes. For example, Red Hat made Kubernetes an integral part of OpenShift version 3, and as of the time of this post, two of the top ten contributors are from the growing team Red Hat has working on Kubernetes. Thus, even if Google were to get taken out by a meteorite, a significant community of contributors would remain to carry it forward.




How does Kubernetes fit into Google’s cloud strategy?


As we mentioned, Google Cloud Platform is a key business for Google, and we are confident (based on ten years of experience using containers to run our business and the significant technical and operational depth we’ve acquired in doing so) that we can make Google Cloud Platform the best place on the web for containers. Kubernetes embodies the best practices and patterns based on this hard won experience for creating and running container based workloads.



We think that Kubernetes will help developers create better container based applications that require less operational overhead to run, thereby accelerating the trend toward container adoption. Given the inherent portability of container based applications managed by Kubernetes, every new one created is another candidate to run on Google Cloud Platform.



Our hope is that container based apps will be made even more awesome through the use of Kubernetes (regardless of where they reside), and our goal is to ensure that Kubernetes based apps will be exceptionally awesome on Google Cloud Platform. How much of the market moves to containers and how much of this load we’re able to attract to Google Cloud Platform remains to be seen, but we’ve placed our bets on wide-scale adoption.




Kubernetes on other clouds? On-premise?


For our strategy to be successful, we need Kubernetes to be awesome everywhere, even for customers who will run their apps on other clouds or in their own datacenters. Thus, our goal for Kubernetes is ubiquity. Wherever you run your container based app, our hope is that you do so using Kubernetes so that you can benefit from all the things Google has gotten right over the years (as well as the numerous lessons we’ve learned from the things we got wrong). Even if you never plan on moving beyond your own datacenters, or plan on sticking with your current cloud provider exclusively into the foreseeable future1, we would still love to talk to you about why Kubernetes makes sense as a foundational piece of your container strategy.




Kubernetes and Google Container Engine?


This brings us to Google Container Engine, our managed container hosting offering and the embodiment of Kubernetes on Google Cloud Platform. We want everyone to use Kubernetes based on its own merits and develop container based apps based on proven patterns battle tested at Google. In parallel, we’re making Google Cloud Platform a fantastic place to develop and run container based applications, giving customers the benefits of not only Google’s experience in operating and maintaining container clusters, but also of all the adjacent services on Google Cloud Platform. At present, Google Container Engine is simply hosted Kubernetes, but look for us to start introducing features and linkages to other Google Cloud Platform services to further enhance its utility.




We're Stoked!


It’s an exciting time to be an application developer! As you’ve seen above, Google is deeply committed to Kubernetes, and we and our ecosystem of contributors are working hard to make sure it’s the best tool for creating and managing container clusters regardless of where these clusters run. From our perspective, the first and best option is that you run your container based apps on Google Container Engine, second best is that you run them on Google Compute Engine using Kubernetes, and third best is that you run them someplace else using Kubernetes.



The thing that most excites me about Kubernetes is the frequency at which I see customers rolling up their sleeves and contributing to the project itself. While I’m very proud of what our extended team has created in Kubernetes, I think Joe Beda said it best in his most recent blog post:




While we have a lot of experience in this space, Google doesn't have all the answers. There are requirements and considerations that we don't see internally. With that in mind, please check out what we are building and get involved!



Try it out, file bug reports, ask for help or send a pull request (PR).



-Posted by Martin Buhr, Product Manager, Kubernetes







1 The theories of supply chain diversification and vendor risk management both recommend against relying on a single supplier for any critical component of one’s business or infrastructure. This has been borne out by the experience of numerous customers over the years with large vendors of proprietary IT systems and software. Part of the appeal of Docker and Kubernetes is the degree to which they significantly lower the friction involved in moving applications between various resource pools (laptop to server, server to server, data center to data center, cloud to cloud, etc.).




(Cross-posted on the Google for Work Blog)






Many businesses around the world rely on VMware datacenter virtualization solutions to virtualize their infrastructure and optimize the agility and efficiency of their data centers. Today we’re excited to announce that we are teaming up with VMware to make select Google Cloud Platform services available to VMware customers via vCloud Air, VMware’s hybrid cloud platform. We know how valuable flexibility is to a business when determining its total infrastructure solution, and with today’s announcement, enterprise businesses leveraging VMware’s datacenter virtualization solutions gain the flexibility to easily integrate Google Cloud Platform.



Businesses can now use Google Cloud Platform tools and services – including Google BigQuery and Google Cloud Storage – to increase scale, productivity, and functionality. VMware customers will benefit from the security, scalability, and price performance of Google’s public cloud, built on the same infrastructure that allows Google to return billions of search results in milliseconds, serve 6 billion hours of YouTube video per month and provide storage for 425 million Gmail users.



With Google BigQuery, Google Cloud Datastore, Google Cloud Storage, and Google Cloud DNS directly available via VMware vCloud Air, VMware customers will benefit from a single point of purchase and support for both vCloud Air and Google Cloud Platform:




  • vCloud Air customers will have access to Google Cloud Platform under their existing service contract and existing network interconnect with vCloud Air, and will simply pay for the Google Cloud Platform services they consume.

  • Google Cloud Platform services will be available under the VMware vCloud Air terms of service, and will be fully supported by VMware’s Global Support and Services (GSS) team.

  • Certain Google Cloud Platform services are also fully covered by VMware’s Business Associate Agreement (BAA) for US customers who require HIPAA-compliant cloud service.




Google Cloud Platform services will be available to VMware customers beginning later this year, so we’ll have more information very soon. In the near future, VMware is also exploring extended support for Google Cloud Platform as part of its vRealize Cloud Management Suite, a management tool for hybrid clouds.



Today’s announcement bolsters our joint value proposition to customers and builds on our strong, existing relationship around Chromebooks and VMware View and also around the recently announced Kubernetes open-source project. We look forward to welcoming VMware customers to Google Cloud Platform.



-Posted by Murali Sitaram, Managing Director, Global Partner Strategy & Alliances, Google for Work

Today’s guest blog comes from Graham Polley, Senior Consultant for Shine Technologies, a digital consultancy in Melbourne, Australia. Shine builds custom enterprise software for companies in many industries, including online retailers, telecom providers, and energy businesses.



Wrestling with large data sets reminds me of that memorable line from Jaws when police chief Brody sees the enormous great white shark for the first time: “You’re gonna need a bigger boat”. That line pops into my head whenever we have a new project at Shine Technologies that involves processing and reporting on massive amounts of client data. Where do we get that ‘bigger boat’ we need to help businesses make sense of the billions of ad clicks, ad impressions, and other data that can guide business decisions?



Four or five years ago, without any kind of ‘bigger boat’ available, we simply couldn’t grind through terabytes of data without plenty of expensive hardware, and a lot of time. We’d have to provision new servers, which could take weeks or even months, not to mention costs for licensing and system administration. We could rarely analyze all the data at hand because it would overwhelm network resources and we’d end up usually trying to analyze just 10% or 20%, which didn’t give us complete answers to client questions or provide any discernible insights.







When one of our biggest clients, a national telecommunications provider in Australia, needed to analyze a large amount of their business data in real time, we chose Google’s DoubleClick for Publishers product. We realized we could configure DoubleClick to store the data in Google Cloud Storage, and then point Google BigQuery to those files for analysis, with just a couple of clicks.
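
As a rough sketch of that workflow from the command line (the dataset, table, bucket path, and schema file below are placeholders, not Shine's actual setup), the exported files sitting in Cloud Storage can be loaded into BigQuery and queried with the bq tool:

# Load newline-delimited JSON exported to Cloud Storage into a BigQuery table.
# Dataset, table, bucket, and schema names are placeholders for illustration.
bq load \
  --source_format=NEWLINE_DELIMITED_JSON \
  mydataset.ad_impressions \
  gs://my-doubleclick-exports/2015/01/*.json \
  ./ad_impressions_schema.json

# A quick sanity check once the load job finishes:
bq query "SELECT COUNT(*) FROM mydataset.ad_impressions"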



Finally, we thought, we’ve found something that can scale effortlessly, keep costs down, and (most importantly) allow us to analyze all of our client’s data as opposed to only small chunks of it. BigQuery boasts impressive speeds, is easy to use, and comes with a very short learning curve. We don’t need to provision any hardware, or spin up complex Hadoop clusters, and it comes with a really nice SQL-like interface that even makes it possible for non-techy people, such as Business Analysts, to easily interrogate and draw insights from the data.





When the same client came to us with a particularly complex problem, we immediately knew that BigQuery had our backs. They wanted us to stream millions of ad impressions from their large portfolio of websites into a database, and generate analytics about that data using some visually compelling charts - in real-time. Using its streaming functionality, we started to pump the data into BigQuery, which went off without a hitch, and we sat back and watched as millions of rows started flowing into BigQuery. When it came to interrogating and analysing the data, we experienced consistent results in the 20-25 second range for grinding through our massive data set of 2 billion rows using relatively complex queries to aggregate the data.
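
For a small-scale illustration of that streaming path (a production pipeline like the one described here would call the tabledata.insertAll API from code; the table and file names below are hypothetical), the bq tool can push individual rows through the same streaming API:

# Stream a row into a table via the BigQuery streaming API.
# The dataset, table, and payload are placeholders for illustration only.
echo '{"impression_id": "abc123", "site": "example.com", "ts": "2015-01-28 10:30:00"}' > row.json
bq insert mydataset.ad_impressions_live row.json

# Streamed rows become queryable within seconds:
bq query "SELECT COUNT(*) FROM mydataset.ad_impressions_live"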



By leveraging the streaming capability of BigQuery, we can analyze our client’s data instantly and empower them with ‘real-time insights’, rather than waiting for slower batch jobs to complete. The client can now instantly see how ad campaigns are performing, and change the ad creative or target audience on the fly in order to achieve better results.



Simply put, without BigQuery it just would not have been possible to pull this off. This is bleeding edge technology that we are using and the idea of doing something similar in the past with a relational database management system (RDBMS) was simply inconceivable.



The success of this project opened up a lot of doors for us. After we blogged about it, we received several requests from prospective clients wanting to know if we could apply the same technology to their own big data projects, and Google invited us to become a Google for Work Services partner. Our clients are continuously coming up with more ideas for driving insights from their data, and by using BigQuery we can easily keep up with them.



Big data can seem like that great white shark in Jaws - unmanageable and wild unless you have the right tools at your disposal to tame it. BigQuery has become our go-to solution for reeling in data, processing it, and discovering the value within.



Contributed by Graham Polley, Senior Consultant, Shine Technologies



Learn more about Shine Technologies and the business impact of BigQuery. Watch as BigQuery takes on Shine Technologies' 30 Billion Row, 30 Terabyte Challenge.








Part 1 - Virtual Compute




When designing infrastructure systems, whether creating new applications or deploying existing software, it’s crucial to manage cost. Costs come from a variety of sources, and every approach to delivering infrastructure has its own tradeoffs and complexities. Cloud infrastructure systems create a whole new range of variables in these complex equations.



In addition, no two clouds are the same! Some bundle components while others offer more granular purchasing. Some bill in different time increments, and many offer a variety of payment structures, each with differing economic ramifications. How do you figure out what each costs and make a choice?



To help you work this through, we’ve created an example for you. For this example, let's look at a fairly common scenario, a mobile application with its backend in the cloud. This application shares pictures in some way, and has about 5 million active monthly users. Let’s go through what instance types this application will need to meet that user-driven workload and then price out what that will cost in an average month on Google Cloud Platform and compare against Amazon Web Services.



Our example application has 4 components:




  • An API frontend that mobile devices will contact for requests and actions. This portion will consume the majority of the compute cycles.

  • A static marketing and blog front end.

  • An application layer that will process and store images as they come in or are accessed.

  • And on the back end, a Cassandra cluster to store operational metadata.




For capacity planning, we have scoped as follows:




  • The API frontend instances can respond to roughly 80 requests per second. We expect about 350 requests per second given this number of users. Therefore we should only need four regular instances for this layer.

  • The marketing front end shouldn’t need more than two instances for redundancy.

  • The application layer will need four instances for image processing and storage control.

  • The Cassandra cluster will need five instances with a higher memory footprint. Let’s assume for now that the workload is entirely static, and autoscaling isn’t being used (oh don’t worry, we’ll add that and more back in later).




Figure 1 shows the logical architecture of our example application:



To explain the nuances of cloud pricing, let’s use Google Cloud Platform and Amazon Web Services as the example cloud infrastructure providers, and start at the most simple, on-demand model. We can use calculators that each provider offers to find out correct pricing quickly:



Please note that we completed these calculations on January 12, 2015, and have included the output prices in this post. Any discrepancies are likely due to pricing or calculator changes following the publishing of this post.



Here is the output of the pricing calculators:



Google Cloud Platform estimate:

Monthly: $2610.90



Amazon Web Services estimate:

Monthly: $4201.68



It’s important to note that right away things don’t look equivalent, with Google’s pricing being 38% lower. Why? Google includes an automatic discount called Sustained Usage Discount, which reduces the cost of long-running instances. Since we didn’t autoscale or otherwise vary our system over the course of the month, the full 30% discount applies. Even without that, pricing before the discount comes in at $3729.86, or an 11% discount off Amazon’s on-demand rates. Over the course of a year, going with Google would save you just over $19,000!
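
A quick back-of-the-envelope check of those figures, using the calculator outputs quoted above:

# Compare the quoted monthly estimates: $2,610.90 (GCP) vs $4,201.68 (AWS on-demand).
awk 'BEGIN {
  gcp = 2610.90; aws = 4201.68;
  printf "GCP is %.0f%% lower per month\n", (aws - gcp) / aws * 100;   # ~38%
  printf "Savings over 12 months: $%.2f\n", (aws - gcp) * 12;          # ~$19,089
}'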




Reserved Instances


Amazon Web Services has an alternate payment model, where you can make a commitment to run infrastructure for a longer period of time (either 1 or 3 years), and opt to pay some portion of the costs up front, which they call Reserved Instances. Here are the costs for our example app with Amazon’s Reserved Instance pricing:



Amazon Web Services, no-upfront, 1 year estimate:

Monthly: $2993.00



Over a one-year term with Amazon, if you commit to pay for the instance for that entire period, and you opt for the “no-upfront” option, you still end up with a 13% higher cost than making no commitment to Google.



Amazon Web Services, partial upfront, 1 year estimate:

Upfront: $18164.00

Monthly: $1093.54

Effective monthly: $2607.21



If you opt to pay over $18k up front using the “partial upfront” model, you arrive at a lower price, saving $44 (not thousands) over the course of the year.



Amazon Web Services, all upfront, 1 year estimate:

Upfront: $30,649.00

Monthly: $0.00

Effective monthly: $2554.08



If you choose instead to pay 100% of the yearly cost up front, you’d end up saving $681.78 over the course of the year versus Google Cloud Platform, or 2.3%. As you can see, however, the upfront payment is over $30,000!



Similarly, Amazon offers three-year options for the partial upfront and all upfront models:



Partial upfront, 3 year estimate:

Upfront: $27,585.00

Monthly: $897.90

Effective monthly: $1664.15



All upfront, 3 year estimate:

Upfront: $56,303.00

Monthly: $0.00

Effective monthly: $1563.97



If you’re willing to part with just over $56,000 for the three-year, all upfront Reserved Instance, you’d receive a 40% discount off of Google’s rate, for a total projected gap of over $37k.



However, as I’m sure you can surmise, there are several risks that a significant up front commitment and payment create. The bottom line: you’re locked in to a long-term pricing contract, and you risk missing out on substantial savings. Let’s look at why:


  1. Infrastructure prices will drop, either for Google (which has happened 3 times in the last 12 months, as we've reintroduced Moore’s law to the cloud), or for Amazon (which has happened 2 times in the last 12 months). For 2014, this worked out to an average of a 4.85% price reduction per month on Google Cloud Platform. Due to on-demand pricing, any reduction in prices is something you automatically receive on GCP.

  2. Also, don’t forget, capital is expensive! Most businesses pay a ~7% per year cost of capital, which reduces the value of these up-front purchases significantly. For this example, that adds an effective $11,823.63 to the 3-year all up-front Reserved Instance price from Amazon.
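
A quick check of that cost-of-capital figure, assuming simple (non-compounded) interest over the three-year term:

# 7% per year for 3 years on the $56,303 all-upfront payment.
awk 'BEGIN { printf "$%.2f\n", 56303 * 0.07 * 3 }'   # prints $11823.63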




So, let’s revisit that $37,689.40 gap. By adding in the cost of capital, and subtracting likely instance price reductions, even at the most aggressive discount AWS offers, AWS costs $60,244.21 and Google Cloud Platform costs $57,959.57, which equates to a 3.9% cost advantage.



By combining conservative evaluations of the basic facts of public cloud pricing dynamics (3% per month price reductions, 7% cost of capital), even 3-year all-upfront RI’s from AWS are not cost efficient compared to on-demand Sustained Use Discounts from Google Cloud Platform.






Flexibility


There are also cost risks to this structure presented by commitment to specific usage choices.




  1. New instance types might make your old choices inefficient (c3 instances from AWS are substantially more cost efficient for some workloads than older m3 instances, for example).

  2. Your software might change. For example, what if you improve the efficiency of your software to reduce your infrastructure requirements by 50%? Or what if you re-platform from Windows to Linux (Reserved Instances require a commitment to an OS type)? Or what if your memory needs to grow, and instances need to switch from standard to high-memory variants?

  3. Your needs might change. For example, what if a new competitor arrives who takes ½ of your customers, which reduces the load on your infrastructure by 50%?

  4. What if you picked everything right but the geography, and your app is suddenly popular in Asia or Europe?




The “on-demand” agility and flexibility of cloud computing is supposed to be a huge financial benefit, especially when your requirements change. Let’s imagine in the second month, several of those risks above actually happen: you move to the Asian market, resize a few instances to better map to actual workload, and shrink the Cassandra cluster redundancy a bit, given how reliable instances with live migration are. That would look something like Figure 2.



Google Compute Engine estimate:

Monthly: $909.72



Amazon Web Services Partial upfront, 1 year, estimate:

Upfront: $6350.00

Monthly: $331.42

Effective monthly: $860.59



This system costs less than ½ of what the original system costs, and is on an entirely different continent, but what does it cost to change your plan? This change costs very little at Google: you don’t pay any direct penalty for changing your infrastructure design. Your only cost is however long the two different systems run simultaneously to facilitate a zero-downtime cut-over.



In stark contrast, the cost of changing the Amazon system is essentially the total loss of whatever committed funds you applied to earn the discount, plus the new upfront funds required to get an efficient price (and re-commit!) in your new configuration, on top of the above-mentioned dual system usage (which costs more per hour...)



Let’s look at this from a cash flow perspective, not even in the worst case, but just assuming that you wanted to break-even with Google pricing on Amazon and chose the partial up front one-year Reserved Instance.



Google: Month 1 usage: $2610.90 + Month 2-13 usage: $909.72 x 12 = $13,527.54



Amazon: Month 1 Commit: $18,164.00 + Month 1 usage: $1093.54 + Month 2 commit: $6350.00 + Month 2-13 usage: $331.42 x 12 = $29,584.58



That’s a big gap, even without figuring in the cost of capital! You can see how risky those commitments can be. AWS has a service to mitigate some of that risk, an RI marketplace, which allows you to attempt to sell back Reserved Instance units to other AWS customers. However, as I’m sure you can imagine, this is another process that presents a few risks:


  1. Are the RI’s you’re selling for instance types that are now clearly inefficient for many workloads, and therefore not desirable to other customers?

  2. Will your RI’s sell for full price, or some discount to encourage a sale?

  3. How many buyers are there in the marketplace, and how quick will your RI’s sell, if at all?

  4. What if you didn’t start out in the US? The RI Marketplace is only available for customers with a US bank account.


One risk that's a guaranteed loss: every sale on the RI marketplace comes with a 12% fee, payable to Amazon. Let’s say you have great luck and are able to sell 10 months of your original 12-month RI (they have to be sold in whole-month increments, rounding down), at full original price, which nets you back $13,320.27 after fees. Now your 13-month total is $16,083.19, so you’ve only lost $2,555.65 compared to what you would have paid using Google. But what a hassle, and how much risk did you take on? What if the RI’s didn’t sell for a few months? Every month, you lose $1,332. Ouch!




Automatic Scaling


But this is a backwards example, you say: cloud isn’t intended for this kind of static sizing; you’re supposed to be autoscaling to tightly follow load. True! So, let’s imagine that the above reflects the requirements of our steady-state load, and we have four small peaks during the day: morning rush, lunch peak, after-work, and midnight madness, each of which pops at 10x the above workload. (Our application passes the toothbrush test!) Our backend handles these spikes fine, but our web and API tiers need to autoscale dramatically. Let’s say each of these peaks onsets very rapidly, say over the course of five minutes, and lasts for 15 minutes. Note, we see systems that spike at 100x or more, so this scenario isn’t extreme!



This kind of system is pretty easy to build efficiently on Google. Instances take roughly a minute to launch, so we can easily autoscale to accommodate load, and since we charge only a minimum of 10 minutes and bill in per-minute increments, this only adds $110.77 a month to our bill. 10x peaks!



Google Compute Engine estimate:

Monthly additional: $110.77



Building this on AWS is just not as efficient. Because instances take >5 minutes on average to launch, we need to pre-trigger our instance boots (read, timing logic or manual maintenance). Also, AWS bills for instances in full hour increments, so we pay for 60 minutes when we only use ~20, for each of our 4 peaks. This makes the total additional cost $341.60, and without any ability to appropriately discount via reserved instances, that’s a number an AWS customer can’t bring down today.



Amazon Web Services estimate:

Monthly additional: $341.60

            + instance launch management logic (manual ops or development)
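
Most of that gap follows from the billing increments alone: each peak occupies the extra instances for roughly 20 minutes, which Compute Engine bills as about 20 minutes (per-minute billing, 10-minute minimum) while AWS bills as a full hour. A quick consistency check against the quoted monthly figures:

# Ratio of billed time per peak vs. ratio of the quoted monthly surcharges.
awk 'BEGIN {
  printf "Billed-minutes ratio per peak: %.1fx\n", 60 / 20;           # 3.0x
  printf "Quoted cost ratio:             %.1fx\n", 341.60 / 110.77;   # ~3.1x
}'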



While this spike example is one utilization behavior we see frequently, we also see basic diurnal (twice daily, aka day/night) variability on almost every customer-facing service of anywhere from 2x-5x utilization. If that natural variation isn’t being followed by use of Autoscaler or other automated resource management, you are definitely leaving money on the table!




Summary


While there are many more dimensions to evaluate, hopefully this is a helpful analysis of how pricing systems differ between Google and Amazon. We’re not stopping here; look forward to more comparisons with more cloud providers and more workloads to help you understand exactly what you get for your money.



We are hyper-focused on driving cost out of cloud services, and leading the way with innovations such as Sustained Usage Discounts and per-minute billing. As one of our customers, StarMaker Interactive VP of Engineering Christian F. Howes said, “App Engine's minute-by-minute scaling and billing saves us as much as $3,000 USD per month.”



We think pricing considerations are critical for users trying to make the best decision they can about infrastructure systems design. I’d love to hear your thoughts: what matters to you in cloud pricing? What areas are confusing, hard to analyze, or hard to predict? What ideas do you have? Reach out!



-Posted by Miles Ward, Global Head of Solutions, Google Cloud Platform

Interested in cloud computing with containers? Join us for an evening with the experts on Kubernetes, the open source container cluster orchestration platform. There will be talks, demos, a panel discussion, and refreshments sponsored by Intel.



Many contributors to Kubernetes will be attending, including Google, Red Hat, CoreOS, and others.



Time: 6:00PM-10:00PM PST

Location: San Francisco, CA



Detailed agenda coming soon. Register here.

Today, Black Duck Software announced their annual Open Source Rookie of the Year awards. We’re very excited that two of our open source projects, Kubernetes and cAdvisor, were amongst those selected! The award recognizes the top new open source projects of the past year. Both projects center on containers and how they’re run in clusters. Kubernetes is a container cluster manager and cAdvisor analyzes the performance of running containers. Read on to learn more about these projects.







Kubernetes

Developers want to focus on writing code, and IT operations want to focus on running applications efficiently. Using Docker containers helps to define the boundaries and improve portability. Kubernetes takes that one step further and lets users deploy, manage, and orchestrate a container cluster as a single system.



Kubernetes is designed to be portable across any infrastructure, which allows application owners to deploy on laptops, servers, or cloud, including Google Cloud Platform, Amazon Web Services and Microsoft Azure.



It lets you break applications down into small sets of containers that can be reused. It then schedules these containers onto machines and actively manages them. These can be logically grouped to make it even easier for users to manage and discover them. Kubernetes is lightweight, portable, and extensible. You can start running your own clusters today.
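
As a small sketch of what that grouping and active management look like in practice (the service name is illustrative, and the kubectl syntax shown is the current form, which differs from the early-2015 CLI):

# List every pod belonging to one logical service via its label.
kubectl get pods -l app=frontend

# Scale that service by changing a single setting: the desired replica count on
# its replication controller. Kubernetes schedules the extra pods and restarts
# any that fail.
kubectl scale --replicas=5 rc/frontend
kubectl get rc frontend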








Kubernetes started about a year ago as a small group of Googlers who wanted to bring our internal cluster management concepts to the open source containers ecosystem. Drawing from Google’s 10+ years of experience running container clusters at massive scale, the group developed the first few prototypes of Kubernetes. Six months, and lots of work later, the first version of Kubernetes was released as an open source project. We were all humbled and excited to see the overwhelmingly positive response the project received. Although it started as a Google project, it quickly gained owners from Red Hat, CoreOS, and many, many contributors. In November, we announced Google Container Engine, which offers a hosted Kubernetes cluster running on Google Cloud Platform. This makes it even easier to run Kubernetes by letting us manage the cluster for you.



What’s next for Kubernetes? The team and community are working furiously toward version 1.0, the first production-ready release. Expect to see a slew of improvements in user experience, reliability, and integration with other open source tools.









cAdvisor

cAdvisor analyzes the resource usage and performance characteristics of running containers. It aims to give users and automated systems a deep understanding of how their containers are performing. The information it gathers is exposed via a live-updating UI (see a screenshot below) and through an API for processing by systems like InfluxDB and Google’s BigQuery. cAdvisor was released alongside Kubernetes back in June and has since become a de facto standard for monitoring Docker containers. Today, it’s run on all Kubernetes clusters and can monitor any type of Linux container. cAdvisor has even become one of the most downloaded images on the Docker Hub.
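
For reference, this is roughly the quickstart from the cAdvisor README: run it as a container on any Docker host, then browse the live UI or hit the REST API. The API version in the URL path may differ across releases:

# Run cAdvisor itself as a container with read-only access to the host.
sudo docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest

# The live UI is then at http://localhost:8080/, and the same data is available
# as JSON through the REST API, for example:
curl http://localhost:8080/api/v1.3/containers/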



Below is a screenshot of part of the cAdvisor UI showing the live-updating resource usage of a container. The screenshot shows total CPU and memory consumption over time as well as the instantaneous breakdown of memory usage.






Continuously updating view of a container's resource usage





The cAdvisor team is working to make it even easier to understand your running containers by surfacing events that let you know when your containers are not getting enough resources. Alongside these come suggestions on actions you can take to remedy the problem. Events and suggestions can be integrated into systems like Kubernetes to allow for auto-scaling, resizing, overcommitment, and quality of service guarantees for containers.



We’re extremely grateful to the open source community for embracing both of these projects so widely. Our aim was to address a need we saw in the open source containers community and start a dialogue around containers and how they should be run. And as we continue to collaborate with the open source community, we look forward to evolving these projects. We invite you to join us in making Kubernetes and cAdvisor better! Try them out, open issues, send patches, and start discussions. Happy hacking!



-Posted by Greg DeMichillie, Director of Product Management

Aucor, based in Finland, designs WordPress and Drupal websites for clients. When their growing customer base needed more capacity than their private servers could manage, the company knew they needed to lighten the weight by moving to the cloud.



Aucor turned to Google Cloud Platform so they could keep their focus on what they do best – designing fantastic websites – not managing servers.



The team took Google App Engine out for a test drive. Janne Jääskeläinen, CEO at Aucor, noted, “Our test site could handle over 70,000 requests per second without the users noticing a thing. Let’s put that into perspective: it’s as if every single Finn (about 5.4 million people) would have spent a good hour clicking around the site, without it crashing or even slowing down.”



With these speeds, the team was able to easily transition over 70 of its sites to Google App Engine in little time. Learn more about Aucor’s story here.



-Posted by Kelly Rice, Product Marketing Manager


In 2015, we're introducing a monthly webinar series to take an in-depth look at diverse elements that help us solve complex business challenges in the cloud and nurture business growth. We’ll cover unique IT management and implementation strategies and the people, tools, and applications that increase impact. We're opening it up to a live online and global forum with the aim to foster collaborative learning through use cases we can all relate to and real-time Q&A sessions. Our first webinar features zulily, a high-growth online retailer that leverages big data to provide a uniquely tailored product and customer experience to a mass market around the clock.



Zulily is one of the largest e-commerce companies in the United States. Its business is retail, but its DNA is in technology, using data and predictive analytics to drive decisions. As the company grows, so does the amount and complexity of data. Zulily’s IT realized that in order to keep up and properly scale, they had to redesign the way they process, analyze and use big data.



Zulily transitioned to the Google Cloud Platform to meet these challenges and ultimately use the big data it collected to improve online customer experience. Join us as we take a technical deep dive into zulily’s new application infrastructure built on the Google Cloud Platform. The team will share key learnings and discuss how they plan to scale their efforts and impact.



Big data experts from Google Cloud Platform and zulily will share:




  • Best practices and implementation strategies to drive value from big data using products such as Google BigQuery and Hadoop

  • How zulily uses Google Cloud Platform to improve customer experience, increase sales, and increase relevance via marketing initiatives

  • Key leadership and technical benefits and risks to be aware of as you plan, execute and optimize your big data implementation strategy across one or multiple business units




Live Webinar: zulily turns big data into a big advantage with Google Cloud Platform




  • Wednesday, January 28, 2015

  • 10:30 - 11:00 a.m. PT

  • Speakers: William Vambenepe, Lead Product Manager for Google Cloud Big Data Services and Sudhir Hasbe, Director Software Engineering for Data Services, BI and Big Data Analytics for zulily




View the recording here.

Last week, we kicked off our series to introduce container technologies, which are changing the way that people deploy and manage applications. Docker has emerged as a popular technology for application containerization, revolutionizing how applications are built, deployed and managed. Google Cloud Platform offers rich support for Docker containers through the fully managed Google Container Engine service powered by Kubernetes, container optimized VMs on Google Compute Engine, and Managed VMs for Google App Engine.



Today we are announcing the beta release of a new service: Google Container Registry for the secure hosting, sharing, and management of private container repositories.



The registry service provides three key benefits to Google Cloud Platform customers:




  • Access control: The registry service hosts your private images in Google Cloud Storage under your Google Cloud Platform project. This ensures by default that your private images can only be accessed by members of your project, enabling them to securely push and pull images through the Google Cloud SDK command line. Container host VMs can then access secured images without additional effort.

  • Server-side encryption: Your private images are automatically encrypted before they are written to disk.

  • Fast and reliable deployment: Your private images are stored in Google Cloud Storage and cached in our datacenters, ready to be deployed to Google Container Engine clusters or Google Compute Engine container optimized VMs over Google Cloud Platform’s Andromeda based network fabric.




zulily, an online retailer that offers thousands of new and unique products each day, was an early adopter of the registry service. “Docker registry availability, security, performance, and durability become more and more critical as more of our Compute Engine applications are containerized with Docker. Private registries help, but they need valid certificates, authentication and firewalls, backups, and monitoring. Google's container registry provides us with a complete Docker registry that we integrate into our development and deployment workflow with little effort," said Steve Reed, Principal Engineer, Core Engineering at zulily.



During the Container Registry beta, there is no extra cost for using the registry service besides the Google Cloud Storage charges for storage and network resources consumed by your private images.



To get started, you will need a Google Cloud Platform project with billing enabled. If you don’t have one already, you can use the free trial to create one. You will also need to install Docker and Google Cloud SDK.
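
As a sketch of the basic workflow (PROJECT_ID and the image name are placeholders; authentication for gcr.io is handled through the Cloud SDK, and the exact credential-helper command has varied across SDK releases):

# Tag a locally built image into your project's registry namespace, then push it.
docker build -t my-app .
docker tag my-app gcr.io/PROJECT_ID/my-app
docker push gcr.io/PROJECT_ID/my-app

# Any VM or cluster in the same project can then pull the private image:
docker pull gcr.io/PROJECT_ID/my-app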



Go ahead, take a look at our documentation and start using the registry for managing your private Docker images. The registry service team looks forward to receiving your direct feedback.



-Posted by Pratul Dublish, Technical Program Manager

Google Cloud Platform provides a reliable and scalable compute, storage and network infrastructure for all your big data needs. We have worked extensively with the Open Source community to optimize the Hadoop ecosystem for Cloud Platform. In 2014, we helped simplify the deployment of Apache Hadoop and Apache Spark on Google Cloud Platform by introducing bdutil, a command-line toolset to accelerate deployment. To reduce cluster startup time, increase interoperability, and streamline storage of the source data and subsequent results, we have also provided connectors to Google Cloud Storage and Google BigQuery.



Today, we’re happy to announce the availability of Hortonworks Data Platform, HDP 2.2, on Google Cloud Platform. HDP 2.2 has been certified by Hortonworks for use on Google Cloud Platform, along with usage of the bdutil deployment toolset and the Google Cloud Storage connector. Google and Hortonworks believe in providing a seamless experience for starting and running your Hadoop tasks on the cloud. We want users to be focused on developing and analyzing their data, rather than worrying about bringing up Hadoop clusters.



You can take advantage of the integrated and certified HDP plugin for bdutil and start deploying standard clusters in a matter of minutes, with the following command line:



./bdutil deploy -e platforms/hdp/ambari_env.sh


By default, bdutil will deploy a cluster with 5 nodes, per HDP recommendations, along with the latest version of HDP and recommended HDP components. Once deployed, the cluster is ready to run Pig Scripts, MapReduce jobs, Hive Queries, or additional Hadoop services supported by HDP. You’ll also have access to the Ambari GUI to perform additional configuration and setup activities.
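
If you want to size or place the cluster yourself, bdutil accepts overrides on the same command line, and a matching delete command tears the cluster back down. The project and bucket values below are placeholders, and the flag spellings are best confirmed against ./bdutil --help for your release:

# Deploy an 8-worker HDP cluster into a specific project and staging bucket
# (values are placeholders for illustration).
./bdutil -p my-project -b my-staging-bucket -n 8 deploy -e platforms/hdp/ambari_env.sh

# Tear the same cluster down when you are finished:
./bdutil -p my-project -b my-staging-bucket delete -e platforms/hdp/ambari_env.sh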

For additional information, please visit our bdutil Hortonworks documentation. You can download the bdutil setup scripts in zip format or tar.gz format.



To find out more about our joint collaboration, go here.



-Posted by Ram Ramanathan, Product Manager

Last week, Miles Ward, Google Cloud Platform’s Global Head of Solutions, kicked off our Container series with a post about the overarching concepts around containers, Docker, and Kubernetes. If you have not yet had a chance to read his post, we suggest you start there to arm yourself with the knowledge you will need for this post!



This week, Joe Beda, Senior Staff Engineer and one of the founding members of the Kubernetes project, will go a level deeper and talk in depth about the core technical concepts that underpin Google’s use of containers. These have informed the creation of Kubernetes and provide a foundation of future posts in this series.




What makes a container cluster?


The recent rise of container systems like Docker has (rightly) created a lot of excitement. The ability to package, transfer and run application code across many different environments enables new levels of fluidity in how we manage applications. But, as users expand their use of containers into production, new problems crop up in terms of managing which containers run where, dealing with large numbers of containers and facilitating communication between containers across hosts. This is where Kubernetes comes in. Kubernetes is an open source toolkit from Google that helps to solve these problems.



As we discussed in last week’s post, we consider Kubernetes a "container cluster manager." Lots of folks call projects in this area "orchestration systems," but that has never rung true for me. Orchestral music is meticulously planned with the score decided and distributed to the musicians before the performance starts. Managing a Kubernetes cluster is more like an improv jazz performance. It is a dynamic system that reacts to conditions and inputs in real time.



So, what makes a container cluster? Is it a dynamic system that places and oversees sets of containers and the connections between them? Sure, that and a bunch of compute nodes (either raw physical servers or virtual machines). In the remainder of this post, we’ll explore three things: what makes up a container cluster, how to work with one, and how the interconnected elements work together. Additionally, based on our experience, a container cluster should include a management layer, and we will dig into the implications of this below.




Why run a container cluster?


Here at Google, we build container clusters around a common set of requirements: always be available, be easy to patch and update, scale to meet demand, be easy to instrument and monitor, and so on. While containers allow applications to be easily and rapidly deployed and broken down into smaller pieces for more granular management, you still need a solution for managing your containers so that they meet these goals.



Over the past ten years at Google, we've found that having a container cluster manager addresses these requirements and provides a number of benefits:




  • Microservices in order to keep moving parts manageable. Having a cluster manager enables us to break down an application into smaller parts that are separately manageable and scalable. This lets us scale up our organization by having clear interfaces between smaller teams of engineers.

  • Self healing systems in the face of failures. The cluster manager automatically restarts work from failed machines on healthy machines.

  • Low friction horizontal scaling. A container cluster provides tools for horizontal scaling, such that adding more capacity can be as easy as changing a setting (replication count).

  • High utilization and efficiency rates. Google was able to dramatically increase resource utilization and efficiency after moving to containers.

  • Specialized roles for cluster and application operations teams. Developers are able to focus much more on the service they are building rather than on the underlying infrastructure that supports it. For example, the Gmail operations and development teams rarely have to talk directly to the cluster operations team. Having a separation of concerns here allows (but doesn't force) operations teams to be more widely leveraged.




Now, we understand that some of what we do is unique, so let's explore the ingredients of a great container cluster manager and what you should focus on to realize the benefits of running containers in clusters.




Ingredient 1: Dynamic container placement


To build a successful cluster, you need a little bit of that jazz improv. You should be able to package up your workload in a container image and declaratively specify your intents around how and where it is going to run. The cluster management system should decide where to actually run your workload. We call this "cluster scheduling."



This doesn't mean that things are placed arbitrarily. On the contrary, there is a whole set of constraints that come into play to make cluster scheduling a very interesting and hard problem1 from a computer science point of view. When scheduling, the scheduler makes sure to place your workload on a VM or physical machine with enough spare capacity (e.g. CPU, RAM, I/O, storage). But, in order to meet a reliability objective, the scheduler might also need to spread a set of jobs across machines or racks in order to reduce risk from correlated failures. Or perhaps some machines have special hardware (e.g. GPUs, local SSD, etc.). The scheduler should also react to changing conditions and reschedule work to cope with failures, with the cluster growing or shrinking, or to improve efficiency. To enable this, we encourage users to avoid pinning a container to a specific machine. Sometimes you have to fall back on "I want that container on that machine," but that should be a rare exception.



The next question is: what are we scheduling? The easy answer here is individual containers. But oftentimes, you want to have a set of containers running as a team on the same host. Examples include a data loader paired with a data server, or a log compressor/saver process paired with a server. These containers usually need to be located together, and you want to ensure that they do not become separated during dynamic placement. To enable this, we introduced in Kubernetes a concept known as a pod. A pod is a set of containers that are placed and scheduled together as a unit on a worker machine (also known as a Kubernetes node). By placing groups of pods onto nodes, Kubernetes can pack lots of work onto a node in a reliable way.
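
To make this concrete, here is a rough sketch of a pod that pairs a web server with a log-saver sidecar and declares the resources the scheduler should reserve for it. The field names follow later Kubernetes API versions rather than the v1beta1 schema current as of this post, the kubectl invocation likewise reflects later tooling, and the image names are hypothetical:

cat <<'EOF' > web-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-logger
  labels:
    env: prod
    tier: fe
spec:
  containers:
  - name: web
    image: example/php-frontend      # hypothetical application image
    resources:
      requests:
        cpu: 500m                    # capacity the scheduler reserves for this container
        memory: 256Mi
  - name: log-saver
    image: example/log-saver         # hypothetical sidecar that compresses and ships logs
EOF
kubectl create -f web-pod.yaml

Both containers are placed on the same node and scheduled as one unit, so they land, move and fail together.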






Ingredient 2: Thinking in sets


When working on a single physical node, tools generally don't operate on containers in bulk. But when moving to a container cluster you want to easily scale out across nodes. To do this, you need to work in terms of sets of things instead of singletons. And you want to keep those sets similarly configured. In Kubernetes, we manage sets of pods using two additional concepts: labels and replication controllers.



Every pod in Kubernetes has a set of key/value pairs associated with it that we call labels. You can select a set of pods by constructing a query based on these labels. Kubernetes has no opinion on the "correct" way to organize pods. It is up to you to organize your pods in a way that makes sense to you. You can organize by application tier, geographic location, development environment, etc. In fact, as labels are non-hierarchical, you can organize your pods in multiple ways simultaneously.



Example: let's say you have a simple service that has a frontend and a backend. But you also have different environments – test, staging and production. You can label your production frontend pods with env=prod tier=fe and your production backend pods with env=prod tier=be. You could similarly label your test and staging environments. Then, when operating on or inspecting your cluster, you could just restrict yourself to the pods where env=prod to see both the frontend and backend. Or you can look at all of your frontends across test, staging and production. You can imagine how this organization system can adapt as you add more tiers and environments.
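
With labels in place, bulk operations become one-line queries. As a sketch using kubectl's equality-based selector syntax (the CLI and flags shown here come from later releases than the one described in this post):

# All production pods, frontend and backend alike:
kubectl get pods -l env=prod

# Only the production frontends:
kubectl get pods -l env=prod,tier=fe

# Every frontend across test, staging and production:
kubectl get pods -l tier=fe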






Figure 1 - Filtering pods using labels

Scaling

Now that we have a way of identifying and maintaining a set of similarly configured pods, we can use this functionality for horizontal scaling (i.e., “scaling out”). To make this easy, we have a helper object in Kubernetes called the replication controller. It maintains a pool of these pods based on a desired replication count, a pod template and a label selector/query. It is really pretty easy to wrap your head around. Here is some pseudo-code:



object replication_controller {
  property num_replicas
  property template
  property label_selector

  runReplicationController(num_desired_pods, template, label_selector) {
    loop forever {
      num_pods = length(query(label_selector))
      if num_pods > num_desired_pods {
        kill_pods(num_pods - num_desired_pods)
      } else if num_pods < num_desired_pods {
        create_pods(template, num_desired_pods - num_pods)
      }
    }
  }
}





So, for example, if you wanted to run a PHP frontend tier with 3 pods, you would create a replication controller with an appropriate pod template (pointing at your PHP container image) and a num_replicas count of 3. You would identify the set of pods that this replication controller is managing with a label query of env=prod tier=fe. The replication controller takes an easy-to-understand desired state and tirelessly works to make it true. And if you want to scale in or out, all you have to do is change the desired replication count, and the replication controller will take care of the rest. By focusing on the desired state of the system, we end up with something that is easier to reason about.
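
As a hedged sketch of that PHP frontend example (field names follow later Kubernetes API versions, the scale subcommand comes from later CLI releases, and the image name is hypothetical):

cat <<'EOF' > php-fe-rc.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: php-fe
spec:
  replicas: 3                # the desired state the controller enforces
  selector:
    env: prod
    tier: fe
  template:                  # pod template stamped out for each replica
    metadata:
      labels:
        env: prod
        tier: fe
    spec:
      containers:
      - name: php
        image: example/php-frontend   # hypothetical image
EOF
kubectl create -f php-fe-rc.yaml

# Scaling out is just a change to the desired count:
kubectl scale rc php-fe --replicas=5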




Figure 2 - The Replication Controller enforces desired state


Ingredient 3: Connecting within a cluster


You can do a lot of interesting things with the ingredients listed so far. Any sort of highly parallelizable work distribution (continuous integration systems, video encoding, etc.) can work without a lot of communication between individual pods. However, most sophisticated applications are more of a network of smaller services (microservices) that communicate with each other. The tiers of traditional application architectures are really nodes in a graph.



A cluster management system needs a naming resolution system that works with the ingredients described above. Just like DNS provides the resolution of domain names to IP addresses, this naming service resolves service names to targets, with some additional requirements. Specifically, changes should be propagated almost immediately when things start or are moved, and a "service name" should resolve to a set of targets, possibly with extra metadata about those targets (e.g. shard assignment). For the Kubernetes API, this is done with a combination of label selectors and the watch API pattern.2 This provides a very lightweight form of service discovery.



Most clients aren't going to be rewritten immediately (or ever) to take advantage of a new naming API. Most programs want a single address and port to talk to in order to communicate with another tier. To bridge this gap, Kubernetes introduces the idea of a service proxy. This is a simple network load balancer/proxy that does the name query for you and exposes it as a single stable IP/port (with DNS) on the network. Currently, this proxy does simple round-robin balancing across all backends identified by a label selector. Over time, Kubernetes plans to allow for custom proxies/ambassadors that can make smarter domain-specific decisions (keep an eye on the Kubernetes roadmap for details as the community defines this). One example that I'd love to see is a MySQL-aware ambassador that knows how to send write traffic to the master and read traffic to read slaves.
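
As a hedged sketch (again using field names and CLI flags from later Kubernetes releases), a service that selects the production frontends and exposes them behind one stable address might look like this, and watching its endpoints shows the backend set changing as pods come and go:

cat <<'EOF' > fe-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: fe
spec:
  selector:               # the label query that defines the backend set
    env: prod
    tier: fe
  ports:
  - port: 80              # stable port clients talk to
    targetPort: 8080      # port the frontend containers actually listen on
EOF
kubectl create -f fe-service.yaml

# Watch the resolved backends update in real time as pods are added or removed:
kubectl get endpoints fe --watch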




Voila!


Now you can see how the three key components of a cluster management system fit together: dynamic container placement, thinking in sets of containers, and connecting within a cluster.



We asked the question at the top of this post, "What makes a container cluster?" Hopefully from the details and information we’ve provided, you have an answer. Simply put, a container cluster is a dynamic system that places and manages containers, grouped together in pods, running on nodes, along with all the interconnections and communication channels.



When we started Kubernetes with the goal of externalizing Google's experiences with containers, we initially focused on just scheduling and dynamic placement. However, when thinking through the various systems that are absolutely necessary to build a real application, we immediately saw that it was necessary to add the additional ingredients of pods, labels and the replication controller. To my mind, these are the bare minimum necessary to build a usable container cluster manager.



Kubernetes is still baking in the oven, but is coming together nicely. We just released v0.8, which you can download here. We’re still adding features and refining those that we have. We’ve published our roadmap to v1.0. The project has quickly established a large and growing community of contributing partners (such as Red Hat, VMware, Microsoft, IBM, CoreOS, and others) and customers, who use Kubernetes in a variety of environments.



While we have a lot of experience in this space, Google doesn't have all the answers. There are requirements and considerations that we don't see internally. With that in mind, please check out what we are building and get involved! Try it out, file bug reports, ask for help or send a pull request (PR).



-Posted by Joe Beda, Senior Staff Engineer and Kubernetes Cofounder



1 This is the classic knapsack problem, which is NP-hard in the general case.
2 The "watch API pattern" is a way to deliver asynchronous events from a service. It is common in lock-server systems (ZooKeeper, etc.) that are derived from the original Google Chubby paper. The client essentially reaches out and "hangs" a request until there are changes. This is usually coupled with version numbers so that the client stays current on any changes.

Today’s post is by Sunil Sayyaparaju, Director of Product and Technology at Aerospike, the open source, flash-optimized, in-memory NoSQL database.



Aerospike, now available as a Click to Deploy solution on Google Compute Engine, is an open source NoSQL database built to push the limits of modern processors and storage technologies, including SSDs. Developers are increasingly choosing NoSQL databases to power cloud applications, and with Click to Deploy you can get an Aerospike cluster deployed to your specifications in a few minutes. Each node is configured with Aerospike Server Community Edition and the Aerospike Management Console. The available tuning parameters can be found in the Click to Deploy Aerospike documentation.



In addition to the rapid deployment provided by Click to Deploy, we are also excited by the results we are seeing in our performance testing on Google Cloud Platform. Back in 2009, the founders of Aerospike saw that SSDs would be the future of storage, offering data persistence with better read/write access times than rotational hard disks, greater capacity than RAM, and a price/performance ratio that would fuel the development of applications that were previously not economically viable to run. The current proliferation of SSDs, now available on Google Compute Engine, validates this vision, and this unprecedented level of price/performance will enable a new category of real-time, data-intensive applications.



In this post, we will showcase the performance characteristics of Local SSDs on Google Compute Engine and demonstrate RAM-like performance with a 15x storage cost advantage using Local SSDs. We repeated the recent tests published in “Aerospike Hits 1 Million Writes Per Second With Just 50 Nodes,” using Local SSDs instead of RAM.



Aerospike certifies Local SSDs on Google Compute Engine

When the first Aerospike customers deployed the Aerospike database in 2010, there was no suitable way to benchmark SSDs for database workloads. The standard fio (Flexible I/O) tool for benchmarking disks did not fit our needs, so Aerospike developed and open sourced the Aerospike Certification Tool (ACT) for SSDs. This tool simulates typical database workloads:




  • Reads small objects (default 1500 bytes) using multiple threads (default 16).

  • Writes large blocks (default 128KB) to simulate the buffered write mechanism in a DBMS.

  • Reads large blocks (default 128KB) to simulate typical background processing.




ACT is used to test SSDs from different manufacturers, understand their characteristics and select configuration values that maximize the performance of each model. The test is run for 24-48 hours because the characteristics of an SSD change over time, especially in the initial few hours. In addition, different SSDs handle garbage collection differently, resulting in wide variability in performance. To help customers select drives, Aerospike certifies drives that pass our ACT-based performance criteria and publishes this list of recommended SSDs.



Aerospike Certification Tool (ACT) for SSDs: setup

The following server and storage configurations were used to run the ACT test:




  • Machine: n1-standard-4 with 1 Local SSD provisioned (4 vCPU, 15 GB memory)

  • SSD size: 375GB

  • Read/Write size: 1500 bytes (all reads hit disk, but writes are buffered)

  • Large block read size: 128KB

  • Load: 6000 reads/s, 3000 writes/s, 71 large-block reads/s
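
For reference, a machine of this shape can be provisioned with a gcloud command along the following lines (flag spellings follow the gcloud compute documentation; the instance name and zone are placeholders):

# n1-standard-4 with one 375 GB Local SSD attached, as used for the ACT run.
gcloud compute instances create aerospike-act-test \
    --zone us-central1-b \
    --machine-type n1-standard-4 \
    --local-ssd interface=SCSI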




ACT results show that 95% of Local SSD reads complete in under 1 ms

The results are shown in the graph below. The y axis shows the percentage of database read transactions that take longer than 1, 2, 4, or 8 milliseconds to complete. The x axis shows how performance changes during the first few hours and how consistent performance is as the benchmark continues to run for 24 hours.

The graph shows that after the first few hours, 95% of reads complete in under 1 ms.












  • only 5% take > 1 ms

  • only 3% take > 2 ms

  • only 1% take > 4 ms

  • a negligible number take > 8 ms








(Note: the % of reads >1ms is a superset of the % of reads >2ms, which is a superset of the % of reads >4ms, and so on.)









Similar to other SSDs that Aerospike has tested, the performance of Local SSDs in Google Compute Engine starts out very high and, as is typical for SSDs, decreases slightly over time. Performance stabilizes quickly, in about 10 hours, which, based on our experience benchmarking numerous SSDs, is very good.



Comparing Aerospike performance on Local SSDs vs. RAM

An earlier post showed how Aerospike hit 1 million writes per second with just 50 nodes on Google Compute Engine and 1 million reads per second with just 10 nodes running in RAM. Aerospike’s disk storage layer was designed to take advantage of SSDs, keeping in mind their unique characteristics. For this blog post, we repeated the performance test with 10 nodes, using Local SSDs instead of RAM, which yielded the following results:




  • 15x price advantage in storage costs with Local SSDs vs RAM

  • Achieved roughly the same write throughput using Local SSDs compared to RAM

  • Achieved half the read throughput using Local SSDs compared to RAM




Aerospike delivers 15x storage cost advantage with Local SSDs vs. RAM

The table below shows the hardware specifications of the machines used in our testing. Using Local SSDs instead of RAM, we got 25x more capacity (750GB vs. 30GB) at 1.64x the cost ($417.50 vs. $254), for a roughly 15x per-GB price advantage ($0.56 per GB for Local SSD vs. $8.46 per GB for RAM). We used 20 clients of type n1-highcpu-8.











Aerospike demonstrates RAM-like Latencies for Local SSDs vs. RAM

The graph below shows the percentage of reads >1ms and writes >8ms, for a number of read-write workloads.



Write latencies for Local SSDs are similar to RAM because in both cases, writes are first written in memory and then flushed to disk. Although read latencies are higher with Local SSDs, the differences are not noticeable here because most reads using Local SSDs finish under 1ms and the percentage of reads taking more than 1ms is similar for both RAM and Local SSDs.



Aerospike demonstrates RAM-like Throughput for Writes on Local SSDs vs. RAM

The graph below compares throughput for different Read-Write workloads. The results show:


  • 1.0x write throughput (while doing 100% writes) using Local SSDs compared to RAM. Aerospike is able to achieve the same write throughput because of buffered writes, where writes are first written in memory and subsequently flushed to disk.

  • 0.5x read throughput (while doing 100% reads) using Local SSDs compared to RAM. Aerospike is able to achieve such high performance using Local SSDs because it stores indexes in RAM and they point to data on disk. The disk is accessed exactly once per read operation, resulting in highly predictable performance.








Surprisingly, when doing 100% reads with Local SSDs, over 55% complete in under 1 ms. Most reads from SSDs take roughly 0.5-1 ms, while reads from RAM take < 0.5 ms. That may be why there is a drop in read throughput without a corresponding rise in the percentage of reads taking > 1 ms.



Summary

This post documented the results of the Aerospike Certification Tool (ACT) for SSDs and demonstrated a 15x storage cost advantage and RAM-like performance with Local SSDs vs. RAM. This game-changing price/performance ratio will power a new category of applications that analyze behavior, anticipate the future, engage users and monetize real-time, big-data-driven opportunities across the Internet.



You can deploy an Aerospike cluster today by taking advantage of the Google Cloud Platform free trial, with support for Standard Persistent Disk and SSD Persistent Disk.



-Posted by Sunil Sayyaparaju, Director of Product and Technology at Aerospike



Aerospike is a registered trademark of Aerospike, Inc. All other trademarks cited here are the property of their respective owners.