Today's post is the first in our new weekly roundup series — your Friday recap of product news, cool stories, videos and other highlights from the week. This week's announcements set GCP apart in the cloud blogosphere.



Take Google App Engine. First released in 2008, it’s taken a while for the world, even internally at Google, to fully grok its value. In “Why Google App Engine rocks: A Google engineer’s take,” Google Cloud director of technical support Luke Stone gives a full recounting of his team’s experience with App Engine and other managed services like Google BigQuery. He describes how his team was blown away by their productivity gains, and urges all developers to try out the platform-as-a-service route.



Digital gaming store Humble Bundle corroborates this sentiment. In the weekly GCP Podcast, Humble Bundle engineering manager Andy Oxfeld describes how the video game retailer relies on App Engine to scale its website up and down to meet fluctuating demand for its limited-time games. He also describes how the team uses Task Queues, dedicated memcache for faster load times, Google Cloud Storage and BigQuery, to name a few. Check it out.



Storage was another hot topic this week. One of the week’s most talked-about posts comes from Mosha Pasumansky, database luminary and technical lead for Dremel/BigQuery, in which he discusses Capacitor, BigQuery’s columnar storage format. Long story short, Capacitor advances the state of the art of columnar data encoding, and when combined with Google Cloud Platform’s Colossus distributed file system, provides super-fast and secure queries with little to no effort on the part of BigQuery users. Woohoo!



But sometimes you need to do something a little less flashy, like resizing a persistent disk. If you’ve been wondering how to do that on Google Compute Engine, wonder no more: GCP developer advocate Mete Atamel has put together a one-minute video tutorial on YouTube that walks you through the basic steps. Best of all, you don’t even need to reboot the associated VM!
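If you'd rather skim than watch, the flow looks roughly like the following minimal sketch. The disk name and zone here are hypothetical, and at the time the resize subcommand may have required the beta gcloud component:

     # Grow the disk itself; no VM reboot is needed
     gcloud compute disks resize my-data-disk --size 500GB --zone us-east1-d

     # Then, from inside the VM, grow the filesystem to claim the new space (ext4 shown)
     sudo resize2fs /dev/disk/by-id/google-my-data-disk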



Finally, Google was at OpenStack Summit in Austin this week, where Google partner CoreOS demonstrated ‘Stackanetes’: running OpenStack as a managed Kubernetes service. You can also hear Google product manager Craig McLuckie discuss the benefits of this approach on The Cube with Wikibon analysts Stu Miniman and Brian Gracely. McLuckie also shares his thoughts on working with the open source community, and Google’s evolution from an Internet company to a Cloud company.













Today, Google reiterated its commitment to the security needs of its enterprise customers with the addition of two new certificates to Google Cloud Platform: ISO27017 for cloud security and ISO27018 for privacy. We also renewed our ISO27001 certificate for the fourth year in a row.



Google Cloud Platform services covered by these ISO certifications now include Cloud Dataflow, Cloud Bigtable, Container Engine, Cloud Dataproc and Container Registry. These join Compute Engine, App Engine, Cloud SQL, Cloud Storage, Cloud Datastore, BigQuery and Genomics on the list of services that will be regularly audited for these certificates.



Certifications such as these provide independent third-party validations of our ongoing commitment to world-class security and privacy, while also helping our customers with their own compliance efforts. Google has spent years building one of the world’s most advanced infrastructures, and as we make it available to enterprises worldwide, we want to offer more transparency on how we protect their data in the cloud.



More information on Google Cloud Platform Compliance is available here.













Zookeeper is the cornerstone of many distributed systems, maintaining configuration information, naming and providing distributed synchronization and group services. But using Zookeeper in a dynamic cloud environment can be challenging, because of the way it keeps track of members in the cluster. Luckily, there’s an open source package called Exhibitor that makes it simple to use Zookeeper with Google Container Engine.



With Zookeeper, it’s easy to implement the primitives for coordinating hosts, such as locking and leader election. To provide these features in a highly available manner, Zookeeper uses a clustering mechanism that requires a majority of the cluster members to acknowledge a change before the cluster’s state is committed. In a cloud environment, however, Zookeeper machines come and go from the cluster (or “ensemble” in Zookeeper terms), changing names and IP addresses, and losing state about the ensemble. These sorts of ephemeral cloud instances are not well suited to Zookeeper, which requires all hosts to know the addresses or hostnames of the other hosts in the ensemble.







Netflix tackled this issue, and in order to more easily configure, operate, and debug Zookeeper in cloud environments, created Exhibitor, a supervisor process that coordinates the configuration and execution of Zookeeper processes across many hosts. Exhibitor also provides the following features for Zookeeper operators and users:


  • Backup and restore

  • A lightweight GUI for Zookeeper nodes

  • A REST API for getting the state of the Zookeeper ensemble

  • Rolling updates of configuration changes


Let’s take a look at using Exhibitor in Container Engine in order to run Zookeeper as a shared service for our other applications running in a Kubernetes cluster.



To start, provision a shared file server to host the shared configuration file between your cluster hosts. The easiest way to get a file server up and running in Google Cloud Platform is to create a Single Node File Server using Google Cloud Launcher. For horizontally scalable and highly available shared filesystem options on Cloud Platform, have a look at Red Hat Gluster Storage and Avere. The resulting file server will expose both NFS and SMB file shares.


  1. Provision your file server

  2. Select the Cloud Platform project in which you’d like to launch it

  3. Choose the following parameters for the dimensions of your file server


    • Zone — must match the zone of your Container Engine cluster (to be provisioned later)

    • Machine type — for this tutorial we chose an n1-standard-1, as throughput will not be very high for our hosted files

    • Storage disk size — for this tutorial, we chose a 100GB disk, as we need neither high throughput nor high IOPS


  4. Click Deploy


Next, create a Container Engine cluster in which to deploy your Zookeeper ensemble.


  1. Create a new Kubernetes cluster using Google Container Engine

  2. Ensure that the zone is the same one you used to deploy your file server (for the purposes of this tutorial, leave the defaults for the other settings)

  3. Click Create


Once your cluster is created, open the Cloud Shell by clicking its button in the top right of the Cloud Console.




Now set up your Zookeeper ensemble in Kubernetes.


  1. Set the default zone for the gcloud CLI to the zone in which you created your file server and Kubernetes cluster:

     gcloud config set compute/zone us-east1-d

  2. Download your Kubernetes credentials via the Cloud SDK:

     gcloud container clusters get-credentials cluster-1

  3. Create a file named exhibitor.yaml with the following content. This will define our Exhibitor deployment as well as the service that other applications can use to communicate with it:

     apiVersion: extensions/v1beta1
     kind: Deployment
     metadata:
       name: exhibitor
     spec:
       replicas: 3
       template:
         metadata:
           labels:
             name: exhibitor
         spec:
           containers:
           - image: mbabineau/zookeeper-exhibitor
             imagePullPolicy: Always
             name: exhibitor
             volumeMounts:
             - name: nfs
               mountPath: "/opt/zookeeper/local_configs"
             livenessProbe:
               tcpSocket:
                 port: 8181
               initialDelaySeconds: 60
               timeoutSeconds: 1
             readinessProbe:
               httpGet:
                 path: /exhibitor/v1/cluster/4ltr/ruok
                 port: 8181
               initialDelaySeconds: 120
               timeoutSeconds: 1
             env:
             - name: HOSTNAME
               valueFrom:
                 fieldRef:
                   fieldPath: status.podIP
           volumes:
           - name: nfs
             nfs:
               server: singlefs-1-vm
               path: /data
     ---
     apiVersion: v1
     kind: Service
     metadata:
       name: exhibitor
       labels:
         name: exhibitor
     spec:
       ports:
       - port: 2181
         protocol: TCP
         name: zk
       - port: 8181
         protocol: TCP
         name: api
       selector:
         name: exhibitor

     In this manifest we’re configuring the NFS volume to be attached on each of the pods and mounted in the folder where Exhibitor expects to find its shared configuration.

  4. Create the artifacts in that manifest with the kubectl CLI:

     kubectl apply -f exhibitor.yaml

  5. Monitor the pods until they all enter the Running state:

     kubectl get pods -w

  6. Run the kubectl proxy command so that you can access the Exhibitor REST API for the Zookeeper state:

     kubectl proxy &

  7. Query the Exhibitor API using curl, then use jq to format the JSON response to be more human-readable:

     export PROXY_URL=http://localhost:8001/api/v1/proxy/
     export EXHIBITOR_URL=${PROXY_URL}namespaces/default/services/exhibitor:8181
     export STATUS_URL=${EXHIBITOR_URL}/exhibitor/v1/cluster/status
     export STATUS=`curl -s $STATUS_URL`
     echo $STATUS | jq '.[] | {hostname: .hostname, leader: .isLeader}'

  8. After a few minutes, the cluster state will settle and a stable leader will be elected.


You have now completed the tutorial and can use your Exhibitor/Zookeeper cluster just like any other Kubernetes service, by accessing its exposed ports (2181 for Zookeeper and 8181 for Exhibitor) and addressing it via its DNS name, or by using environment variables.
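As a quick sanity check, you can exercise the service from inside the cluster. Here's a hedged sketch using Zookeeper's 'ruok' four-letter command; the busybox image, pod name and exact kubectl run flags are illustrative, and a healthy ensemble member replies 'imok':

     # Illustrative smoke test: send Zookeeper's 'ruok' command to the service by its DNS name
     kubectl run zk-test --rm -it --image=busybox --restart=Never -- sh -c 'echo ruok | nc exhibitor 2181'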



In order to access the Exhibitor web UI, run the kubectl proxy command on a machine with a browser, then browse to the exhibitor service through the proxy, using the same service URL constructed in the step above.



To scale up your cluster size, simply edit the exhibitor.yaml file and change the replicas to an odd number greater than 3 (e.g., 5), then run “kubectl apply -f exhibitor.yaml” again. Cause havoc in the cluster by killing pods and nodes in order to see how it responds to failures.











In December 2011, I had been working for Google for nine years and was leading a team of 10 software developers, supporting the AdSense business. Our portfolio consisted of over 30 software systems, mostly web apps for business intelligence that had been built over the past decade, each on a stack that seemed like a good idea at the time. Some were state-of-the-art custom servers built on the (then) latest Google web server libraries and running directly on Borg. Some were a LAMP stack on a managed hosting service. Some were running as a cron job on someone’s workstation. Some were weird monsters, like a LAMP stack running on Borg with Apache customized to work with production load balancers and encryption. Things were breaking in new and wonderful ways every day. It was all we could do to keep the systems running, just barely.



The team was stressed out. The product managers and engineers were frustrated. A typical conversation went like this:



          PM: “You thought it would be easy to add the foobar feature, but it’s been four days!”

          Eng: “I know, I know, but I had to upgrade the package manager version first, and then migrate off some deprecated APIs. I’m almost done with that stuff. I’m eager to start on the foobar, too.”

          PM: “Well, now, that's disappointing.”



I surveyed the team to find the root cause of our inefficiency: we were spending 60% of our time on maintenance. I asked how much time would be appropriate, and the answer was a grudging 25%. We made a goal to reduce our maintenance to that point, which would free up the time equivalent of three and a half of our 10 developers.



Google App Engine had just come out of preview in September 2011. A friend who'd been using it for a personal site recommended it heartily, raving that it was low-maintenance, auto-scaling and had built-in features like Google Cloud Datastore and user management. Another friend, Alex Martelli, was using it for several personal projects. I myself had used it for a charity website since 2010. We decided to use it for all of our web serving. It was the team’s first step into PaaS.



Around the same time, we started using Dremel, Google’s internal version of BigQuery. It was incredibly fast compared to MapReduce, and it scaled almost as well. We decided to re-write all of our data processing to use it, even though there were still a few functional gaps between it and App Engine at the time, such as visualization and data pipelines. We whipped up solutions that are still in use by hundreds of projects at Google. Now Google Cloud Platform users can access similar functionality using Google Cloud Datalab.



What we saw next was an amazing transformation in the way that software developers worked. Yes, we had to re-write 30 systems, but they needed to be re-written anyway. With that finished, developing on the cloud was so much faster: I recall being astonished to see in the App Engine logs that I had done 100 code, test and deploy cycles in a single coding session. Once things were working, they kept working for a long time. We stopped debating what stack to choose for the next project. We just grabbed the most obvious one from Google Cloud Platform and started building. If we found a bug in the cloud infrastructure, it was promptly fixed by an expert. What a change from spending hours troubleshooting library compatibility!



Best of all, we quickly got the time we spent on maintenance down to 25%, and it kept going down. At the end of two years I repeated the survey; the team reported that they now only spent 5% of their time on maintenance.



We started having good and different problems. The business wasn’t generating ideas fast enough to keep us busy, and we had no backlog. We started to take two weeks at the end of every quarter for a “hackathon” to see what we could dream up. We transferred half of the developers to another, busier team outside of Cloud. We tackled larger projects and started outpacing much larger development teams.



After seeing how using PaaS changed things for my team, I want everyone to experience it. Thankfully, these technologies are available not only to Google engineers, but to developers the world over. This is the most transformational technology I’ve seen since I first visited Google Search in 1999: it lets developers stop doing dumb things and get on with developing the applications that add value to our lives.












It's been a little over a year (and millions of migrations) since the last time we talked about our live migration technology and how we use it to keep your virtual machines humming along while we patch, repair and update the software and hardware infrastructure that powers Google Compute Engine. It’s also an important differentiator for our platform compared to other cloud providers.



Our customer base has grown exponentially in the past year, and brought with it a lot of new and interesting workloads to test the mettle of our live migration technology. The vast majority of customers and workloads have been able to go about their business without noticing our maintenance events, with a few exceptions.




Down and to the right


A picture is worth 1,000 words and the following graph shows the improvements we've made to live migration blackout times (the amount of time your VM is paused) over the last year (note the log scale):









We've done millions of live migrations in the last year and as you can see from the graph, we've made significant improvements to median blackout duration and variance. The 99th percentile blackout graph is too noisy to display nicely, but we've improved that by a factor of six in the last year as well.




Lessons learned


The graph also shows that we didn't always get it right, and we've learned a lot from working closely with the handful of customers whose applications just don't play well with live migration.



The most important thing we learned is that the current 60-second pre-migration signal is overkill for the vast majority of customers. At the same time, the 60-second signal is too short for the handful of customers that need to perform some sort of automated drain or failover action.



We also learned that the older in-place upgrade maintenance mechanism (not captured in the above graph) we use to update our hypervisor is problematic for customers whose applications are sensitive to live migration blackouts.



Finally, we learned that surfacing VM migrations as system events in our ZoneOperations list led to a lot of confusion for little incremental value, since the events were also logged, in greater detail, in Compute Engine’s Cloud Logs. In many cases, customers noticed an issue with their service, saw the migration system event in the ZoneOperations list, and spent a long time investigating that as the cause, only to find that it was a red herring.




What's next?


Aside from continuing to measure and improve the impact of our maintenance events, the first thing we're going to do is drop the 60-second notice for VMs that don't care about it. If a VM is actively monitoring the metadata maintenance-event URI, we'll continue to give 60 seconds' notice before the migration. If a VM is not monitoring the URI, however, we'll start the migration immediately. This will not change the behavior of VMs that are configured to terminate instead of migrate. We expect to roll this out by mid-May.
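For reference, the signal in question is exposed through the metadata server, and a VM can block on changes to it with something like the sketch below. We'd expect the value to flip from NONE to MIGRATE_ON_HOST_MAINTENANCE when a migration is imminent:

     # Block until the maintenance-event metadata value changes, then run drain/failover logic
     curl -s "http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event?wait_for_change=true" \
       -H "Metadata-Flavor: Google"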



In the coming quarters we’ll also begin providing a longer, more usable advance notice for VMs that are configured to terminate instead of migrate. This advance notice signal will be available via the metadata server as well as via the Compute Engine API on the Instance resource.



We'll also add a new API method to the Instance resource to allow customers to trigger maintenance events on a VM. This will give customers a means to determine if a VM is impacted by maintenance events and if so, to validate that its drain/failover automation works correctly.



The second major change we'll make is to use live migration for all instance virtualization software stack updates, replacing in-place upgrades. This will make these maintenance events visible, actionable and less disruptive while allowing us to focus our improvements on one maintenance mechanism rather than two. Finally, we'll remove VM migration events from our Operations collections and expand on the details of the events we log in Cloud Logs.



We strongly believe that fast, frequent, reliable and, above all, transparent infrastructure maintenance is essential to keeping our systems secure and to delivering the new features and services that customers want. We're pleased with the results we've seen so far, and we're excited to continue making it better. Follow us on G+ and/or Twitter to stay informed as we start rolling out these improvements. If you have any questions or concerns, please reach out; we'd love to hear from you.









Couldn’t make it to GCP NEXT last month? Lucky for you, recordings of over thirty sessions are up on YouTube to watch and learn from.



But with so much quality content to watch, where do you start? Leaving aside the wildly popular Data Center 360° video that tours The Dalles, Oregon facility (one million views and counting), and the must-see Day 1 and Day 2 keynotes, here are the five most popular GCP NEXT breakout sessions on YouTube.




  1. What’s the secret behind Google’s legendary uptime? Three words: site reliability engineers. In “How Google Does Planet-Scale Engineering for Planet-Scale Infrastructure,” Google engineering director Melissa Binde lays out the relationship between SREs and developer teams, and how they ensure uptime for Google.com and Google Cloud Platform users. Even if your organization isn’t operating at quite planetary scale, lessons abound for anyone who cares about building and operating resilient environments.

  2. What exactly is machine learning, and what can it do for you? In “Build smart applications with your new superpower: cloud machine learning,” Google developer advocate Julia Ferraioli walks through the range of Google’s machine learning options, from TensorFlow to Google Cloud Machine Learning to the Machine Learning APIs. She’s also joined by David Zuckerman, head of developer experience at Wix.com, who describes how Wix used Google machine learning to mine customer images with the new Vision API.

  3. Cloud Platform users do the darndest things. In “Analyzing 25 billion stock market events in an hour with NoOps on GCP,” FIS (Fidelity National Information Services) CTO Neil Palmer and developer Todd Ricker describe how they “ingest, process and analyze the entire U.S. options and equities market in 4 hours using Google Cloud Dataflow, Google Cloud Pub/Sub and Google Cloud Bigtable.” That’s 15 TB of data, every day, which is then made available to regulators for market reanalysis.

  4. Cloud Platform isn’t all about crunching large amounts of data for ginormous financial services firms. It’s also about democratizing access to Google’s amazing capabilities to the world at large, wherever they may be. In “Building iOS Apps with Firebase,” Google Cloud Developer Advocate Sara Robinson walks users through how to develop a native iOS app using Firebase, a mobile back-end as a service, and then ties that app into Google Cloud Vision API to provide face and object recognition.

  5. Rounding out the list of most-watched breakout sessions is “Painless container management with GKE & Kubernetes.” Brendan Burns, Google’s lead engineer on the Kubernetes project, provides context and backstory; Rich Steenberg, principal engineer at WePay, a Kubernetes shop, describes how WePay is using Kubernetes today; and Tim Hockin, a Google software engineer, describes what’s new in Kubernetes 1.2 and what’s on tap.




Did your favorite session not make the list? Share it in the comments so the rest of the world can benefit from what you learned.









Iron Mountain has expanded the bandwidth of its network link to Google Cloud Platform by 10x. This enables faster transmission of your data to GCP, which is super helpful for meeting critical project deadlines. As an illustration of transmission speed, moving 50TB of data over the expanded link takes less than a day, compared to more than five days with the old connection.



Iron Mountain has been providing cloud seeding services for our customers in North America since 2015.



Enterprises are eager to run applications on public clouds to benefit from the security, scalability and reduced management burden. However, moving data to the public cloud as part of a migration or hybrid process is slow and troublesome. Uploading one terabyte of data using a typical small business DSL connection may take up to 100 days!



As a result, enterprises are frequently turning to cloud seeding services, also referred to as offline ingestion or offline data import, from providers like Iron Mountain.



Instead of trying to push data to a public cloud over limited bandwidth, customers copy data to storage devices and ship them to a third party who can then securely upload the data to the customers’ data buckets in cloud storage. Cloud seeding providers include, as part of their service portfolio, chain-of-custody verification and security options like encryption to help ensure that customer data is handled securely.



Iron Mountain has also introduced support for LTO (Linear Tape-Open) tapes, a good transport medium for large amounts of data. Customers with data already on LTO media can immediately start moving that data into low-cost Google Cloud Storage Nearline.



You could be eligible for up to $500 of credit on Cloud Platform if you ingest with Iron Mountain! More information can be found here. If you’re ready to start a project on Cloud Platform and are not sure how to start moving your data to the cloud, we’ve got you covered with offline data import.









Customers often ask us for guidance about how to build PCI DSS compliant environments on top of Google Cloud Platform. From our work in the field, we recently put together a handy-dandy tutorial to help them get started.



This is no small thing. Many businesses today have online storefronts, and the vast majority of those take credit cards. When you accept credit cards for your business, you have to handle them securely to maintain customer trust, get paid and meet the necessary regulations, namely PCI DSS.



The PCI DSS, administered by the PCI Security Standards Council, is an information security standard created by the major credit card companies; as such, any business that takes Visa, MasterCard, Discover, American Express or JCB is expected to be PCI DSS compliant, and can be fined or penalized if it is not.



Creating and managing a PCI DSS compliant environment can be a non-trivial task. Thankfully, if you’re on Cloud Platform, managed services such as Stackdriver Monitoring, Stackdriver Logging and Google BigQuery can help. Our solution, for example, includes these basic components:




  • A lightweight Google Compute Engine front-end application that accepts credit card information and sends it to an external payment processor. Importantly, that information is never recorded; it's only transmitted.

  • An external payment processor that charges the credit card if it's accepted or rejects it if it’s not, and notifies your application of the result. Since this is just a notification to your application, no credit card data is transmitted or recorded from the payment processor.

  • Stackdriver Logging, which logs the actions of every application and server via a Squid proxy that restricts the event traffic, and sends the events to Stackdriver Monitoring, which monitors them

  • BigQuery, which can be used to analyze the logs, run ad-hoc audit queries and create reports.








For further details, check out the full solution for this design. We hope you'll find it useful, and we welcome and encourage your feedback. Comment here or reach out to @petermark on Twitter.











The infrastructure underpinning Google Cloud Platform has always been good, and it keeps getting better. We recently increased the limit on the number of attached persistent disk storage volumes per Google Compute Engine instance from 16 to as many as 128 volumes on our largest standard instance sizes (giving us three times the capacity of the competition). We also recently increased the total quantity of attached persistent disk storage per instance from 10TB to as much as 64TB on our largest instance sizes, and introduced the ability to resize persistent disk volumes with no downtime.







These changes were enabled by Google’s continued innovations in data center infrastructure, in this case networking. Combined with Colossus, Google’s high-performance global storage fabric, these innovations have let us greatly increase the size, quantity and flexibility of network-attached storage per instance without sacrificing persistent disk’s legendary rock-solid performance and reliability.



This ties back to one of Google Cloud Platform’s core strengths: hosting Linux containers. We are committed to making GCP the best place on the web to run Docker-based workloads, building on over a decade of experience running all of Google on Linux containers. To help you realize the same benefits that we did, we created and open-sourced Kubernetes, a tool for creating and managing clusters of Docker containers, and launched Google Container Engine to provide a managed, hosted, Kubernetes-based service for Docker applications.



Red Hat is a large Kubernetes user and has adopted it across its product portfolio, including making it a core component of OpenShift, its Platform as a Service offering. Working with Red Hat late last year in preparation for offering OpenShift Dedicated on Google Cloud Platform, it became clear that we needed to increase the number of attached persistent disk volumes per instance to help Red Hat efficiently bin pack Kubernetes Pods (each of which may have one or more attached volumes) across their Google Cloud Platform infrastructure.





Being able to attach more persistent disk volumes per instance isn’t just useful to Red Hat. In addition to better support for Kubernetes and Docker, this feature can also be used in various scenarios that require a large number of disks. For example, you can keep data from different web servers on separate volumes to ensure data isolation when an instance runs multiple web servers.



To take advantage of this feature for your own Kubernetes clusters running on Google Compute Engine, simply set the "KUBE_MAX_PD_VOLS" environment variable to whatever you want the limit to be, based on the maximum number of volumes the nodes in your cluster support.
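For a self-managed cluster brought up with the kube-up scripts, that might look like the sketch below. The scheduler reads the variable at startup, and the value of 128 assumes your node machine types support our new maximum:

     # Raise the scheduler's per-node attached-volume cap before starting the cluster
     export KUBE_MAX_PD_VOLS=128
     cluster/kube-up.sh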









We're at NAB this week and are thrilled to announce new Google Cloud Platform partners and product updates that underscore the growing role of cloud computing in the media and entertainment industry.



We’re collaborating with Autodesk to launch a new cloud-optimized rendering service called Autodesk Maya for Cloud Platform ZYNC Render. Autodesk software has been behind the past 21 Academy Award winners for Best Visual Effects and we’re bringing this capability to Google Cloud Platform.



The combined offering captures the aim of both companies to enable individual artists and developers to focus on content, abstracting away the nuances of managing infrastructure. Small teams of artists can tap world-class infrastructure to realize the creative vision of what was once limited to much larger studios.



Here’s how it works:




  • Product teams at Google and Autodesk have developed a cloud-optimized version of Autodesk® Maya®, the popular tool for animation, modeling and rendering. Via a Maya plugin, 3D scenes are transferred to Google Compute Engine in the background while the artist is working. Because Maya is capable of running on the artist’s workstation and also in the cloud on GCP, artists can run massively parallel rendering far more efficiently than before, taking advantage of the scalability, performance and price benefits of GCP

  • Compared with the non-optimized version, Maya® customers see up to a 10x improvement in file upload efficiency. This allows many rendering jobs to start instantaneously, cutting wait time and accelerating time to finished shot


We’re also excited to announce ZYNC Render support for Pixar’s historic RenderMan, with licensing fully built in to the product. ZYNC users are now able to spin up 500 machines per account, scaling to 32,000 rendering cores, with new support for 64-core machines, making short work of ambitious rendering jobs.



In addition to our collaboration with Autodesk, we've made some big strides in our offerings for the media and entertainment industry. Here are some recent updates:




Cloud Vision API graduates to General Availability


The goal of Cloud Vision API is to provide vision capability to your applications in the same way that Google Photos does. It's a powerful tool for media and entertainment companies, enabling you to classify images and analyze emotional facial attributes. To further improve our customer experience, Cloud Vision API is going into general availability today with new features:




  • Logo Detection expanded to cover millions of logos

  • Updated OCR for understanding more granular text like receipts
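As a rough sketch of what a request looks like (the API key, bucket and file name are hypothetical), both new features can be exercised in a single images:annotate call:

     # Ask for logo detection and OCR on an image stored in Cloud Storage
     curl -s -X POST -H "Content-Type: application/json" \
       "https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY" \
       -d '{
         "requests": [{
           "image": {"source": {"gcsImageUri": "gs://my-bucket/receipt.jpg"}},
           "features": [{"type": "LOGO_DETECTION"}, {"type": "TEXT_DETECTION"}]
         }]
       }'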






Cloud CDN graduates to Beta


We’re launching Cloud CDN Beta, allowing your media content to be pushed out to Google’s network edge and cached close to users. As always, data travels via Google’s network and reaches users who expect instantaneous access to images and live-stream video experiences. Cloud CDN is also fully integrated with Google’s global load balancing and enterprise-grade security to distribute media workloads anywhere they originate, so jobs never get bogged down.
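If you're already behind the HTTP(S) load balancer, enabling the cache is designed to be a single switch on the backend service, along the lines of this sketch (hypothetical service name, assuming the beta gcloud surface at launch):

     # Turn on edge caching for an existing backend service
     gcloud beta compute backend-services update my-backend-service --enable-cdn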




Lytro chooses Cloud Platform and The Foundry


One question we see today’s top innovators ask continuously is: with massive cloud infrastructure at our fingertips, how can we make a major leap forward in the way things are done today? Lytro is one example. Its technology seems to defy the traditional physics of photography, capturing massive volumes of visual data with such fidelity that infinite creative choices abound: unprecedented control over focus, perspective, aperture and shutter angle, all in post-production. Lytro selected GCP and The Foundry to help power their amazing invention. Learn more by watching their video.



For more on what's possible with cloud-enabled media, visit cloud.google.com/solutions/media and zyncrender.com, or contact us to discuss how cloud can enable your creative workflows.









Editor's note: Updated April 19, 2016



Google Consumer Surveys has launched an API, built on Google Cloud Platform, that lets your app users create and integrate surveys.



We’ve spoken to research and non-research companies who are really interested in bringing the power of our Consumer Surveys tool into their own applications, and with this launch it’s finally possible. We imagine many different use cases to tap into the millions of respondents our platform can connect you with, across a dozen or more markets around the world.



Leading up to our launch we worked closely with a handful of trusted testers who provided valuable feedback. These have included proprietary solutions to manage studies conducted on the Google Consumer Surveys platform (Kantar), predictive analytics solutions (Predictvia), and solutions that help customers visualize their Google Consumer Surveys data (MarketSight). We look forward to working with developers to build unique solutions that empower individuals and businesses to make better data-driven decisions.



If you’re an existing Google Consumer Surveys Enterprise customer on an invoicing contract, you can start using the API immediately. If you’re not an Enterprise customer but are interested in accessing the API, you can get in touch with us at gcs-api@google.com. You can read more about our API on our API documentation site.













Our user interface (UI) is everything that you see and interact with. While the technologies that power Google Cloud Platform are complex, the UI for using the resources on GCP should be simple and intuitive. We’re paying close attention to this in our Cloud Load Balancing service and are excited to introduce a new UI that aims to simplify Cloud Load Balancing configuration.



You’ll now be able to configure all Cloud Load Balancing flavors through a unified interface. This UI is also designed to seamlessly accommodate new load balancing features that are expected to land in 2016 and beyond, and deliver a simpler, more intuitive user experience.



Here’s an overview of the new UI, recapping Cloud Load Balancing config basics first. Cloud Load Balancing comes in multiple flavors — HTTP(S), TCP, SSL(TLS) and UDP — and distributes traffic among your configured backends. Your backend configuration consists of the instance groups and instances that will service your user traffic. Your frontend configuration comprises the anycast IP address to which your users connect, along with port, protocol and other related information.
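The same backend/frontend split maps onto the CLI as well. As a hedged illustration with hypothetical resource names (and flags as they existed around this time), a minimal TCP network load balancer pairs a backend target pool with a frontend forwarding rule:

     # Backend: a target pool holding the instances that will service traffic
     gcloud compute target-pools create my-pool --region us-central1
     gcloud compute target-pools add-instances my-pool --instances vm-1,vm-2 --zone us-central1-b

     # Frontend: the external IP and port that forward user traffic to the pool
     gcloud compute forwarding-rules create my-rule --region us-central1 \
         --port-range 80 --target-pool my-pool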





Of course, HTTP(S), TCP and UDP load balancers have flavor-specific configuration nuances, but we maintain a similar overall flow for configuring all of these flavors in the new UI.



You’ll begin your configuration by selecting the flavor of traffic you want to load balance: HTTP(S), TCP or UDP. Note that we’ll add support for SSL(TLS) in the new UI once this feature enters beta. To make your selection easier, we present you with a picker page as shown below:





Let’s say you want to configure an HTTP load balancer. Start by clicking the Configure button below HTTP(S) Load Balancing. You’ll then be presented with the page to configure the load balancer name, the backend configuration, host and path rules (relevant if you want to route requests based on the client request URL), and finally the frontend configuration.





Once you’re done with the above steps, you can review and finalize your configuration. You can view all your configured load balancers as shown below:









If you’d like additional information on any of your configured load balancers, you can simply use the drop-down card as shown below to view these details, including configuration as well as monitoring information.







You can edit your configured load balancers any time by clicking the edit button shown above.



We’ve created this UI quickstart video to help you get started. After watching this video, we recommend that you play with the new UI and configure and edit HTTP(S), TCP and UDP load balancers to familiarize yourself with the UI flow and configuration options. You can also send in your feedback using the “Send feedback” button as shown below.









This is the first release of the new Google Cloud Load Balancing UI. We’ll continue to iterate, make improvements and most importantly incorporate your feedback into future UI releases. So take the new UI for a spin and let us know what works well and what you’d love to see next.



For those of you who attended GCP NEXT, we hope you enjoyed the opportunity to learn about Google’s global network and the software-defined and distributed systems technologies that power Google Cloud Load Balancing. If you missed it, here’s the Global Load Balancing talk and our TCP/UDP network load balancing talk at NSDI last month.



Happy load balancing and scale on!









OpenStack Mitaka has just launched and we’re super excited about it. In collaboration with Red Hat and Biarca, we’ve developed an OpenStack Cinder backup driver for Google Cloud Storage, available in the Mitaka release.



Google joined the OpenStack Foundation in July 2015, when we announced Kubernetes integration with OpenStack. Our work on Mitaka is the next step on our roadmap to making Google Cloud Platform a seamless public cloud complement for OpenStack environments.



Backup and recovery services represent one of the most costly and complex aspects of large scale infrastructure management. OpenStack provides an efficient mechanism for allocation and management of persistent block storage through Cinder. In an OpenStack deployment, Cinder volumes house virtual machine data at rest as well as, potentially, the operating system boot device. In production deployments, it’s critical that this persistent data is protected as part of a comprehensive business continuity and disaster recovery strategy. To satisfy this requirement, Cinder provides a backup service that includes a backup driver specification allowing storage vendors to add support for additional backup targets.



This is where we come in. The addition of highly durable and available cloud-scale object storage allows organizations to shift from bulk commodity storage for backup to a more operationally efficient and cost-effective architecture, all while avoiding additional capital expenditures and the complexity of managing storage device scale out. The traditional barrier to adoption for object storage is the engineering effort required to adapt existing software and systems, designed for either file or block storage access, to object store native REST interfaces. The Cinder backup driver model provides the potential to abstract this engineering complexity for OpenStack users. As long as an appropriate backup driver is installed, the backup target works with Cinder as intended.



Our OpenStack Cinder backup driver is included as part of the standard Cinder backup driver set in Mitaka and requires minimal setup to get up and running. Full Cinder backup functionality was successfully tested with the Cloud Storage driver against 1GB, 5GB and 10GB Cinder volume sizes. In addition, the driver provides the following user-configurable parameters to allow administrators to tune the installation:



















  • backup_gcs_credential_file — the full path of the JSON file for the Google service account (downloaded from the Google Developers Console)

  • backup_gcs_bucket — GCS bucket name to use for backup; please refer to the official bucket naming guidelines

  • backup_gcs_driver — used for selecting the Google backup driver

  • backup_gcs_project_id — the project ID where the backup bucket will be created

  • backup_gcs_object_size — the size in bytes of GCS backup objects (default: 52428800 bytes)

  • backup_gcs_block_size — the change tracking size for incremental backup, in bytes; backup_gcs_object_size has to be a multiple of backup_gcs_block_size (default: 32768 bytes)

  • backup_gcs_user_agent — HTTP user-agent string for the GCS API

  • backup_gcs_reader_chunk_size — chunk size for GCS object downloads, in bytes (default: 2097152 bytes)

  • backup_gcs_writer_chunk_size — chunk size for GCS object uploads, in bytes; pass in a value of -1 to cause the file to be uploaded as a single chunk (default: 2097152 bytes)

  • backup_gcs_num_retries — number of times to retry transfers (default: 3)

  • backup_gcs_bucket_location — location of the GCS bucket (default: ‘US’)

  • backup_gcs_storage_class — storage class of the GCS bucket (default: ‘NEARLINE’)

  • backup_gcs_retry_error_codes — list of GCS error codes for which to initiate a retry (default: [‘429’])

  • backup_gcs_enable_progress_timer — enable or disable the timer that sends periodic progress notifications to Ceilometer when backing up the volume to the GCS backend storage (default: True)
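Putting a few of these together, a minimal backup section of cinder.conf might look like the sketch below. The bucket, project and credential path are illustrative, and the driver module name is the one we'd expect in the Mitaka tree:

     [DEFAULT]
     backup_driver = cinder.backup.drivers.google
     backup_gcs_bucket = my-cinder-backups
     backup_gcs_project_id = my-gcp-project
     backup_gcs_credential_file = /etc/cinder/gcs-service-account.json
     backup_gcs_storage_class = NEARLINE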







The Cinder backup driver works with any class of Cloud Storage, including our Google Cloud Storage Nearline archival option. Nearline provides the full durability of Standard storage at a slightly lower level of availability and with slightly higher latency, and offers read throughput of 4 MB/s per TB stored, scaling with storage density. As an example, 3TB of backup data can be restored at 12 MB/s. The low cost yet high performance of Nearline makes backing up Cinder volumes economical while offering the ability to quickly restore if necessary.



If you’re running OpenStack, there’s no need to invest in additional storage systems or build out a second datacenter for backup and recovery. You can now use Cloud Storage in a hybrid scenario, optimized via the Cinder backup driver now available in Mitaka.









We're excited to introduce Stackdriver Error Reporting to help you quickly understand your application’s top or new errors. Stackdriver Error Reporting counts, analyzes and aggregates in real time the crashes in your running cloud services, and notifies you when there's something new.




Stackdriver Error Reporting: listing errors sorted by occurrence count







Stackdriver Error Reporting allows you to monitor your application’s errors, aggregated into meaningful groups tailored to your programming language and framework. This helps you see the problems rather than the noise.



Maybe you want to watch for errors that occurred recently in a given service, or judge the user impact of an outage. Just sort by first- or last-seen date, occurrences, or number of affected users to get the information you need.




Email notification for an error that could not be grouped with previously received errors







You can opt in to be notified when a new error cannot be grouped with the previously received ones, and jump directly from the email to the details of the new error.



The “detail view” presents key error information to help you assess its severity and understand the root cause: a bar chart over time, the first time this error has been seen, and the affected service versions. Look through error samples to better diagnose the problem: inspect the stack trace focusing on its relevant parts and start to debug in Stackdriver Debugger, learn more about the request that triggered it and navigate to the associated logs.






Details of an error: understand the root cause







While an immediate next step could be to roll back your service, you'll also want to work on fixing the errors. Stackdriver Error Reporting integrates with your regular workflow by allowing you to link an error to an issue from your issue tracker. Once done, you can see at a glance which errors have associated issues.






Link an error to an issue URL in your favorite bug tracker







The feedback from our alpha testers has been extremely positive. A frequent response we heard was that Stackdriver Error Reporting helped them identify hard-to-catch intermittent errors that were hidden in logs, increasing product quality. Thank you for the feedback!




Get started


Stackdriver Error Reporting is now available in beta for everyone to try. It requires zero setup for App Engine applications and just a few configuration steps on other platforms.



Visit http://console.cloud.google.com/errors to get started with your project.

In this blog post we caught up with Chris Jones, a Site Reliability Engineer on Google App Engine for the past three-and-a-half years and SRE at Google for almost 9 years, to find out more about running production systems at Google. Chris is also one of the editors of Site Reliability Engineering: How Google Runs Production Systems, published by O’Reilly and available today.



Google App Engine serves over 100 billion requests per day. You might have heard about how our Site Reliability Engineers, or SREs, make this happen. It’s a little bit of magic, but mostly about applying the principles of computer science and engineering to the design and development of computing systems — generally very large, distributed ones.



Site Reliability Engineering is a set of engineering approaches to technology that lets us, or anyone, run better production systems. It went on to inform the concept of DevOps for the wider IT community. It’s interesting because it’s a relatively straightforward way of improving performance and reliability at planet scale, but it can be just as useful for any company for, say, rolling out Windows desktops. Done right, SRE techniques can increase the effectiveness of operating any computing service.



Q: Chris, tell us how many SREs operate App Engine and at what scale?



CJ: We have millions of apps on App Engine serving over 100 billion requests per day supported by dozens of SREs.



Q: How do we do that with so few people?



CJ: SRE is an engineering approach to operating large-scale distributed computing services. Making systems highly standardized is critical. This means all systems work in similar ways to each other, which means fewer people are needed to operate them since there are fewer complexities to understand and deal with.



Automation is also important: our turn-up processes to build new capacity or scale load balancing are automated so that we can scale these processes nicely with computers, rather than with more people. If you put a human on a process that’s boring and repetitive, you’ll notice errors creeping up. Computers’ response times to failures are also much faster than ours. In the time it takes us to notice the error, the computer has already moved the traffic to another data center, keeping the service up and running. It’s better to have people do things people are good at and computers do things computers are good at.



Q: What are some of the other approaches behind the SRE model?



CJ: Because there are SRE teams working with many of Google’s services, we’re able to extend the principle of standardization across products: SRE-built tools originally used for deploying a new version of Gmail, for instance, might be generalized to cover more situations. This means that each team doesn’t need to build its own way to deploy updates. This ensures that every product gets the benefit of improvements to the tools, which leads to better tooling for the whole organization.



In addition, the combination of software engineering and systems engineering knowledge in SRE often leads to solutions that synthesize the best of both backgrounds. Google’s software network load balancer, Maglev, is an example — and it’s the underlying technology for the Google Cloud Load Balancer.



Q: How do these approaches impact App Engine and our customers running on App Engine?



CJ: Here’s a story that illustrates it pretty well. In the summer of 2013 we moved all of App Engine’s US region from one side of the country to the other. The move incurred no downtime for our customers.



Q: How?



CJ: We shut down one App Engine cluster, and as designed, the apps running on it automatically moved to the remaining clusters. We had created a copy of the US region’s High Replication Datastore in the destination data center ahead of time so that those applications’ data (and there were petabytes of it!) was already in place; changes to the Datastore were automatically replicated in near real-time so that it was consistently up to date. When it was time to turn on App Engine in the new location, apps assigned to that cluster automatically migrated from their backup clusters and had all their data already in place. We then repeated the process with the remaining clusters until we were done.



Advance preparation, combined with extensive testing and contingency plans, meant that we were ready when things went slightly wrong and were able to minimize the impact on customers. And of course, we put together an internal postmortem — another key part of how SRE works — to understand what went wrong and how to fix it for the future, without pointing fingers.



Q: Very cool. How can we find out more about SRE?



CJ: Sure. If you’re interested in learning more about how Site Reliability Engineering works at Google, including the lessons we learned along the way, check out this website, the new book and we’ll also be at SREcon this week (April 7-8) giving various talks on this topic.



- Posted by Jo Maitland, Managing Editor, Google Cloud Platform

Rethinking data center design is happening out in the open here at Google. Today we're announcing that we’re working with Rackspace to co-develop an open server architecture design specification based on IBM’s new POWER9 CPU.



We also recently joined the Open Compute Project (OCP) and hope to submit this work to the OCP community. In fact, the POWER9 data center server specification is designed to fit in the proposed 48V open rack that we're co-designing with Facebook.



We’ve been working on OpenPOWER since 2014, when we helped found the OpenPOWER Foundation and we’re now POWER-ready. This means the architecture is fully supported across our toolchain, allowing developers to target apps to POWER with a simple flag.



It won’t surprise anyone to hear that demand for compute at Google has been relentless and it isn’t slowing down any time soon. We’ve found 60 trillion web addresses so far, versus one trillion in 2008. To meet that demand, our goal is to ensure our fleet is capable of handling ISA heterogeneity, to achieve best-in-class performance and value.



We're committed to open innovation and to optimizing performance and cost in data centers, and look forward to passing these savings along to our internal users as well as our Google Cloud Platform customers.



- Posted by Maire Mahony, Hardware Engineering Manager at Google & Director, OpenPOWER Foundation

Today we’re announcing another important update to our NoSQL database, Google Cloud Datastore. We’ve redesigned the underlying architecture that supports the cross-platform API for accessing Datastore outside of Google App Engine, such as from Google Container Engine and Google Compute Engine, dramatically improving performance and reliability of the database. This follows new and simpler pricing for Cloud Datastore, announced last week.



The new Cloud Datastore API version (v1beta3) is available now. You need to enable this API before you can use it, even if you previously enabled an earlier version of the API.



Enable Cloud Datastore API



We’re also publishing a Service Level Agreement (SLA) for the API, which will take effect upon its General Availability release.



Now that v1beta3 is available, we’re deprecating the old v1beta2 API with a six-month grace period before decommissioning it on September 30th, 2016.




New Beta API revision


In the new release, we re-architected the entire serving path with an eye on performance and reliability. Cloud Datastore API revision v1beta3 has lower latency in both the average and long-tail cases. Whether it’s magical items transferring to a player’s inventory faster or financial reports loading on a snappier website, everyone loves fast.







In addition to these significant performance improvements, the v1beta3 API gives us a new platform upon which we can continue to improve performance and functionality.



You can use v1beta3 using the idiomatic Google Cloud Client Libraries (in Node.js, Python, Java, Go, and Ruby), or alternatively via the low-level native client libraries for JSON and Protocol Buffers over gRPC. You can learn more about the various client libraries in our documentation.
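For the JSON flavor, a call is a plain HTTPS POST against the v1beta3 endpoint. Here's a hedged sketch (hypothetical project ID and kind, with credentials borrowed from the gcloud CLI):

     # Run a GQL query against Cloud Datastore over the v1beta3 JSON API
     curl -s -X POST \
       -H "Authorization: Bearer $(gcloud auth print-access-token)" \
       -H "Content-Type: application/json" \
       "https://datastore.googleapis.com/v1beta3/projects/my-project:runQuery" \
       -d '{"gqlQuery": {"queryString": "SELECT * FROM Task LIMIT 5", "allowLiterals": true}}'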




Cloud Datastore Service Level Agreement


Today we’re publishing an SLA for the General Availability release. Accessing Google Cloud Datastore via the Beta API is not covered by an SLA, although the SLA we’re publishing can help you estimate the expected performance of the Beta API. The SLA will only take effect when we reach General Availability.



App Engine Client libraries for Cloud Datastore are still covered as part of the App Engine SLA.



If you're using the Google Cloud Client Libraries, upgrading is as simple as updating the client libraries from GitHub. We look forward to what you build next with our faster cross-platform API for Cloud Datastore.



To learn more about Cloud Datastore, check out our getting started guide.



- Posted by Dan McGrath, Product Manager, Google Cloud Platform

Our guest blogs are written by third-party developers, partners and experts with real-world expertise creating and running applications on Google Cloud Platform. They're a great way to dive into the thoughts and opinions of folks at the forefront of software engineering and business development in cloud computing. Today’s guest blog is by Sravish Sridhar, Founder and CEO of mobile Backend-as-a-Service provider, Kinvey.



Modernizing legacy enterprise apps to work on mobile devices is no small feat, and if you want to be sure those apps still meet tough government regulations once mobile, you’re in for a world of complexity.



Kinvey's collaborating with Google to simplify this process. We’ve extended our mobile Backend-as-a-Service — a fully-managed, HIPAA-compliant platform built on Google Cloud — to developers at healthcare providers, pharmaceutical companies and life sciences firms. Our services satisfy the stringent patient-privacy policies mandated by U.S. Government HIPAA regulations.



Kinvey on GCP provides a decoupled architecture for front-end developers to iterate on their apps and deliver them in an agile manner, without having to wait on backend systems owners to provision connectors to enterprise data and auth systems. Here’s how it works:






  • An app developer starts to build the UI/UX of their app using the front-end programming language or framework of their choice — Android, Objective-C, Swift, Ionic, Xamarin, PhoneGap, etc.

  • The developer downloads the Kinvey SDK for the particular language they're using, which takes care of client-side functionality like managing and anonymizing auth tokens, marshaling data between the app and Kinvey’s backend APIs, offline caching and sync, and data encryption.

  • The app is wired up to backend functionality by leveraging Kinvey’s backend features, such as an identity service to register/login users, data store to store and retrieve data from the cloud, file store to cache large files like photos and videos, and custom business logic that can be written and provisioned on Kinvey’s Node.js PaaS

  • In the meantime, owners of backend enterprise systems can connect Kinvey to their enterprise auth and data sources, without writing any code. They use Kinvey’s Mobile Identity Connect (MIC) to connect to auth protocols like Active Directory, OpenID, LDAP, SAML, etc. and Kinvey’s RAPID data connectors and custom data links to connect to enterprise data services like Epic, Cerner, SAP and SharePoint. Services provisioned via MIC and RAPID are then made available to the front-end developers by publishing them in Kinvey’s Service Catalog, with appropriate access policies.

  • The front-end developer can then "flip a switch" and instruct Kinvey to use a MIC auth service instead of the default Kinvey auth service, and one or more RAPID services instead of sample data stored in collections in the Kinvey data store.

  • With no front-end app code change, the app then works end-to-end with enterprise auth and data systems.






By providing connectors to Electronic Health Record (EHR) systems like Epic and Cerner, Kinvey makes it easy for developers to launch apps without having to focus on complex enterprise integrations.



Healthcare customers require a HIPAA compliant solution to ensure that patient data is secure end-to-end. Google Cloud Platform’s infrastructure, Cloud Storage and CDN allow us to store and deliver data and files in a highly secure and compliant fashion. Specifically, our mBaaS on Google Cloud offers features such as:




  • Plug-in client features for offline caching, network management and RESTful data access to accelerate development

  • Turn-key backend services for data integration, IAM and orchestration for new mobile use cases

  • Microservices for interconnectivity between your enterprise systems

  • Security at every level from mobile client to infrastructure layer

  • Mobile app analytics and reporting for fine-tuning operations




To see how to get started, sign up for Kinvey’s HIPAA compliant mobility platform.



- Posted by Sravish Sridhar, Founder/CEO, Kinvey