



Today, Kubernetes 1.4 is available to all Google Container Engine customers. In addition to all the new features in Kubernetes 1.4 — including multi-cluster federation, simplified setup and one-command install for popular workloads like MySQL, MariaDB and Jenkins — we’ve also taken big steps to make Google Cloud Platform (GCP) the best place to run your Kubernetes workloads.



Container Engine has continued its rapid growth, doubling in usage every 90 days, while still providing a fully managed Kubernetes service with 99.5% uptime for applications large and small. We’ve also made a number of improvements to the platform to make it even easier to manage and more powerful to use:




  • One-click alpha clusters can be spun up as easily as a regular cluster, so testing Kubernetes’ alpha features like persistent application support is a one-click operation.

  • Support for AppArmor in the base image gives applications deployed to Container Engine multiple layers of defense-in-depth.

  • Integration with Kubernetes Cluster Federation allows you to add a Container Engine cluster to your existing federation, greatly simplifying cloud bursting and multi-cloud deployments.

  • Rich support for Google Cloud Identity & Access Management allows you to manage GKE clusters with the same multi-faceted roles you use across your GCP projects.

  • A new Google Container-VM Image makes upgrading a breeze and allows new upgrades to be automatically installed with a simple reboot.

  • Monitoring of all cluster add-ons ensures that all key functions for your cluster are available and ready to use — one less thing to think about when running a large distributed application.




From new startups to the largest organizations, we’ve seen tremendous adoption of Container Engine. Here are a few highlights:




  • Niantic, creator of the global phenomenon Pokémon GO, relies on Container Engine to power its extraordinary growth.

  • Philips’ smart connected lighting system, Hue, receives 200 million transactions a day, which are easily handled by Container Engine.

  • Google Cloud ML, the new Cloud Machine Learning service from GCP, also runs entirely on Container Engine.

  • And many more companies, from Box to Pearson, are choosing Kubernetes to manage their production workloads.




As always, if you’d like to help shape the future of Kubernetes, please participate in the Kubernetes community; we’d love to have you! Join the kubernetes-users mailing list or the kubernetes-users Slack channel.



Finally, if you’ve never tried GCP before, getting started is easy. Sign up for your free trial here.



Thank you for your support!









This spring, we announced Container-VM Image as a beta product under Google Cloud Platform (GCP). If you're a developer interested in deploying your application or service on Google Compute Engine, we recommend taking a few moments to understand how it can help you.



Linux containers help developers to focus on their application without worrying about the underlying infrastructure. A secure and up-to-date base image is a critical building block of any container-based infrastructure. Container-VM Image represents the best practices we here at Google have learned over the past decade running containers at scale.




Container-VM Image design philosophy


Container-VM Image is designed from the ground up to be a modern operating system for running containers on GCP. Read on for more information about the design choices behind Container-VM Image and its attributes.




Build environment


Container-VM Image is based on the open-source Chromium OS project. Chromium OS is a reliable and vetted source code base for this new operating system. In addition, it allows us to use the powerful build and test infrastructure built by the Chrome OS team.




Designed for containers


The Docker container runtime is pre-installed on Container-VM Image. A key feature of containers is that the software dependencies can be packaged in the container image along with the application. With this in mind, Container-VM Image’s root file system is kept to a minimum by only including the software that's necessary to run containers.




More secure by design


Container-VM Image is designed with security in mind, rather than as an afterthought. The minimal root file system keeps the attack surface small. The root file system is mounted as read-only, and its integrity is verified by the kernel during boot up. Such hardening features make it difficult for attackers to permanently exploit the system.




Software updates


Having full control over the build infrastructure combined with a minimal root file system allows us to patch vulnerabilities and ship updated software versions very quickly. Container-VM Image also ships with an optional “in-place update” feature that allows users to stay up-to-date with minimal manual intervention.




Getting started


The Container-VM Images are available in the “google-containers” GCP project. Here are a few commands to get you started:



Here’s how to list currently available images:



$ gcloud compute images list --project google-containers --no-standard-images



Note: All new Container-VM Images have the “gci-” prefix in their names.



Here’s how to start a new instance:

$ gcloud compute instances create <instance-name> \
--zone us-central1-a \
--image-family gci-stable --image-project google-containers



Once the instance is ready, you can ssh into it:



$ gcloud compute ssh <instance-name> --zone us-central1-a
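Since the Docker runtime comes pre-installed, you can check that containers run as soon as you’re on the instance. A quick smoke test (the busybox image here is just an arbitrary, small example):

$ docker run --rm busybox echo "Hello from Container-VM Image"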



You can also start an instance using Cloud-Config, the primary API for configuring an instance running Container-VM Image. You can create users, configure firewalls, start Docker containers and even run arbitrary commands required to configure your instance from the Cloud-Config file.



You can specify Cloud-Config as Compute Engine metadata at the time of instance creation with the special `user-data` key:



$ gcloud compute instances create <instance-name> \
--zone us-central1-a \
--image-family gci-stable --image-project google-containers \
--metadata-from-file user-data=<cloud-config-file>
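As a reference point, here’s a minimal sketch of what such a Cloud-Config file might look like (the user name and container image are hypothetical; see the Container-VM Image documentation for the full set of supported directives):

#cloud-config

users:
- name: demo-user
  uid: 2000

runcmd:
- docker run --name=nginx -d -p 80:80 nginx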




What’s next


We're working hard on improving and adding new features to Container-VM Image to make it the best way to run containers on GCP.  Stay tuned for future blogs and announcements. In the meantime, you can find more documentation and examples at the Container-VM Image homepage, and send us your feedback at google-containers@google.com.









As we officially move into the Google Cloud era, Google Cloud Platform (GCP) continues to bring new capabilities to more regions, environments, applications and users than ever before. Our goal remains the same: we want to build the most open cloud for all businesses and make it easy for them to build and run great software.



Today, we’re announcing new products and services to deliver significant value to our customers. We’re also sharing updates to our infrastructure to improve our ability to not only power Google’s own billion-user products, such as Gmail and Android, but also to power businesses around the world.




Delivering Google Cloud Regions for all


We’ve recently joined the ranks of Google’s billion-user products. Google Cloud Platform now serves over one billion end-users through its customers’ products and services.



To meet this growing demand, we’ve reached an exciting turning point in our geographic expansion efforts. Today, we announced the locations of eight new Google Cloud Regions: Mumbai, Singapore, Sydney, Northern Virginia, São Paulo, London, Finland and Frankfurt, and there are more regions to be announced next year.



By expanding to new regions, we deliver higher performance to customers. In fact, our recent expansion in Oregon resulted in up to 80% improvement in latency for customers. We look forward to welcoming customers to our new Cloud Regions as they become publicly available throughout 2017.




Embracing the multi-cloud world


Not only do applications running on GCP benefit from state-of-the-art infrastructure, but they also run on the latest and greatest compute platforms. Kubernetes, the container management system that we developed and open-sourced, reached version 1.4 earlier this week, and we’re actively updating Google Container Engine (GKE) to this new version.



GKE customers will be the first to benefit from the latest Kubernetes features, including the ability to monitor cluster add-ons, one-click cluster spin-up, improved security, integration with Cluster Federation and support for the new Google Container-VM image (GCI).



Kubernetes 1.4 improves Cluster Federation to support straightforward deployment across multiple clusters and multiple clouds. In our support of this feature, GKE customers will be able to build applications that can easily span multiple clouds, whether they are on-prem, on a different public cloud vendor, or a hybrid of both.



We want GCP to be the best place to run your workloads, and Kubernetes is helping customers make the transition. That’s why customers such as Philips Lighting have migrated their most critical workloads to run on GKE.




Accelerating the move to cloud data warehousing and machine learning


Cloud infrastructure exists in the service of applications and data. Data analytics is critical to businesses, and the need to store and analyze data from a growing number of data sources has grown exponentially. Data analytics is also the foundation for the next wave in business intelligence: machine learning.



The same principles of data analytics and machine learning apply to large-scale businesses: to derive business intelligence from your data, you need access to multiple data sources and the ability to seamlessly process them. That’s why GKE usage doubles every 90 days, and why it’s a natural fit for many businesses. Now, we’re introducing new updates to our data analytics and machine learning portfolio that help address this need:




  • Google BigQuery, our fully managed data warehouse, has been significantly upgraded to enable widespread adoption of cloud data analytics. BigQuery support for Standard SQL is now generally available, and we’ve added new features that improve compatibility with more data tools than ever and foster deeper collaboration across your organization with simplified query sharing. We’ve also integrated Identity and Access Management (IAM), which allows businesses to fine-tune their security policies. And to make BigQuery accessible to any business, we now offer flat-rate pricing that pairs unlimited queries with predictable data storage costs.

  • Cloud Machine Learning is now available to all businesses. Integrated with our data analytics and storage services such as Google BigQuery, Google Cloud Dataflow and Google Cloud Storage, it enables businesses to easily train quality machine learning models on their own data, faster. “Seeing is believing” with machine learning, so we're rolling out dedicated educational and certification programs to help more customers learn about the benefits of machine learning for their organization and give them the tools to put it into use.




To learn more about how to manage data across all of GCP, check out our new Data Lifecycle on GCP paper.




Introducing a new engagement model for customer support


At Google, we understand that the overall reliability and operational health of a customer’s application is a shared responsibility. Today, we’re announcing a new role on the GCP team: Customer Reliability Engineering (CRE). Designed to deepen our partnership with customers, CRE is made up of Google engineers who integrate with a customer’s operations teams to share the reliability responsibilities for critical cloud applications. This integration represents a new model in which we share and apply our nearly two decades of expertise in cloud computing as an embedded part of a customer's organization. We’ll have more to share about this soon.



One of the CRE model’s first tests was joining Niantic as they launched Pokémon GO, scaling to serve millions of users around the world in a span of a few days.




The Google Cloud GKE/Kubernetes team that supports many of our customers like Niantic

The public cloud is built on customer trust, and we understand that it’s a significant commitment for a customer to entrust a public cloud vendor with their physical infrastructure. By offering new features to help address customer needs and collaborating with them to usher in the future with tools like machine learning, we intend to accelerate the usability of the public cloud and bring more businesses into the Google Cloud fold. Thanks for joining us as we embark toward this new horizon.









Throughout my career as an engineer, I’ve had a hand in numerous product launches that grew to millions of users. User adoption typically happens gradually over several months, with new features and architectural changes scheduled over relatively long periods of time. Never have I taken part in anything close to the growth that Google Cloud customer Niantic experienced with the launch of Pokémon GO.



As a teaser, I’ll start with a picture worth a thousand words:





Our peers in the technical community have asked about the infrastructure that helped bring Pokémon GO to life for millions of players. Niantic and the Google Cloud teams put together this post to highlight some of the key components powering one of the most popular mobile games to date.




A shared fate


At our Horizon event today, we’ll be introducing Google Customer Reliability Engineering (CRE), a new engagement model in which technical staff from Google integrates with customer teams, creating a shared responsibility for the reliability and success of critical cloud applications. Google CRE’s first customer was Niantic, and its first assignment the launch of Pokémon GO — a true test if there ever was one!



Within 15 minutes of launching in Australia and New Zealand, player traffic surged well past Niantic’s expectations. This was the first indication to Niantic’s product and engineering teams that they had something truly special on their hands. Niantic phoned Google CRE for reinforcements in anticipation of the US launch planned for the next day. Niantic and Google Cloud — spanning CRE, SRE, development, product, support and executive teams — braced for a flood of new Pokémon Trainers, as Pokémon GO would go on to shatter all prior estimates of player traffic.




Creating the Pokémon game world


Pokémon GO is a mobile application that uses many services across Google Cloud, but Cloud Datastore became a direct proxy for the game’s overall popularity given its role as the game’s primary database for capturing the Pokémon game world. The graph opening this blog post tells the story: the teams targeted 1X player traffic, with a worst-case estimate of roughly 5X this target. Pokémon GO’s popularity quickly drove player traffic to 50X the initial target, ten times the worst-case estimate. In response, Google CRE seamlessly provisioned extra capacity on behalf of Niantic to stay well ahead of its record-setting growth.



Not everything was smooth sailing at launch! When issues emerged around the game’s stability, Niantic and Google engineers braved each problem in sequence, working quickly to create and deploy solutions. Google CRE worked hand-in-hand with Niantic to review every part of their architecture, tapping the expertise of core Google Cloud engineers and product managers — all against a backdrop of millions of new players pouring into the game.




Pokémon powered by containers


Beyond being a global phenomenon, Pokémon GO is one of the most exciting examples of container-based development in the wild. The application logic for the game runs on Google Container Engine (GKE), powered by the open source Kubernetes project. Niantic chose GKE for its ability to orchestrate its container cluster at planetary scale, freeing the team to focus on deploying live changes for players. In this way, Niantic used Google Cloud to turn Pokémon GO into a service for millions of players, continuously adapting and improving.



One of the more daring technical feats accomplished by Niantic and the Google CRE team was to upgrade to a newer version of GKE that would allow more than a thousand additional nodes to be added to the container cluster, in preparation for the highly anticipated launch in Japan. Akin to swapping out a plane’s engine in mid-flight, careful measures were taken to avoid disrupting existing players while cutting over to the new version, even as millions of new players signed up and joined the Pokémon game world. On top of this upgrade, Niantic and Google engineers worked in concert to replace the Network Load Balancer, deploying the newer and more sophisticated HTTP/S Load Balancer in its place. The HTTP/S Load Balancer is a global system tailored for HTTPS traffic, offering far more control, faster connections to users and higher throughput overall — a better fit for the amount and types of traffic Pokémon GO was seeing.



The lessons learned from the US launch — generous capacity provisioning, the architectural swap to the latest version of Container Engine and the upgrade to the HTTP/S Load Balancer — paid off when the game launched without incident in Japan, where the number of new users signing up to play tripled that of the US launch two weeks earlier.




The Google Cloud GKE/Kubernetes team that supports many of our customers like Niantic

Other fun facts


  • The Pokémon GO game world was brought to life using over a dozen services across Google Cloud.

  • Pokémon GO was the largest Kubernetes deployment on Google Container Engine ever. Due to the scale of the cluster and accompanying throughput, a multitude of bugs were identified, fixed and merged into the open source project.

  • To support Pokémon GO’s massive player base, Google provisioned many tens of thousands of cores for Niantic’s Container Engine cluster.

  • Google’s global network helped reduce overall latency for Pokémon Trainers inhabiting the game’s shared world. Game traffic travels over Google’s private fiber network for most of its transit, delivering reliable, low-latency experiences for players worldwide. Even under the sea!


Niantic’s Pokémon GO was an all-hands-on-deck launch that required quick and highly informed decisions across more than a half-dozen teams. The sheer scale and ambition of the game required Niantic to tap architectural and operational best practices directly from the engineering teams who designed the underlying products. On behalf of the Google CRE team, I can say it was a rare pleasure to be part of such a memorable product launch that created joy for so many people around the world.
















I’m a relative newcomer to Google Cloud Platform. After nine years working in Technical Infrastructure, I recently joined the team to work hand-in-hand with customers building out next-generation applications and services on the platform. In this role, I realized that my privileged understanding of how we build our systems can be hard to come by from outside the organization. That is, unless you know where to look.



I recently spent a bunch of time hopping around the Google Cloud Networking pages under the main GCP site, looking for materials that could help a customer better understand our approach.





What follows is a series of links for anyone who may want an introduction to Google Cloud Networking, presented in digestible pieces and ordered to build on previous content.




Getting started


First, for some quick 15-minute background, I recommend this Google Cloud Platform Overview. It’s a one-page survey of all the concepts you need to work in Cloud Platform. Then, you may want to scan the related Cloud Platform Services doc, another one-pager that introduces the primary customer-facing services (including networking services) that you might need. It’s not obvious, but Cloud Platform networking also lays the foundation for the newer managed services mentioned there, including Google Container Engine (Kubernetes) and Cloud Dataflow. After all that, you’ll have a good idea of the landscape and be ready to actually do something in GCP!








Networking Codelabs


Google has an entire site devoted to Codelabs — my favorite way to learn nontrivial technical concepts. Within the Cloud Codelabs there are two excellent networking Codelabs: Networking 101 and Networking 102. I recommend them highly for a few reasons: each one takes only about 90 minutes end-to-end; each is a quick survey of the most commonly used features in cloud networking; both include really helpful hints about performance; and, most importantly, after completing them you’ll have a really good sandbox for experimenting with cloud networking on Google Cloud Platform.






Google Cloud Networking references


Another question you may have is what are the best Google Cloud Networking reference docs? The Google Cloud Networking feature docs are split between two main landing pages: the Cloud Networking Products page and the Compute Engine networking page. The products page introduces the main product feature areas: Cloud Virtual Network, Autoscaling and Load Balancing, Global DNS, Cloud Interconnect and Cloud CDN. Be sure to scroll down to the end, because there are some really valuable links to guides and resources at the very bottom of each page that a lot of people miss out on.



The Compute Engine networking page is a treasure trove of all kinds of interesting details that you won’t find anywhere else. It includes the picture I hold in my mind for how networks and subnetworks are related to regions and zones, details about quotas, default IP ranges, default routes, firewall rules, details about internal DNS, and some simple command line examples using gcloud.





An example of the kind of gem you’ll find on this page is a little blurb on measuring network throughput that links to the PerfKitBenchmarker tool, an open-source benchmark tool for comparing cloud providers (more on that below). I return to this page frequently and find things explained that previously confused me.



For future reference, the Google Cloud Platform documentation also maintains a list of networking tutorials and solutions documents with some really interesting integration topics. And you should definitely check out Google Cloud Platform for AWS Professionals: Networking, an excellent, comprehensive digest of networking features.




Price and performance


Before you do too much, you might want to get a sense for how much of your free quota it will cost you to run through more networking experiments. Get yourself acquainted with the Cloud Platform Pricing page as a reference (notice the “Free credits” link at the bottom of the page). Then, you can find the rest of what you need under Compute Engine Pricing. There, you can see rates for the standard machine types used in the Codelabs, and also a link to General network pricing. A little further down, you’ll find the IP address pricing numbers. Finally, you may find it useful to click through the link at the very bottom to the estimated billing charges invoice page for a summary of what you spent on the codelabs.





Once you’ve done that, you can start thinking about the simple performance and latency tests you completed in the Codelabs. There’s a very helpful discussion on egress throughput caps buried in the Networking and Firewalls doc, and you can run your own throughput experiments with PerfKitBenchmarker (sources). This tool does all the heavy lifting of spinning up instances, and understands how different cloud providers define regions, making for relevant comparisons. Also, with PerfKitBenchmarker, someone else has already done the hard work of identifying the accepted benchmarks in various areas.
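If you want to try it yourself, a typical invocation looks something like this (a sketch that assumes you’ve cloned the PerfKitBenchmarker repository and installed its dependencies; see the project README for the authoritative flags):

$ ./pkb.py --cloud=GCP --benchmarks=iperf --machine_type=n1-standard-2

This spins up a pair of instances in your project, runs the iperf network benchmark between them and tears everything down when it’s done.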








Real world use cases


Now that you understand the main concepts and features behind Google Cloud Networking, you might want to see how others put them all together. A common first question is how to set things up securely. Securely Connecting to VM Instances is a really good walkthrough that includes more overviews of key topics (firewalls, HTTPS/SSL, VPN, NAT, serial console), some useful gcloud examples and a nice picture that reflects the jumphost setup in the codelab.







Next you should watch two excellent videos from GCP Next 2016: Seamlessly Migrating your Networks to GCP and Load Balancing, Autoscaling & Optimizing Your App Around the Globe. What I like about these videos is that they hit all the high points for how people talk about public cloud virtual networking, and offer examples of common approaches used by large early adopters.



A common question about cloud networking technologies is how to distribute your services around the globe. The Regions and Zones document explains specifically where GCP resources reside, and Google’s research paper Software Defined Networking at Scale (more below) has pretty map-based pictures of Google’s Global CDN and inter-Datacenter WAN that I really like. This Google infrastructure page has zoomable maps with Google’s data centers around the world marked and you can read how Google uses its four undersea cables, with more ‘under’ the horizon, to connect them here.











Finally, you may want to check out this sneaky-useful collection of articles discussing approaches to geographic management of data. I plan to go through the solutions referenced at the bottom of this page to get more good ideas on how to use multiple regions effectively.



Another thing that resonated with me from both GCP Next 2016 videos was the discussion of how easy it is to set up and manage services in GCP that serve from the closest, low-latency instances using a single global Anycast VIP. For more on this, the Load Balancing and Scaling concept doc offers a really nice overview of the topic. Then, for some initial exploration of load balancing, check out Setting Up Network Load Balancing.



And in case you were wondering from exactly where Google peers and serves CDN content, visit the Google Edge Network/Peering site and PeeringDB for more details. The peering infrastructure page has zoomable maps where you can see Google’s Edge PoPs and nodes.








Best practices


There’s also a wealth of documents about best practices for Google Cloud Networking. I really like the Best Practices for Networking and Security section within the Best Practices for Enterprise Organizations document, and the DDoS Best Practices doc provides more useful ways to think about building a global service.



Another key concept to wrap your head around is Cloud Identity & Access Management (IAM). In particular, check out the Understanding Roles doc for its introduction to network- and security-specific roles. Service accounts play a key role here. Understanding Service Accounts walks you through the considerations, and Using IAM Securely offers some best practices checklists. Also, for some insight into where this all leads, check out Access Control for Organizations using IAM [Beta].






A little history of Google Cloud Networking


All this research about Google Cloud Networking may leave you wanting to know more about its history. I checked out the research papers referenced in the previously mentioned video Seamlessly Migrating your Networks to GCP and — warning — they’re deep, but they’ll help you understand the fundamentals of how Google Cloud Networking has evolved over the past decade, and how its highly distributed services deliver the performance and competitive pricing for which it’s known.



Google’s network-related research papers fall into two categories:




Cloud Networking fundamentals




Networking background








The Andromeda network architecture (source)



I hope this post is useful, and that these resources help you better understand the ins and outs of Google Cloud Networking. If you have any other good resources, be sure to share them in the comments.












Building IoT products and solutions involves stitching together a whole range of complex technologies, from devices to applications. With a new direct integration between Particle, an IoT cloud platform and hardware provider, and Google Cloud Platform (GCP), you can now easily bring that data to big data tools such as Google Cloud Dataflow, our batch and streaming big data processing service; Google BigQuery, our managed data analytics warehouse and others.



A growing list of devices support the Particle platform, making it easy for organizations developing IoT applications to manage devices, perform firmware updates and acquire and send field data to the internet through a range of connectivity options.




You can now connect to GCP from the Particle platform developer console.

To begin, connect your Particle project to a Google Cloud Pub/Sub topic. Cloud Pub/Sub lets you decouple the device data ingest stream from different downstream subscribers, durably storing the data as it arrives for up to seven days while it's processed. By granting Particle limited permissions to publish to a specific Cloud Pub/Sub topic, you can properly isolate the data ingest portion of your IoT application. You can then use Cloud Dataflow to operate on a multi-device, time-windowed stream of events in near-real-time, or dispatch and store this data in a number of storage options. For example, storing data long-term in BigQuery and Google Cloud Storage lets you affordably record a long history of device information, against which you can later perform various analytics or train machine learning models to make scenario-based decisions. You can then call Particle Cloud APIs to take action on devices back in the world.
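For example, creating the ingest topic on the GCP side might look like this (a sketch; the topic and subscription names are hypothetical, and the Particle tutorial covers granting Particle’s service account the Pub/Sub Publisher role on the topic):

$ gcloud beta pubsub topics create particle-device-events
$ gcloud beta pubsub subscriptions create particle-ingest --topic=particle-device-events

Your Cloud Dataflow pipeline or other consumers then read from the subscription rather than from devices directly.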



With this integration, we believe developers and product builders will be able to bring production-quality products to market faster, blending the Particle device ecosystem and platform with GCP's scalable and innovative data solutions. To get started, check out the tutorial on the Particle website and connect device data directly to your GCP project today.









Many of today's most successful games are played in small sessions on the devices in our pockets. Players expect to open the game app from any of their supported devices and find themselves right where they left off. In addition, players may be very sensitive to delays caused by waiting for the game to save their progress during play. For mobile game developers, all of this adds up to the need for a persistent data store that can be accessed with consistently low latency.



Game developers with database experience are usually most comfortable with relational databases as their backend game state storage. MySQL, with its ACID-compliant transactions and well-understood semantics, offers a known pattern. However, "game developer" and "database administrator" are different titles for a reason; game developers may not relish standing up and administering a database when they could be building new game content and features. That’s why Google Cloud Platform offers high-performance, fully managed MySQL instances in the form of Google Cloud SQL Second Generation to help handle your mobile game's persistent storage.



Many game developers ask for guidance about how much player load (concurrent users in a game) Cloud SQL can handle. In order to provide a starting point for these discussions, we recently published a new solutions document that details a simple mock game stress-testing framework built on Google Cloud Platform and Cloud SQL Second Generation. For a data model, we looked to the data schema and access patterns of popular massively single-player social games such as Puzzle and Dragons™ or Monster Strike™ for our testing framework. We also made the source code for the framework available so you can have a look at whether the simulated gameplay patterns and the data model are similar to your game’s. The results should provide a starting point for deciding if Cloud SQL Second Generation's performance is the right fit for your next game project's concurrent user estimates.



For more information about Cloud SQL Second Generation, have a look at the documentation. If you'd like to see more solutions, check out the gaming solutions page.












At Google I/O this May, Firebase announced a new suite of products to help developers build mobile apps. Firebase Analytics, a part of the new Firebase platform, is a tool that automatically captures data on how people are using your iOS and Android app, and lets you define your own custom app events. When the data's captured, it’s available through a dashboard in the Firebase console. One of my favorite cloud integrations with the new Firebase platform is the ability to export raw data from Firebase Analytics to Google BigQuery for custom analysis. This custom analysis is particularly useful for aggregating data from the iOS and Android versions of your app, and accessing custom parameters passed in your Firebase Analytics events. Let’s take a look at what you can do with this powerful combination.




How does the BigQuery export work?




After linking your Firebase project to BigQuery, Firebase automatically exports a new table to an associated BigQuery dataset every day. If you have both iOS and Android versions of your app, Firebase exports the data for each platform into a separate dataset. Each table contains the user activity and demographic data automatically captured by Firebase Analytics, along with any custom events you’re capturing in your app. Thus, after exporting one week’s worth of data for a cross-platform app, your BigQuery project would contain two datasets, each with seven tables:








Diving into the data




The schema for every Firebase Analytics export table is the same, and we’ve created two datasets (one for iOS and one for Android) with sample user data for you to run the example queries below. The datasets are for a sample cross-platform iOS and Android gaming app. Each dataset contains seven tables: one week’s worth of analytics data.



The following query will return some basic user demographic and device data for one day of usage on the iOS version of our app:



SELECT
user_dim.app_info.app_instance_id,
user_dim.device_info.device_category,
user_dim.device_info.user_default_language,
user_dim.device_info.platform_version,
user_dim.device_info.device_model,
user_dim.geo_info.country,
user_dim.geo_info.city,
user_dim.app_info.app_version,
user_dim.app_info.app_store,
user_dim.app_info.app_platform
FROM
[firebase-analytics-sample-data:ios_dataset.app_events_20160601]



Since the schema for every BigQuery table exported from Firebase Analytics is the same, you can run any of the queries in this post on your own Firebase Analytics data by replacing the dataset and table names with the ones for your project.



The schema has user data and event data. All user data is automatically captured by Firebase Analytics, and the event data is populated by any custom events you add to your app. Let’s take a look at the specific records for both user and event data.




User data




The user records contain a unique app instance ID for each user (user_dim.app_info.app_instance_id in the schema), along with data on their location, device and app version. In the Firebase console, there are separate dashboards for the app’s Android and iOS analytics. With BigQuery, we can run a query to find out where our users are accessing our app around the world across both platforms. The query below makes use of BigQuery’s union feature, which lets you use a comma as a UNION ALL operator. Since a row is created in our table for each bundle of events a user triggers, we use EXACT_COUNT_DISTINCT to make sure each user is only counted once:

SELECT
user_dim.geo_info.country as country,
EXACT_COUNT_DISTINCT( user_dim.app_info.app_instance_id ) as users
FROM
[firebase-analytics-sample-data:android_dataset.app_events_20160601],
[firebase-analytics-sample-data:ios_dataset.app_events_20160601]
GROUP BY
country
ORDER BY
users DESC



User data also includes a user_properties record, which contains attributes you define to describe different segments of your user base, like language preference or geographic location. Firebase Analytics captures some user properties by default, and you can create up to 25 of your own.



A user’s language preference is one of the default user properties. To see which languages our users speak across platforms, we can run the following query:



SELECT
user_dim.user_properties.value.value.string_value as language_code,
EXACT_COUNT_DISTINCT(user_dim.app_info.app_instance_id) as users,
FROM
[firebase-analytics-sample-data:android_dataset.app_events_20160601],
[firebase-analytics-sample-data:ios_dataset.app_events_20160601]
WHERE
user_dim.user_properties.key = "language"
GROUP BY
language_code
ORDER BY
users DESC




Event data




Firebase Analytics makes it easy to log custom events such as tracking item purchases or button clicks in your app. When you log an event, you pass an event name and up to 25 parameters to Firebase Analytics and it automatically tracks the number of times the event has occurred. The following query shows the number of times each event in our app has occurred on Android for a particular day:



SELECT 
event_dim.name,
COUNT(event_dim.name) as event_count
FROM
[firebase-analytics-sample-data:android_dataset.app_events_20160601]
GROUP BY
event_dim.name
ORDER BY
event_count DESC



If you have another type of value associated with an event (like item prices), you can pass it through as an optional value parameter and filter by this value in BigQuery. In our sample tables, there is a spend_virtual_currency event. We can write the following query to see how much virtual currency players spend at one time:



SELECT 
event_dim.params.value.int_value as virtual_currency_amt,
COUNT(*) as num_times_spent
FROM
[firebase-analytics-sample-data:android_dataset.app_events_20160601]
WHERE
event_dim.name = "spend_virtual_currency"
AND
event_dim.params.key = "value"
GROUP BY
1
ORDER BY
num_times_spent DESC
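For context, here’s roughly what logging such an event looks like on the client. This is an Android sketch with a made-up amount; your event and parameter names may differ:

import android.os.Bundle;
import com.google.firebase.analytics.FirebaseAnalytics;

// Inside an Activity, once Firebase has been initialized:
Bundle params = new Bundle();
params.putLong("value", 250);  // amount of virtual currency spent (hypothetical)
FirebaseAnalytics.getInstance(this).logEvent("spend_virtual_currency", params);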




Building complex queries




What if we want to run a query across both platforms of our app over a specific date range? Since Firebase Analytics data is split into tables for each day, we can do this using BigQuery’s TABLE_DATE_RANGE function. This query returns a count of the cities users are coming from over a one week period:



SELECT
user_dim.geo_info.city,
COUNT(user_dim.geo_info.city) as city_count
FROM
TABLE_DATE_RANGE([firebase-analytics-sample-data:android_dataset.app_events_], DATE_ADD('2016-06-07', -7, 'DAY'), CURRENT_TIMESTAMP()),
TABLE_DATE_RANGE([firebase-analytics-sample-data:ios_dataset.app_events_], DATE_ADD('2016-06-07', -7, 'DAY'), CURRENT_TIMESTAMP())
GROUP BY
user_dim.geo_info.city
ORDER BY
city_count DESC



We can also write a query to compare mobile vs. tablet usage across platforms over a one week period:



SELECT
user_dim.app_info.app_platform as appPlatform,
user_dim.device_info.device_category as deviceType,
COUNT(user_dim.device_info.device_category) AS device_type_count FROM
TABLE_DATE_RANGE([firebase-analytics-sample-data:android_dataset.app_events_], DATE_ADD('2016-06-07', -7, 'DAY'), CURRENT_TIMESTAMP()),
TABLE_DATE_RANGE([firebase-analytics-sample-data:ios_dataset.app_events_], DATE_ADD('2016-06-07', -7, 'DAY'), CURRENT_TIMESTAMP())
GROUP BY
1,2
ORDER BY
device_type_count DESC



Getting a bit more complex, we can write a query to generate a report of unique user events across platforms over the past two weeks. Here we use PARTITION BY and EXACT_COUNT_DISTINCT to de-dupe our event report by users, making use of user properties and the user_dim.user_id field:



SELECT 
STRFTIME_UTC_USEC(eventTime,"%Y%m%d") as date,
appPlatform,
eventName,
COUNT(*) totalEvents,
EXACT_COUNT_DISTINCT(IF(userId IS NOT NULL, userId, fullVisitorid)) as users
FROM (
SELECT
fullVisitorid,
openTimestamp,
FORMAT_UTC_USEC(openTimestamp) firstOpenedTime,
userIdSet,
MAX(userIdSet) OVER(PARTITION BY fullVisitorid) userId,
appPlatform,
eventTimestamp,
FORMAT_UTC_USEC(eventTimestamp) as eventTime,
eventName
FROM FLATTEN(
(
SELECT
user_dim.app_info.app_instance_id as fullVisitorid,
user_dim.first_open_timestamp_micros as openTimestamp,
user_dim.user_properties.value.value.string_value,
IF(user_dim.user_properties.key = 'user_id',user_dim.user_properties.value.value.string_value, null) as userIdSet,
user_dim.app_info.app_platform as appPlatform,
event_dim.timestamp_micros as eventTimestamp,
event_dim.name AS eventName,
event_dim.params.key,
event_dim.params.value.string_value
FROM
TABLE_DATE_RANGE([firebase-analytics-sample-data:android_dataset.app_events_], DATE_ADD('2016-06-07', -7, 'DAY'), CURRENT_TIMESTAMP()),
TABLE_DATE_RANGE([firebase-analytics-sample-data:ios_dataset.app_events_], DATE_ADD('2016-06-07', -7, 'DAY'), CURRENT_TIMESTAMP())
), user_dim.user_properties)
)
GROUP BY
date, appPlatform, eventName



If you have data in Google Analytics for the same app, it’s also possible to export your Google Analytics data to BigQuery and do a JOIN with your Firebase Analytics BigQuery tables.




Visualizing analytics data




Now that we’ve gathered new insights from our mobile app data using the raw BigQuery export, let’s visualize it using Google Data Studio. Data Studio can read directly from BigQuery tables, and we can even pass it a custom query like the ones above. Data Studio can generate many different types of charts depending on the structure of your data, including time series, bar charts, pie charts and geo maps.



For our first visualization, let’s create a bar chart to compare the device types from which users are accessing our app on each platform. We can paste the mobile vs. tablet query above directly into Data Studio to generate the following chart:



From this chart, it’s easy to see that iOS users are much more likely to access our game from a tablet. Getting a bit more complex, we can use the above event report query to create a bar chart comparing the number of events across platforms:



Check out this post for detailed instructions on connecting your BigQuery project to Data Studio.




What’s next?


If you’re new to Firebase, get started here. If you’re already building a mobile app on Firebase, check out this detailed guide on linking your Firebase project to BigQuery. For questions, take a look at the BigQuery reference docs and use the firebase-analytics and google-bigquery tags on Stack Overflow. And let me know if there are any particular topics you’d like me to cover in an upcoming post.










Historical daily weather data from the Global Historical Climate Network (GHCN) is now available in Google BigQuery, our managed analytics data warehouse. The data comes from over 80,000 stations in 180 countries, spans several decades and has been quality-checked to ensure that it's temporally and spatially consistent. The GHCN daily data is the official weather record in the United States.



According to the National Center for Atmospheric Research (NCAR), routine weather events such as rain and unusually warm or cool days directly affect 3.4% of the US Gross Domestic Product, impacting everyone from ice-cream stores, clothing retailers and delivery services to farmers, resorts and business travelers. The NCAR estimate considers routine weather only; it doesn’t take into account, for example, how weather impacts people’s moods, nor the impact of destructive weather such as tornadoes and hurricanes. If you analyze data to make better business decisions (or if you build machine learning models to provide such guidance automatically), weather should be one of your inputs.



The GHCN data has long been freely available to download and analyze from the National Oceanic and Atmospheric Administration (NOAA) website. However, because the dataset changes daily, anyone wishing to analyze that data over time would need to repeat the download the following day. Having the data already loaded and continually refreshed in BigQuery makes it easier for researchers and data scientists to incorporate weather information in analytics and machine learning projects. The fact that BigQuery analysis can be done using standard SQL makes it very convenient to start analyzing the data.



Let’s explore the GHCN dataset and how to interact with it using BigQuery.




Where are the GHCN weather stations?




The GHCN data is global. For example, let’s look at all the stations from which we have good minimum-temperature data on August 15, 2016:



SELECT
name,
value/10 AS min_temperature,
latitude,
longitude
FROM
[bigquery-public-data:ghcn_d.ghcnd_stations] AS stn
JOIN
[bigquery-public-data:ghcn_d.ghcnd_2016] AS wx
ON
wx.id = stn.id
WHERE
wx.element = 'TMIN'
AND wx.qflag IS NULL
AND STRING(wx.date) = '2016-08-15'




This returns:



By plotting the station locations in Google Cloud Datalab, we notice that the density of stations is very good in North America, Europe and Japan, and quite reasonable in most of Asia. Most of the gaps correspond to sparsely populated areas such as the Australian outback, Siberia and North Africa. Brazil is the only gaping hole. (For the rest of this post, I’ll show only code snippets; for complete BigQuery queries and Python plotting commands, please see the full Datalab notebook on GitHub.)




Blue dots represent GHCN weather stations around the world.


Using GHCN weather data in your applications


Here’s a simple example of how to incorporate GHCN data into an application. Let’s say you're a pizza chain based in Chicago and want to explore some weather variables that might affect demand for pizza and pizza delivery times. The first thing to do is to find the GHCN station closest to you. You go to Google Maps and find that your location is at 42 degrees latitude and -87.9 degrees longitude, then run a BigQuery query that computes the great-circle distance between each station and (42, -87.9) to get its distance from your pizza shop in kilometers (see the Datalab notebook for what this query looks like). The result looks like this:



Plotting these on a map, you can see that there are a lot of GHCN stations near Chicago, but our pizza shop needs data from station USW00094846 (shown in red) located at O’Hare airport, 3.7 km away from our shop.
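For reference, here’s a sketch of what such a nearest-station query can look like in legacy SQL (the Datalab notebook has the authoritative version; this one uses the spherical law of cosines with a 6371 km Earth radius and converts degrees to radians inline):

SELECT
id,
name,
ACOS(SIN(latitude*PI()/180) * SIN(42*PI()/180)
+ COS(latitude*PI()/180) * COS(42*PI()/180)
* COS((longitude + 87.9)*PI()/180)) * 6371 AS dist_km
FROM
[bigquery-public-data:ghcn_d.ghcnd_stations]
ORDER BY
dist_km ASC
LIMIT 10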



Next, we need to pull the data from this station on the dates of interest. Here, I'll query the table of 2015 data and pull all the days from that table. To get the rainfall amount (“precipitation” or PRCP) in millimeters, you’d write:


SELECT
wx.date,
wx.value/10.0 AS prcp
FROM
[bigquery-public-data:ghcn_d.ghcnd_2015] AS wx
WHERE
id = 'USW00094846'
AND qflag IS NULL
AND element = 'PRCP'
ORDER BY wx.date



Note that we divide wx.value by 10 because the GHCN reports rainfall in tenths of millimeters. We ensure that the quality-control flag (qflag) associated with the data is null, indicating that the observation passed spatio-temporal quality-control checks.



Typically, though, you’d want a few more weather variables. Here’s a more complete query that pulls rainfall amount, minimum temperature, maximum temperature and the presence of some weather phenomenon (fog, hail, rain, etc.) on each day:


SELECT
wx.date,
MAX(prcp) AS prcp,
MAX(tmin) AS tmin,
MAX(tmax) AS tmax,
IF(MAX(haswx) = 'True', 'True', 'False') AS haswx
FROM (
SELECT
wx.date,
IF (wx.element = 'PRCP', wx.value/10, NULL) AS prcp,
IF (wx.element = 'TMIN', wx.value/10, NULL) AS tmin,
IF (wx.element = 'TMAX', wx.value/10, NULL) AS tmax,
IF (SUBSTR(wx.element, 0, 2) = 'WT', 'True', NULL) AS haswx
FROM
[bigquery-public-data:ghcn_d.ghcnd_2015] AS wx
WHERE
id = 'USW00094846'
AND qflag IS NULL )
GROUP BY
wx.date
ORDER BY
wx.date



The query returns rainfall amounts in millimeters, maximum and minimum temperatures in degrees Celsius and a column that indicates whether there was impactful weather on that day:



You can cast the results into a Pandas DataFrame and easily graph them in Datalab (see the notebook on GitHub for queries and plotting code).
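If you’re working outside of Datalab, a quick alternative is pandas (a sketch that assumes the pandas-gbq dependency and your own project ID; the rainfall query is the one shown earlier):

import pandas as pd

query = """
SELECT wx.date, wx.value/10.0 AS prcp
FROM [bigquery-public-data:ghcn_d.ghcnd_2015] AS wx
WHERE id = 'USW00094846' AND qflag IS NULL AND element = 'PRCP'
ORDER BY wx.date
"""
# Run the legacy-SQL query and pull the results into a DataFrame.
df = pd.read_gbq(query, project_id='your-project-id', dialect='legacy')
df.plot(x='wx_date', y='prcp')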




BigQuery Views and Data Studio 360 dashboards


Since the previous query pivoted and transformed some fields, you can save the query as a View. Simply copy-paste this query into the BigQuery console and select “Save View”:



SELECT
REPLACE(date,"-","") AS date,
MAX(prcp) AS prcp,
MAX(tmin) AS tmin,
MAX(tmax) AS tmax
FROM (
SELECT
STRING(wx.date) AS date,
IF (wx.element = 'PRCP', wx.value/10, NULL) AS prcp,
IF (wx.element = 'TMIN', wx.value/10, NULL) AS tmin,
IF (wx.element = 'TMAX', wx.value/10, NULL) AS tmax
FROM
[bigquery-public-data:ghcn_d.ghcnd_2016] AS wx
WHERE
id = 'USW00094846'
AND qflag IS NULL
AND value IS NOT NULL
AND DATEDIFF(CURRENT_DATE(), date) < 15 )
GROUP BY
date
ORDER BY
date ASC



Notice my use of DATEDIFF and CURRENT_DATE functions to get weather data from the past two weeks. Saving this query as a View allows me to query and visualize this View as if it were a BigQuery table.



Since visualization is on my mind, I can go over to Data Studio and easily create a dashboard from this View, for example:



One thing to keep in mind is that the "H" in GHCN stands for historical. This data is not real-time, and there's a time lag. For example, although I did this query on August 25, the latest data shown is from August 22.




Mashing datasets in BigQuery


It’s quite easy to execute a weather query from your analytics program and merge the result with other corporate data.



If that other data is on BigQuery, you can combine it all in a single query! For example, another BigQuery dataset that’s publicly available is airline on-time arrival data. Let’s mash the GHCN and on-time arrivals datasets together:


SELECT
wx.date,
wx.prcp,
f.departure_delay,
f.arrival_airport
FROM (
SELECT
STRING(date) AS date,
value/10 AS prcp
FROM
[bigquery-public-data:ghcn_d.ghcnd_2005]
WHERE
id = 'USW00094846'
AND qflag IS NULL
AND element = 'PRCP') AS wx
JOIN
[bigquery-samples:airline_ontime_data.flights] AS f
ON
f.date = wx.date
WHERE
f.departure_airport = 'ORD'
LIMIT 100



This yields a table with both flight delay and weather information:



We can look at the distributions in Datalab using the Python package Seaborn:



As expected, the heavier the rain, the more the distribution curves shift to the right, indicating that flight delays increase.



GHCN data in BigQuery democratizes weather data and opens it up to all sorts of data analytics and machine learning applications. We can’t wait to see how you use this data to build what’s next.












There’s a cool new setting in the storage dialog of Cloud SQL Second Generation: “Enable automatic storage increase.” When selected, it checks the available database storage every 30 seconds and adds more capacity as needed in 5GB to 25GB increments, depending on the size of the database. This means that instead of having to provision storage to accommodate future database growth, storage capacity grows as the database grows.



There are two key benefits to Cloud SQL automatic storage increases:




  1. Having a database that grows as needed can reduce application downtime by reducing the risk of running out of database space. You can take the guesswork out of capacity sizing without incurring any downtime or performing database maintenance.

  2. If you're managing a growing database, automatic storage increases can save a considerable amount of money. That’s because allocated database storage grows as needed rather than you having to provision a lot of space upfront. In other words, you pay for only what you use plus a small margin.




According to the documentation, Cloud SQL determines how much capacity to add in the following way: “The size of the threshold and the amount of storage that is added to your instance depends on the amount of storage currently provisioned for your instance, up to a maximum size of 25 GB. The current storage capacity is divided by 25, and the result rounded down to the nearest integer. This result is added to 5 GB to produce both the threshold size and the amount of storage that is added in the event that the available storage falls below the threshold.”



Expressed as a JavaScript formula, that translates to the following (units=GB):



Math.min((Math.floor(currentCapacity/25) + 5),25)
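Wrapped in a small function, you can reproduce the table below for yourself:

function autoIncreaseGB(currentCapacityGB) {
  // Threshold and increment are the same value, capped at 25GB.
  return Math.min(Math.floor(currentCapacityGB / 25) + 5, 25);
}

[50, 100, 250, 500, 1000, 5000].forEach(function(gb) {
  console.log(gb + 'GB -> ' + autoIncreaseGB(gb) + 'GB');
});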



Here’s what that looks like for a few database sizes:













Current capacity | Threshold | Amount auto-added
-----------------|-----------|------------------
50GB             | 7GB       | 7GB
100GB            | 9GB       | 9GB
250GB            | 15GB      | 15GB
500GB            | 25GB      | 25GB
1000GB           | 25GB      | 25GB
5000GB           | 25GB      | 25GB







If you already have a database instance running on Cloud SQL Second Generation, you can go ahead and turn this feature on now.