May was a busy month for Google Cloud Platform. In case you missed it, here’s the roundup:



BigQuery just got bigger. We made the world’s largest event dataset publicly available. Now, anyone can use BigQuery to run queries over more than 250 million global events in the GDELT Event Database. The database is updated every day, so you will always get up-to-date results.



We had a number of substantial product releases. On May 6, we launched the Google Cloud Storage JSON API, and the next week we announced support for MySQL 5.6 on Google Cloud SQL. MySQL 5.6 gives you access to geospatial distance queries, full text indexing in InnoDB tables, online schema changes, and greater performance. On May 23, we made CoreOS images available on Google Compute Engine, further broadening our portfolio of supported operating systems. And this past week, we released a new and improved gsutil.



Also in May, App Engine 1.9.5 made its debut, with performance improvements and STARTTLS support for the dev_appserver. Compute Engine got a bit more muscle with new 16-core instances (and they’re cheaper than you think, at about $1.30 per hour). Furthermore, Docker fans will be pleased to know that container-optimized images are now available in preview on Google Compute Engine.



If you looked closely at the console, you probably noticed some changes. We launched a new source-editing feature that lets you edit your code and commit it to the free private git repo associated with each project, and you can now receive an email notification when your build and tests finish running. Once your app is deployed, you can navigate straight to the relevant log entry and collaborate with a shareable URL to that entry. You can also enable HTTP and HTTPS firewall rules on Compute Engine instances right from the console with one click. Finally, the new native Windows installer for the Cloud SDK continues our stream of releases aimed at helping Windows users more easily use Cloud Platform.



This month, we also announced that Stackdriver was joining the Cloud Platform team. This Boston-based startup has pushed the envelope when it comes to monitoring and logging for public clouds. Stackdriver’s team is already working side-by-side with Cloud Platform, and we will give you a preview of what we’ve been doing together at Google I/O.



On the blog, we heard from three great customers about how they are using Cloud Platform. Webfilings CTO Jeff Trom wrote about how their small engineering team in Ames, Iowa is able to serve 60% of the Fortune 500. We put together a video about the company when we visited them earlier this year. Wix wrote about how it was able to build a functional disaster recovery cluster in under two weeks on Cloud Platform. Message Bus shared how Google BigQuery helps the company manage the more than 1 terabyte of data it produces each day and make that data available to its users in a useful form. In addition, we published 22 new customer case studies on our website, including Evite, feedly, Leanplum, and many more.



On top of all of that, we’re continuing to make big strides to ensure that Cloud Platform is environmentally friendly. The Official Google Blog ran a story about how we are using machine learning to drive energy efficiencies in our data centers, and on our blog we wrote about what this means for users of Cloud Platform.



June promises to be a busy month ahead. You’ll hear more from us at DockerCon, GigaOM Structure, and, of course, Google I/O.



-Posted by Benjamin Bechtolsheim, Product Marketing Manager

Today’s guest blog comes from Kalev H. Leetaru, a fellow and adjunct faculty in the Edmund A. Walsh School of Foreign Service at Georgetown University in Washington DC. His award-winning work centers on the application of high performance computing and "big data" to grand challenge problems.



The entire quarter-billion-record GDELT Event Database is now available as a public dataset in Google BigQuery.



BigQuery is Google’s powerful cloud-based analytical database service, designed for the largest datasets on the planet. It allows users to run fast, SQL-like queries against multi-terabyte datasets in seconds. Scalable and easy to use, BigQuery gives you real-time insights about your data. With GDELT now in BigQuery, you can access those same real-time insights about global human society and the planet itself!



You can take it for a spin here. (If it's your first time, you'll have to sign up to create a Google project, but no credit card or commitment is needed.)



The GDELT Project pushes the boundaries of “big data,” weighing in at over a quarter-billion rows with 59 fields for each record, spanning the geography of the entire planet, and covering a time horizon of more than 35 years. The GDELT Project is the largest open-access database on human society in existence. Its archives contain nearly 400M latitude/longitude geographic coordinates spanning over 12,900 days, making it one of the largest open-access spatio-temporal datasets as well.



From the very beginning, one of the greatest challenges in working with GDELT has been how to interact with a dataset of this magnitude. Few traditional relational database servers offer realtime querying or analytics on data of this complexity, and even simple queries would normally require enormous attention to data access patterns and intricate multi-column indexing to make them possible. Traditional database servers require the creation of indexes over the most-accessed columns to speed queries, meaning one has to anticipate a priori how users are going to interact with a dataset.



One of the things we’ve learned from working with GDELT users is just how differently each of you needs to query and analyze GDELT. The sheer variety of access patterns and the number of permutations of fields that are brought together into queries make the traditional model of creating a small set of indexes impossible. One of the most exciting aspects of having GDELT available in BigQuery is that BigQuery has no concept of explicit indexes over specific columns – you can bring together any ad-hoc combination of columns and query complexity and it still returns in just a few seconds. This means that no matter how you access GDELT, what columns you look across, what kinds of operators you use, or how complex your query is, you will still see results in near-realtime.



For us, the most groundbreaking part of having GDELT in BigQuery is that it opens the door not only to fast complex querying and extracting of data, but also allows for the first time real-world analyses to be run entirely in the database. Imagine computing the most significant conflict interaction in the world by month over the past 35 years, or performing cross-tabbed correlation over different classes of relationships between a set of countries. Such queries can be run entirely inside of BigQuery and return in just a handful of seconds. This enables you to try out “what if” hypotheses on global-scale trends in near-real time.



On the technical side, BigQuery is completely turnkey: you just hand it your data and start querying that data – that’s all there is to it. While you could spin up a whole cluster of virtual machines somewhere in the cloud to run your own distributed clustered database service, you would end up spending a good deal of your time being a systems administrator to keep the cluster working and it wouldn’t support BigQuery’s unique capabilities. BigQuery eliminates all of this so all you have to do is focus on using your data, not spending your days running computer servers.



We automatically update the public dataset copy of GDELT in BigQuery every morning by 5AM ET, so you don’t even have to worry about updates – the BigQuery copy always has the latest global events. In a few weeks when GDELT unveils its move from daily updates to updating every 15 minutes, we’ll be taking advantage of BigQuery’s new stream updating capability to ensure the data reflects the state of the world moment-by-moment.



Check out the GDELT blog for future posts where we will showcase how to harness some of BigQuery’s power to perform some pretty incredible analyses, all of them running entirely in the database system itself. For example, we’re particularly excited about the ability to use features like BigQuery’s new Pearson correlation support to be able to search for patterns across the entire quarter-billion-record dataset in just seconds. And we can’t wait to see what you do with it.



Give it a try here. (If it's your first time, you'll have to sign up to create a Google project, but no credit card or commitment is needed.)



Then copy and paste the query below into the query box and click the bright red “Run Query” button:



SELECT Year, Actor1Name, Actor2Name, Count FROM (
  SELECT Actor1Name, Actor2Name, Year, COUNT(*) Count,
         RANK() OVER(PARTITION BY Year ORDER BY Count DESC) rank
  FROM
    (SELECT Actor1Name, Actor2Name, Year FROM [gdelt-bq:full.events]
     WHERE Actor1Name < Actor2Name
       AND Actor1CountryCode != '' AND Actor2CountryCode != ''
       AND Actor1CountryCode != Actor2CountryCode),
    (SELECT Actor2Name Actor1Name, Actor1Name Actor2Name, Year FROM [gdelt-bq:full.events]
     WHERE Actor1Name > Actor2Name
       AND Actor1CountryCode != '' AND Actor2CountryCode != ''
       AND Actor1CountryCode != Actor2CountryCode)
  WHERE Actor1Name IS NOT NULL
    AND Actor2Name IS NOT NULL
  GROUP EACH BY 1, 2, 3
  HAVING Count > 100
)
WHERE rank=1
ORDER BY Year



You just processed more than 250 million records detailing worldwide events from the last 35 years and discovered the top defining relationship for each year – all of this in around 6 seconds, and for free (BigQuery gives its users a free monthly querying quota). Even more excitingly, you did all of this entirely in BigQuery, showcasing how you can run real analyses entirely in the database itself!



Now try copying and pasting this query and click the “Run Query” button:



SELECT MonthYear MonthYear, INTEGER(norm*100000)/1000 Percent
FROM (
SELECT ActionGeo_CountryCode, EventRootCode, MonthYear, COUNT(1) AS c, RATIO_TO_REPORT(c) OVER(PARTITION BY MonthYear ORDER BY c DESC) norm FROM [gdelt-bq:full.events]
GROUP BY ActionGeo_CountryCode, EventRootCode, MonthYear
)
WHERE ActionGeo_CountryCode='UP' and EventRootCode='14'
ORDER BY ActionGeo_CountryCode, EventRootCode, MonthYear;



Congratulations, you just scanned more than a quarter-billion records to compile every protest in Ukraine that GDELT found in the world’s news media, by month, from 1979 to present. Even more powerfully, BigQuery has converted the raw count of protests per month into a normalized “intensity” measure that accounts for the fact that there is a lot more news media today in 2014 than there was in 1979 and that machine processing of news isn’t perfect.
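The normalization the query performs can be mirrored outside BigQuery. Here is an illustrative Python sketch of the same ratio-to-report idea; the counts and keys below are made up for the example, not real GDELT figures:

```python
# Illustrative sketch of the RATIO_TO_REPORT normalization: each
# (country, event-type) count for a month is divided by the total number
# of events recorded that month, so a month with more overall news
# coverage doesn't artificially inflate the protest signal.

def ratio_to_report(counts_by_key, totals_by_month):
    """counts_by_key: {(month, key): count}; totals_by_month: {month: total}."""
    return {
        (month, key): count / totals_by_month[month]
        for (month, key), count in counts_by_key.items()
    }

# Hypothetical monthly counts for one country/event-type pair:
counts = {("201311", "protest"): 420, ("201312", "protest"): 510}
totals = {"201311": 42000, "201312": 40800}

norm = ratio_to_report(counts, totals)
# Scale to a percentage with three decimal places, mirroring the query's
# INTEGER(norm*100000)/1000 expression:
percents = {k: int(v * 100000) / 1000 for k, v in norm.items()}
```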



You can click “Download as CSV” at the top right of the query results table and open the spreadsheet in your favorite spreadsheet program and graph it as a timeline. You can instantly see the “Revolutions of 1989,” the violent “Ukraine without Kuchma” protests of March 2001, the “Orange Revolution” of November 2004, and the Euromaidan protests of November 2013 to present.
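If you prefer to post-process the exported CSV programmatically rather than in a spreadsheet, a few lines of Python are enough to rank the peak months. The rows below are hypothetical stand-ins for the real export:

```python
import csv
import io

# Hypothetical excerpt of the exported query results (MonthYear, Percent):
csv_text = """MonthYear,Percent
200103,2.145
200411,3.872
201311,4.510
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
# Sort months by protest intensity, highest first:
peaks = sorted(rows, key=lambda r: float(r["Percent"]), reverse=True)
top_month = peaks[0]["MonthYear"]
```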



Let me know if you come up with a particularly cool query you want to share with the world and we might just add it to the GDELT Blog. We can’t wait to see what you do with GDELT and BigQuery.

Yesterday, Cloud Foundry demonstrated how you can use Cloud Foundry and the BOSH CPI with Google Compute Engine. Since Compute Engine became generally available in December 2013, we've seen an ever-increasing number of open source projects, partners, and other software vendors build in support for our platform.



Cloud Foundry's post covers using BOSH to deploy a Hadoop cluster on Compute Engine and manage it with Cloud Foundry. With Compute Engine's fast and consistent provisioning, Cloud Foundry was able to deploy a working Hadoop cluster in less than 3 minutes – so within a few short minutes, you can start your Hadoop processing. When combined with Compute Engine's sub-hour billing and sustained-use discounts, you have multiple options for keeping costs low.



- Posted by Eric Johnson, Program Manager




What do you get if you cross 60 trillion unique URLs, 100 billion monthly search queries, 4.8 million active applications on Google Cloud Platform, and 6.3 trillion monthly operations in Cloud Datastore? One massively scalable global infrastructure, and yet a zero carbon footprint. That is, if you do it the Google way.



This year’s report from Greenpeace, Clicking Clean: How Companies are Creating the Green Internet, takes a look at how the world’s largest data center operators are contributing to building a green internet. In this report, Google is recognized for innovating new ways to power our data centers. Greenpeace commends Google for investing over $1 billion in 16 renewable energy projects, resulting in 2 GW of clean power. They also recognize us for innovation in energy efficiency in our data centers, as well as the work we are doing to change US energy policy to make renewable energy more accessible.


“Google maintains its leadership in building a renewably powered internet, as it significantly expands its renewable energy purchasing and investment both independently and through collaboration with its utility vendors.”

-- Greenpeace Clicking Clean Report, page 6

Across our data center infrastructure, we work hard to minimize the environmental impact of our services. Through intense energy efficiency efforts, renewable power purchasing, and carbon offset procurement, the net carbon emissions of our global operations are zero.



So how did we get here? From day 1, the Google infrastructure was designed for scalability, security, and efficiency to deliver global search. Today, Google delivers hundreds of global services on our infrastructure, including YouTube, Gmail, and Google Cloud Platform. By building a single infrastructure on which we run all our services, we increase the speed at which we can drive energy savings across our fleet of data centers. The result is a plunging PUE -- a measure of how efficiently data centers operate -- and with PUE, the lower the better.



With a fleet PUE of 1.12, and real-time PUEs of less than 1.09, we use far less energy to host services in our data centers than other data centers would require. To put this into context, the Uptime Institute estimates that in 2013, PUE averaged 1.65 across the 1,000 data center users it surveyed.
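PUE is simply total facility energy divided by the energy delivered to the IT equipment, so the gap between 1.12 and 1.65 is easy to quantify. A quick sketch with illustrative numbers:

```python
# PUE = total facility energy / IT equipment energy; 1.0 is the
# theoretical ideal, where every watt goes to computing.

def pue(total_facility_kwh, it_equipment_kwh):
    return total_facility_kwh / it_equipment_kwh

# For 1000 kWh of IT load, a PUE-1.12 facility draws 1120 kWh in total,
# while a PUE-1.65 facility draws 1650 kWh:
fleet = pue(1120, 1000)
survey_avg = pue(1650, 1000)

# Fraction of total energy the more efficient facility avoids drawing:
overhead_saved = (survey_avg - fleet) / survey_avg
```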



We never stop searching for new, cutting-edge ways to reduce PUE. Today, we announced that a small group within the data center operations team has come up with a way to use machine learning -- an approach similar to those used in speech recognition, self-driving cars, and other innovative technology -- to improve performance, opening up many new opportunities to save energy.



Another benefit of designing, building, and running our own facilities is that we have control over the whole system, so we can optimize cooling choices based on site location. We always choose ‘free cooling’ when we can, whether it’s by using outside air or alternative water sources such as seawater, industrial canal water, or municipal waste water. When we use water, we ‘recycle’ it by reusing it multiple times, then clean it and return it to the environment in better shape than we got it. In either case, we use natural processes, either air flow or evaporation, to provide cooling to our data centers, rather than mechanical chillers.



So whether you measure ‘greenness’ by how little energy it takes to run our infrastructure, how much renewable energy we’ve enabled, our net carbon emissions, or our rating in the Greenpeace report, you can rest assured that you minimize your environmental impact by choosing Google.



(More information on Google data centers, the engine that hosts Search, Drive, Cloud Platform, YouTube and more can be found on the Google Data Centers and The Big Picture web pages.)



-Posted by Joyce Dickerson, Data Center Sustainability

Today we released gsutil version 4. This release has two new commands that our customers have been asking for:


  • gsutil rsync : The rsync command automates the synchronization of a local file system directory with the contents of a Google Cloud Storage bucket, or across cloud storage buckets or providers.

  • gsutil signurl : The signurl command makes it easy to generate a signed url that can be used to provide secure access to private data for users not signed in with a Google account.
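For example, assuming a bucket named gs://my-example-bucket and a service-account key file (both names are placeholders), typical invocations look like this:

```shell
# Mirror a local directory into a bucket, recursing into subdirectories (-r)
# and deleting remote objects that no longer exist locally (-d):
gsutil rsync -r -d ./website-static gs://my-example-bucket/static

# Create a URL granting read access to a private object for 10 minutes,
# signed with a service-account private key:
gsutil signurl -d 10m my-service-account-key.p12 gs://my-example-bucket/private/report.pdf
```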





Also, if you’ve been thinking about taking the newly released Cloud Storage JSON API for a spin, you’ll be happy to hear that gsutil version 4 now uses the JSON API by default, so trying it out is as simple as upgrading to the latest version of gsutil.



Note: With the move to the Cloud Storage JSON API, some input/output formats have changed from XML to JSON, making gsutil more consistent with other Google Cloud SDK tools. If you use gsutil in scripts, please see the detailed release notes.



You can find instructions for installing gsutil here.



-Posted by Travis Hobrla, Software Engineer

Our guest blog post today comes from Brandon Philips, CTO at CoreOS, a new Linux distribution that has been rearchitected to provide features needed to run massive server deployments.



Google is an organization that fundamentally understands distributed systems, and it's no surprise that Compute Engine is a perfect base for your distributed applications running on CoreOS. The clustering features in CoreOS pair perfectly with VMs that boot quickly and have a super-fast network connecting them.



Google's wide variety of machine types allows you to create the most efficient cluster for your workloads. By setting machine metadata, CPU-intensive or RAM-hungry fleet units can easily be scheduled onto a subset of the cluster optimized for that workload.



CoreOS integrates with Google load balancers and replica pools to easily scale your applications across regions and zones. Using replica groups with CoreOS is easy: configure the project-level metadata to include a discovery URL and add as many machines as you need. CoreOS will automatically cluster new machines, and fleet will begin utilizing them. If a single machine requires more specific configuration, additional cloud-config parameters can be specified during boot.



The largest advantage to running on a cloud platform is access to platform services that can be used in conjunction with your cloud instances. Running on Compute Engine allows you to connect your front-end and back-end services running on CoreOS to a fully managed Cloud Datastore or Cloud SQL database. Applications that store user-generated content on Google Cloud Storage can easily start worker instances on the CoreOS cluster to process items as they are uploaded.



CoreOS uses cloud-config to configure machines after boot and automatically cluster them. Automatic clustering is achieved with a unique discovery token obtained from discovery.etcd.io.

$ curl https://discovery.etcd.io/new

https://discovery.etcd.io/b97f446100a293c8107500e11c34864b


Place this new discovery token into your cloud-config document:

$ cat cloud-config.yaml
#cloud-config

coreos:
  etcd:
    # generate a new token for each unique cluster from
    # https://discovery.etcd.io/new
    discovery: https://discovery.etcd.io/b97f446100a293c8107500e11c34864b
    # multi-region and multi-cloud deployments need to use $public_ipv4
    addr: $private_ipv4:4001
    peer-addr: $private_ipv4:7001
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start


After generating your cloud-config, booting a 3-machine cluster can be done in a single command. Remember to substitute your unique project ID:

gcutil --project=<project-id> addinstance \
  --image=projects/coreos-cloud/global/images/coreos-beta-310-1-0-v20140508 \
  --persistent_boot_disk --zone=us-central1-a --machine_type=n1-standard-1 \
  --metadata_from_file=user-data:cloud-config.yaml core1 core2 core3


To show off fleet’s scheduling abilities, let’s submit and start a very simple Docker container that echoes a message. First, SSH onto one of the machines in the cluster. Remember to replace the project ID with your own:

$ gcutil --project=coreos ssh --ssh_user=core core1


Create a new unit file on disk that runs our container:

$ cat example.service

[Unit]
Description=MyApp
After=docker.service
Requires=docker.service

[Service]
RemainAfterExit=yes
ExecStart=/usr/bin/docker run busybox /bin/echo 'I was scheduled with fleet!'


To run this unit on your new cluster, start it via fleetctl:

$ fleetctl start example.service
$ fleetctl list-units

UNIT STATE LOAD ACTIVE SUB DESC MACHINE

example.service launched loaded active exited MyApp b603fc4d.../10.240.246.57


The status of the example container can easily be fetched via fleetctl:

$ fleetctl status example.service

● example.service - MyApp
Loaded: loaded (/run/fleet/units/example.service; linked-runtime)
Active: active (exited) since Thu 2014-05-22 20:27:54 UTC; 4s ago
Process: 15789 ExecStart=/usr/bin/docker run busybox /bin/echo
I was scheduled with fleet! (code=exited, status=0/SUCCESS)
Main PID: 15789 (code=exited, status=0/SUCCESS)


May 22 20:27:54 core-01 systemd[1]: Started MyApp.
May 22 20:27:57 core-01 docker[15789]: I was scheduled with fleet!


Using this fundamental tooling, you can start building full distributed applications on top of CoreOS and Google Compute Engine. Check out the CoreOS blog for more examples of using fleet, load balancers, and more.

For a complete guide on running CoreOS on Google Compute Engine, head over to the docs. To get help or brag about your awesome CoreOS setup, join us on the mailing list or in IRC.

We all have something very valuable that we are afraid to lose. As a cloud-based web development platform, Wix is entrusted with its users' valuable data and creative work. Wix absolutely cannot afford an ‘Oh no! It’s gone!’ moment. To meet this challenge, Wix replicated its services to Google Cloud Platform. By weaving together its own managed hosting cluster and Google Cloud Platform, Wix replicates data across two different systems, making it robust against both data loss and data poisoning.



Eugene Olshenbaum, the Head of Media Services group at Wix, wrote a technical case study, available here, describing how Wix replicated their services to Google Cloud Platform. Here are some of the highlights.



Wix chose Google Cloud Platform for replicating their services for the following reasons:


  • Ease of management - App Engine eliminates the need for system management.

  • Scalability - App Engine automatically scales with the volume of the requests.

  • Speed of development - App Engine provides all the technology building blocks for application development.


In the case study, Eugene described the high level architecture of the Wix media serving system and how each component took advantage of the features provided by Google Cloud Platform. He covered the replication between the two systems in detail, as depicted in the following diagram.





In the case study, Eugene also shared some of the lessons learned from using Google Cloud Platform, including:


  • Using task queue to handle requests that may take longer than 60 seconds to process

  • Balancing between strong and eventual consistency of Datastore

  • Using memcache to reduce write contention
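The write-contention point deserves a quick sketch: rather than writing to a Datastore counter on every request, increments can be buffered in memcache and flushed periodically. The snippet below is an illustrative stand-in (a plain dict plays the role of memcache and the Datastore), not Wix's actual code:

```python
# Illustrative sketch of reducing write contention by buffering counter
# increments in a cache and flushing them to durable storage in batches.

cache = {}      # stand-in for memcache
datastore = {}  # stand-in for the Datastore entity

def increment(counter, n=1):
    # Cheap, contention-free cache update on the hot path:
    cache[counter] = cache.get(counter, 0) + n

def flush():
    # Periodic job: one datastore write per counter instead of one per hit.
    for counter, pending in cache.items():
        datastore[counter] = datastore.get(counter, 0) + pending
    cache.clear()

for _ in range(1000):
    increment("page_views")
flush()
```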




When it comes to building a robust and fail-safe system, careful planning and design are required. By using Google Cloud Platform, Wix was able to develop a functional system ready for integration testing -- all in two weeks. Read more about their solution here.



-Posted by Wally Yau, Cloud Solutions Architect

Google Cloud SQL is a fully managed MySQL service hosted on Google Cloud Platform, providing a database backbone for applications running on Google App Engine or Google Compute Engine. Over the last few months we’ve been very busy adding an SLA, encrypting all your data, enabling point-in-time recovery and custom flags, and launching instances in Asia (alongside the US and EU).



Today, we’re adding instances running MySQL 5.6, which includes great new features such as geospatial distance queries, full text indexing in InnoDB tables, online schema changes, and a bunch of performance improvements. You can try it now, and there are instructions on migrating existing instances here.



-Posted by Joe Faith, Product Manager

Today’s guest blog comes from Ujjwal Sarin, software engineer at Message Bus, which provides cloud-based email delivery and message infrastructure to boost email open rates and lower the costs for users. Message Bus helps its customers use messaging to build strong relationships with their own customers.



Message Bus customers thrive on data – real-time information about who’s opening their emails and who isn’t, and how campaigns are performing over time. We provide highly reliable email delivery that helps improve our customers’ message open rates by as much as 20%, meaning customers can do a better job of promoting their products and services. Google BigQuery frees us from the time-consuming work of managing data, which allows us to put brainpower and more money back into our business.



Feeding data to businesses when and how they want it helps us win business, but it also adds to the workload of storing and managing information. On average, we were accumulating about a terabyte of data a day, which took a good deal of time and money to keep track of. To manage the data associated with email delivery, we used a homegrown PostgreSQL-based data warehouse. Before our migration to BigQuery, the majority of our budget was spent on hardware, infrastructure, and personnel.



Building messaging systems is what we’re good at, and we wanted our engineers to spend more time coming up with new ways to tell customers how their campaigns were doing. That meant replacing our homegrown data warehouse with a hosted data backend. We looked at Amazon Redshift and Joyent Manta, but BigQuery won us over with its rich querying capabilities, practically zero administration, and affordable data storage. BigQuery is built on top of the battle-tested Dremel and Google File System, has a stable API, and is inexpensive – and it scales whenever we need it to, without our having to provision additional capacity.



BigQuery has helped us ramp up the amount and type of data we can supply to our customers. It’s now faster and easier for our customers to access this information, since we built an easy-to-understand user-facing API, with BigQuery serving as the data warehouse.



Prior to BigQuery, we had implementation problems with Postgres and couldn't serve out large volumes of data with our cluster. Moving to BigQuery solved this for us. There’s no limit to the amount of data we can offer – we can even go several months back if customers want to do some historical analysis. Looking in-depth at the history of their campaigns helps businesses make better decisions about how they’ll improve messages in the future. We also give customers access to data streams with real-time information about email delivery, as well as aggregate statistics every hour. For real-time data, we provide webhooks to our users that are triggered from our system to our user endpoints. For aggregate stats, we run an hourly job to compile results from BigQuery into a MySQL table.



To service our feed of customer stats, we periodically run queries on Google BigQuery and store the aggregate counts in a relational database in the cloud. We’re using the BigQuery JSON API. For our feed on customers’ historical data, we proxy requests to Google BigQuery via an internal job manager that handles user quotas, retries and the number of concurrent queries.



When a user queries our system for stats or raw data, we issue an internal GUID (globally unique identifier), which acts as a proxy for the query that runs on BigQuery. The user of our API checks the status of the job, and upon successful completion, we proxy the result set back via our API.
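The GUID-as-proxy pattern is straightforward to sketch. Below, a dict-backed job manager stands in for the real service and for BigQuery itself; all names and the query string are hypothetical:

```python
import uuid

# Minimal sketch of the proxy pattern: the client gets a GUID immediately,
# polls for status, and fetches the result set once the backing query
# (simulated here) has completed.

jobs = {}  # guid -> {"status": ..., "result": ...}

def submit_query(sql):
    guid = str(uuid.uuid4())
    jobs[guid] = {"status": "PENDING", "result": None}
    # In the real system, a job manager would forward `sql` to the
    # BigQuery JSON API here, enforcing quotas, retries, and limits on
    # concurrent queries.
    return guid

def complete(guid, rows):
    jobs[guid] = {"status": "DONE", "result": rows}

def poll(guid):
    return jobs[guid]["status"]

guid = submit_query("SELECT ... FROM events")  # hypothetical query
assert poll(guid) == "PENDING"
complete(guid, [("2014-04-16", 12345)])        # backend finishes the job
result = jobs[guid]["result"]
```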



Here is an example of an hourly query, triggered by a cron job, that we use to extract data about open rates and populate our aggregates database:

SELECT a.send_hour send_hour,
       a.event_hour event_hour,
       a.account_id account_id,
       a.channel_id channel_id,
       a.session_id session_id,
       a.event_type event_type,
       d.channelKey channel_key,
       d.sessionKey session_key,
       a.event_count event_count
FROM (
  SELECT sendHour send_hour,
         INTEGER(UTC_USEC_TO_HOUR(minEventTime * 1000) / 1000) AS event_hour,
         accountId account_id,
         channelId channel_id,
         sessionId session_id,
         'uniqueopen' event_type,
         COUNT(1) event_count
  FROM (
    SELECT MIN(eventTime) AS minEventTime,
           messageGuid,
           channelId,
           sessionId,
           accountId,
           sendHour
    FROM [EVENTS_TABLE]  -- placeholder: the source table name was omitted in the original
    WHERE eventType = 'open'
      AND eventTime - sendTime < 1209600000  -- opened within 14 days of send
    GROUP EACH BY messageGuid, channelId, sessionId, accountId, sendHour
    HAVING minEventTime >= TIMESTAMP_TO_MSEC(TIMESTAMP('20140416'))
  )
  GROUP EACH BY send_hour, event_hour, account_id, channel_id, session_id
) a
LEFT JOIN EACH [messagebus_com_prod_ACCOUNT_DATA_DS.ACCOUNT_DETAIL] d
  ON d.sessionId = a.session_id
  AND d.channelId = a.channel_id



Not only is our data more detailed and easier to view, it also costs much less: BigQuery saves us $20,000 a month compared to our homegrown database. We also no longer need an operations and database administrator to manage our database, or the costly database hardware we were running before. All told, we’re easily saving about $40,000 a month.



Ten months after we started using Google BigQuery, we have three rich data feeds that are available in just minutes when customers request them. We’re making good use of the time we’re no longer spending managing data – in fact, we have fresh ideas in the works for even more data feeds that can drive better decision-making for our customers.



-Contributed by Ujjwal Sarin, software engineer at Message Bus










Today we announced that Stackdriver is joining the Cloud Platform team.  





Stackdriver has built a leading service that helps developers intelligently monitor the apps and services they build and run in the cloud, giving customers more visibility into errors, performance, behavior, and operations. The teams will be working to integrate Stackdriver’s functionality so that Google Cloud Platform customers can take advantage of these advanced monitoring capabilities.





We’re excited they're joining us. We’ll be investing more in this area in the coming months — stay tuned!





-Posted by Tom Kershaw, Product Manager


Today we are pleased to announce General Availability of the Google Cloud Storage JSON API. This means it is now covered by the Google Cloud Storage Service Level Agreement (SLA) and deprecation policy. In other words, it’s ready for prime time! This release, v1, replaces the previous, experimental release, v1beta2.



The Google Cloud Storage JSON API provides equivalent functionality to the XML API, but offers better integration with Google’s API client libraries. It also extends functionality by offering features not available in the XML API such as request batching and object change notifications.
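As a rough illustration of how request batching works on the wire, each call in a batch is serialized as an application/http part inside one multipart/mixed request body. The helper below sketches that format (the paths and boundary string are illustrative; in practice the Google API client libraries build this for you):

```python
# Illustrative sketch of a JSON API batch request body: several GET
# calls packed into one multipart/mixed HTTP request.

def build_batch_body(paths, boundary="batch_boundary"):
    """Serialize a list of GET request paths as a multipart/mixed
    batch body, one application/http part per call."""
    parts = []
    for path in paths:
        parts.append(
            "--%s\r\n"
            "Content-Type: application/http\r\n\r\n"
            "GET %s HTTP/1.1\r\n\r\n" % (boundary, path)
        )
    parts.append("--%s--" % boundary)
    return "".join(parts)

# build_batch_body(["/storage/v1/b/my-bucket/o/a",
#                   "/storage/v1/b/my-bucket/o/b"])
# yields one request body carrying both object GETs, so two metadata
# lookups cost a single HTTP round trip.
```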



-Posted by Brandon Yarbrough, Software Engineer

Today's guest post comes from Michael Dehaan, the original author of Ansible and CTO of Ansible, Inc.



Ansible is an easy-to-use IT automation platform that lets you deploy applications, configure systems, and orchestrate complex workflows. Ansible has had support for Google Cloud Platform for about a year, and we’d like to share a bit of information about how to use it.



Examples are available in our Google Cloud Platform Guide, and several “gc”-prefixed modules are listed in our Cloud Module Index.



Users have a lot of choices in how they manage their cloud infrastructure. When automating in the cloud, the most important considerations are the management stack itself and how that stack interacts and scales as cloud deployments grow in complexity. You need a system that is not only easy to maintain, but one in which the automation content is easy to write and evolve.



Ansible is designed around SSH, because SSH is cloud native -- and that works well with Google Cloud Platform in many ways.



By using SSH to manage your nodes, there are no additional daemons to install or security packages to manage. SSH keys can be easily injected when a virtual machine is instantiated. No additional resources are consumed on remote nodes, and there’s never the problem of a management agent “falling over” and leaving you with no way to automate the box. There are no additional open ports, and when a box is not being managed by Ansible, nothing extra is running. Rather than simply logging into your boxes and running commands, Ansible connects to them over SSH to transfer modules, runs those modules, and parses out the responses. It then cleans up after itself, leaving nothing behind but logs.



One of Ansible’s main focuses is the deployment of complex multi-tier applications. This includes not only cloud provisioning (as illustrated in the Compute Engine guide) but also zero-downtime rolling updates on load-balanced infrastructure -- there is an Ansible module for the Compute Engine load balancer as well. Instances can either be updated in place, or new machines can be spun up and added to load balancers while old ones are spun down. Storage and networking can be managed too.



Another way that Ansible integrates with Compute Engine, also documented in the Google Cloud Platform Guide, is the ability to query inventory dynamically from the Google cloud. To get the most out of the cloud, it’s important to treat instances as cattle, not pets. Grouping machines by tag, rather than by instance hostname, greatly decreases management complexity as scale changes.
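The idea behind tag-based grouping can be sketched in a few lines (the tags and instance names below are made up; Ansible’s dynamic GCE inventory does this for you against the live API):

```python
# Illustrative sketch of the "group by tag, not hostname" idea behind
# dynamic inventory: plays target a tag, not individual machines.

def group_by_tag(instances):
    """instances: list of dicts like {"name": ..., "tags": [...]}.
    Returns {tag: [instance names]} so automation can address whole
    roles instead of tracking hostnames as machines come and go."""
    groups = {}
    for inst in instances:
        for tag in inst.get("tags", []):
            groups.setdefault(tag, []).append(inst["name"])
    return groups

# With two web machines tagged "http-server", a rolling-update play
# can target the "http-server" group; adding a third machine with the
# same tag puts it in scope automatically, with no inventory edits.
```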



If you’d like to find out more about Ansible generally, see docs.ansible.com, and also explore some of the community roles found at galaxy.ansible.com. Ansible Galaxy is a site where users can create and share automation roles with each other, which can help jumpstart new deployments.



If you have specific questions about Ansible and Compute Engine, you may wish to join the Ansible Project Google Group.



-Contributed by Michael DeHaan, original author of Ansible, and CTO of Ansible, Inc.

Today’s guest post is from Jeff Trom, Executive Vice President and Chief Technology Officer at Workiva, a Software-as-a-Service provider that develops cloud-based solutions for business reporting.



At Workiva, we’re reinventing complex business reporting. Wdesk, our flagship product, is an enterprise solution that is transforming how companies manage and report complex business data. It’s a collective workspace for teams to come together to build documents and reports without having to go to IT for assistance. Using Wdesk, financial teams have quickly become accustomed to how the cloud has simplified collaboration, provided global accessibility and eliminated the replication of data and documents.



What started as an idea to automate SEC reporting has now grown to a robust offering that supports more than 60% of the Fortune 500 in just 4 years since launch. We’ve been able to build a great company and culture where rapid innovation and best-in-class customer service are key.



We rely on Google Cloud Platform to make a formerly onerous process seem easy. Cloud Platform replicates our terabytes of data across multiple datacenters seamlessly and allows our developers to focus on innovation, not infrastructure. We deploy updates daily and leverage Google App Engine’s ability to simultaneously serve traffic from multiple versions, letting us test new features with a few customers before releasing them to everyone. This helps ensure that our customers have the best experience possible each time they log in to Wdesk. And for our business, we’ve also benefited from $1 million in savings due to optimized headcount and improved server efficiency. Check out the video below to learn more about how we’re using Cloud Platform.









In our product space, the data-in-motion architecture we’ve chosen is what sets us apart. It requires:


  • Dynamically scalable application servers to handle variable traffic patterns.

  • Replicated storage that’s scalable and provides reliable performance under load.

  • Enterprise-grade reliability to ensure 24 x 7 access for our customers.




We love learning about how our customers innovate with Wdesk inside their own companies, and are constantly impressed by the solutions they produce. The possibilities seem endless.



-Contributed by Jeff Trom, Executive Vice President and Chief Technology Officer at Workiva