Since releasing Cloud Trace for Google Cloud Platform beta last year, thousands of developers have been using the service to improve their applications’ performance. Today, we’re adding more features and functionality, based on feedback from our beta users.



Here’s a short list of the latest updates:




1. Automatic tracing and performance analysis for all App Engine projects


Cloud Trace now automatically instruments Google App Engine applications. It continuously evaluates all App Engine requests and periodically analyzes the traces for each endpoint to identify performance bottlenecks and surface insights. It looks for suboptimal patterns in RPC calls and provides recommendations to fix them.





Here's how it works:



Cloud Trace currently analyzes billions of traces daily and generates millions of reports. It continuously inspects your application requests for a number of signals, such as memcache size, datastore batch size and cursor usage, and looks for opportunities to optimize your application's performance.



For example, using a cursor for datastore queries can be significantly faster than using an offset when the offset is large. When Cloud Trace observes a call pattern whose large offset slows down the application, it surfaces an insight recommending cursors instead. We're continuously refining existing insights and adding new ones to Cloud Trace to provide accurate and actionable suggestions.
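To make the offset-versus-cursor tradeoff concrete, here's a toy, pure-Python model (not the actual Datastore API) of why deep offsets are expensive: the store still walks past every skipped entity before returning results, while a cursor resumes where the last page ended.

```python
def query_with_offset(entities, limit, offset):
    """Toy model: the datastore scans and discards `offset` entities
    before returning results, so work grows linearly with the offset."""
    scanned = 0
    results = []
    for i, entity in enumerate(entities):
        scanned += 1
        if i >= offset:
            results.append(entity)
            if len(results) == limit:
                break
    return results, scanned

def query_with_cursor(entities, limit, cursor=0):
    """Toy model: a cursor encodes where the last page ended, so each
    page costs only `limit` reads no matter how deep you page."""
    results = entities[cursor:cursor + limit]
    return results, cursor + len(results)

entities = list(range(10000))
page, scanned = query_with_offset(entities, limit=20, offset=9000)
page2, next_cursor = query_with_cursor(entities, limit=20, cursor=9000)
print(scanned)       # offset paging touched 9020 entities for 20 results
print(next_cursor)   # cursor paging read only the 20 entities on the page
```

The real App Engine APIs expose the same idea through query cursors (for example, NDB's `fetch_page`), which is what the insight recommends adopting.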




2. Latency shift detection


Cloud Trace builds analysis reports for your most frequently used endpoints, and now uses these reports to detect changes in your application’s latency. Our latency-shift detection algorithms surface both significant and minor changes in your application's latency whenever a noticeable shift occurs. You can access this feature directly from the Analysis Reports tab within Cloud Trace.






3. Use Trace API to trace custom workloads


If you have custom workloads that you wish to trace and analyze, you can now use the Cloud Trace API and Trace SDK to optimize their performance. The Cloud Trace API lets you add custom spans to a trace; a span represents a unit of work within a trace, such as an RPC request or a section of code. For custom workloads, you define the start and end of each span using the Cloud Trace SDK. This data is uploaded to Cloud Trace, where you can leverage all the Trace insights and analysis features mentioned above. Currently, the Cloud Trace SDK is available for Java1 and Node.js; a REST API is available for all other languages.
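As a rough sketch of what a custom span looks like, the snippet below times a unit of work and assembles a payload in the shape of the Cloud Trace v1 REST API's trace and span resources. The project ID is hypothetical, and the exact field names and endpoint should be verified against the API reference rather than taken from this example.

```python
import json
import time
import uuid
from datetime import datetime, timezone

def rfc3339(ts):
    """Format a Unix timestamp as the RFC 3339 string the API expects."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%S.%fZ")

def traced_span(name, work):
    """Run `work()` and return a span dict describing that unit of work."""
    start = time.time()
    work()
    end = time.time()
    return {
        "spanId": "1",                  # must be unique within the trace
        "kind": "SPAN_KIND_UNSPECIFIED",
        "name": name,
        "startTime": rfc3339(start),
        "endTime": rfc3339(end),
    }

span = traced_span("process-batch", lambda: time.sleep(0.01))
body = {"traces": [{
    "projectId": "my-project",          # hypothetical project ID
    "traceId": uuid.uuid4().hex,        # 32-hex-character trace identifier
    "spans": [span],
}]}
# This body would be sent with an authenticated HTTP PATCH to
# https://cloudtrace.googleapis.com/v1/projects/my-project/traces
print(json.dumps(body, indent=2))
```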




4. Intuitive UI with a focus on developer workflows


The new Trace Overview page brings together all the Cloud Trace goodness into a simple unified view. The page summarizes the various performance insights gleaned from your application traces. It also summarizes the latencies for each application endpoint and the latencies associated with your most frequently called RPCs. The latest analysis report is readily available on the overview page.







Many users have expressed a desire to analyze traces when viewing logs or when checking the health of App Engine projects. Trace is now accessible from the Logs Viewer in Cloud Console and from the App Engine dashboard. If you're using Google Cloud Monitoring, you can now navigate to a relevant, filtered set of traces from your monitoring dashboard.





Stay tuned for more Cloud Trace improvements over the coming weeks, including expanded support for Google Compute Engine and Google Container Engine as well as more detailed auto analysis and insights. As always, we love direct feedback and will be monitoring Stack Overflow for issues and suggestions.



If you haven’t tried Cloud Trace to optimize your application performance, go ahead and give it a try! If you're attending GCP NEXT 2016, check out the talk “Diagnostics - Spend less time diagnosing and more time developing” to learn more about Cloud Trace and all the other development tools available on Google Cloud Platform to manage your applications.



- Posted by Sharat Shroff, Product Manager, Google Cloud Platform







1 Java is the registered trademark of Oracle and/or its affiliates.

The Google Cloud Platform team is constantly updating and releasing new products, and sometimes there isn’t enough time in the day to sit down in front of a screen and read through everything.



We’re fixing that with the Google Cloud Platform Podcast, where you can listen to all the new and exciting things happening on Google Cloud during your commute, at the gym, while mowing the lawn, cooking dinner, or whenever you feel the desire to learn something new about our platform.




Google Cloud Platform Developer Advocates (left to right) Mark Mandel and Francesc Campoy


This weekly production is hosted by me, Mark Mandel, and my partner in crime, Francesc Campoy. We're two members of the Google Cloud Platform Developer Advocacy team.



The show includes a weekly news roundup and community-contributed questions; deep dives into interesting technical topics, such as big data, Kubernetes and HTTP/2; interviews with Google Cloud product managers (most recently Ram Ramanathan for the Cloud Vision API and Chris Sells for Developer Experience); and conversations with customers such as Shine Technologies to see how they're using Cloud Platform in the wild.



You can subscribe to the podcast via RSS or iTunes to listen to all the episodes already published, as well as those to come.



We’ve got big plans for more upcoming episodes, so stay tuned!



- Posted by Mark Mandel, Developer Advocate and Francesc Campoy Flores, Developer Advocate

It’s not every day you move a 75 million+ user company from a home-grown infrastructure to the cloud. But if you use Spotify, more and more of your musical experience will be delivered by Google Cloud Platform over the coming weeks and months: we’re partnering on an ambitious project to move Spotify’s backend into GCP.



Spotify aims to make music special for everyone. Today, the company hosts more than 2 billion playlists and gives consumers access to more than 30 million songs. Users can search for music across any device by artist, album, genre, playlist or record label, while features like Discover Weekly suggest personalized playlists for millions of people around the world.



Spotify had engineers running its core infrastructure and buying or leasing data-center space, PC hardware and networking gear to provide a seamless experience for users. But time and again, it asked whether tying up resources that could otherwise go toward innovative features and software was worth it.



Recently, Spotify decided it didn’t want to be in the data center business, and chose Cloud Platform over the public cloud competition after careful review and testing. The company split its migration to Cloud Platform into two streams: a services track and a data track. Spotify runs its products on a multitude of tiny microservices, several of which are now being moved from on-premises data centers into Google’s cloud using our Cloud Storage, Compute Engine and other products.



With Compute Engine, teams can rely on consistent performance from ultra-high-IOPS SSD persistent disks and local SSD storage. And with autoscaling, they can build resilient and cost-efficient applications that use just the right amount of resources at any given time. For storage, Spotify is now implementing Google Cloud Datastore and Google Cloud Bigtable. This rich fabric of storage services lets engineers work on complex backend logic, instead of focusing on how to store the data and maintain databases. Spotify is also deploying Google’s Cloud Networking services, such as Direct Peering, Cloud VPN and Cloud Router, to transfer petabytes of data. The result is a fast, reliable and secure experience for users around the globe.



On the data side of things, the company is adopting an entirely new technology stack. This includes moving from Hadoop, MapReduce, Hive and a series of home-grown dashboarding tools, to adopting the latest in data processing tools, including Google Cloud Pub/Sub, Google Cloud Dataflow, Google BigQuery, and Google Cloud Dataproc.



With BigQuery and Cloud Dataproc, data teams can run complex queries and get answers in a minute or two, rather than hours. This lets Spotify perform more frequent in-depth, interactive analysis, guiding product development, feature testing and more intelligent user-facing features. To gather and forward all events to its ecosystem, Spotify is using Cloud Pub/Sub, Google’s global service for messaging and streaming data. This gives teams the ability to process hundreds of thousands of messages per second, in a reliable no-ops manner. And to power its ETL workloads, Spotify is deploying Cloud Dataflow, Google’s data processing service. This lets the company rely on a single cloud-based managed service for both batch and stream processing.



What makes us most excited to work with Spotify is their company-wide focus on forward-looking user experiences. Now that they’ve begun using Google Cloud Platform, we can’t wait to see what Spotify builds next.



Join us for the GCP NEXT 2016 opening keynote, where we’ll feature a talk from Nicholas Harteau, VP of Engineering and Infrastructure at Spotify. You can also attend Spotify-led technical sessions where you can learn more about how they’re deploying Google Cloud BigQuery and Dataflow.



- Posted by Guillaume Leygues, Lead Sales Engineer, Google Cloud Platform

Today, during my keynote at the 2016 USENIX conference on File and Storage Technologies (FAST 2016), I’ll be talking about our goal to work with industry and academia to develop new lines of disks that are a better fit for data centers supporting cloud-based storage services. We're also releasing a white paper on the evolution of disk drives that we hope will help continue the decades of remarkable innovation achieved by the industry to date.



But why now? It's a fun but apocryphal story that the width of Roman chariots determined the spacing of modern train tracks. However, it is true that the modern disk drive owes its dimensions to the 3½” floppy disk used in PCs. That's very unlikely to be the optimal design, and now that we're firmly in the era of cloud-based storage, it's time to broadly reevaluate the design of modern disk drives.



The rise of cloud-based storage means that most (spinning) hard disks will be deployed primarily as part of large storage services housed in data centers. Such services are already the fastest growing market for disks and will be the majority market in the near future. For example, for YouTube alone, users upload over 400 hours of video every minute, which at one gigabyte per hour requires more than one petabyte (1M GB) of new storage every day or about 100x the Library of Congress. As shown in the graph, this continues to grow exponentially, with a 10x increase every five years.







At the heart of the paper is the idea that we need to optimize the collection of disks, rather than a single disk in a server. This shift has a range of interesting consequences including the counter-intuitive goal of having disks that are actually a little more likely to lose data, as we already have to have that data somewhere else anyway. It’s not that we want the disk to lose data, but rather that we can better focus the cost and effort spent trying to avoid data loss for other gains such as capacity or system performance.



We explore physical changes, such as taller drives and grouping of disks, as well as a range of shorter-term firmware-only changes. Our goals include higher capacity and more I/O operations per second, in addition to a better overall total cost of ownership. We hope this is the beginning of both a new chapter for disks and a broad and healthy discussion, including vendors, academia and other customers, about what “data center” disks should be in the era of cloud.



- Posted by Eric Brewer, VP Infrastructure, Google

Today Google Cloud Dataproc, our managed Apache Hadoop and Apache Spark service, says goodbye to its beta label and is now generally available.



When analyzing data, your attention should be focused on insights, not your tools. Often, popular tools to process data, such as Apache Hadoop and Apache Spark, require a careful balancing act between cost, complexity, scale, and utilization. Unfortunately, this means you focus less on what is important (your data) and more on what should require little or no attention (the cluster processing it).



We created our managed Spark and Hadoop cloud service, Google Cloud Dataproc, to rectify the balance, so that using these powerful data tools is as easy as 1-2-3.



Since Cloud Dataproc entered beta last year, customers have taken advantage of its speed, scalability, and simplicity. We’ve seen them create clusters from three to thousands of virtual CPUs, using our Developers Console and Google Cloud SDK, without wasting time waiting for their cluster to be ready.



With integrations to Google BigQuery, Google Cloud Bigtable, and Google Cloud Storage, which provide reliable storage independent from Dataproc clusters, customers have created clusters only when they need them, saving time and money, without losing data. Cloud Dataproc can also be used in conjunction with Google Cloud Dataflow for real-time batch and stream processing.



While in beta, Cloud Dataproc added several important features including property tuning, VM metadata and tagging, and cluster versioning. In general availability, just like in beta, new versions of Cloud Dataproc, with new features, functionalities and software components, will be frequently released. One example is support for custom machine types, available today.




Cloud Dataproc tips the scale of running Spark and Hadoop in your favor by lowering cost and complexity while increasing scalability and productivity











Cloud Dataproc minimizes two common and major distractions in data processing, cost and complexity, by providing:




  • Low cost. We believe two things: using Spark and Hadoop should not break the bank, and you should pay for what you actually use. As a result, Cloud Dataproc is priced at only 1 cent per virtual CPU in your cluster per hour, on top of the other Cloud Platform resources you use. Moreover, with per-minute billing and a low 10-minute minimum, you pay for what you actually use, not a rounded-up approximation.



  • Speed. With Cloud Dataproc, clusters do not take 10, 15, or more minutes to start or stop. On average, Cloud Dataproc start and stop operations take 90 seconds or less. This can be a 2-10x improvement over other on-premises and IaaS solutions. As a result, you spend less time waiting on clusters and more time hands-on with data.



  • Management. Cloud Dataproc clusters don't require specialized administrators or software products. Cloud Dataproc clusters are built on proven Cloud Platform services, such as Google Compute Engine, Google Cloud Networking, and Google Cloud Logging to increase availability while eliminating the need for complicated hands-on cluster administration. Moreover, Cloud Dataproc supports cluster versioning, giving you access to modern, tested, and stable versions of Spark and Hadoop.
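The per-minute billing and 10-minute minimum described above are easy to work out by hand. Here's an illustrative sketch, covering only the Cloud Dataproc surcharge itself (the underlying Compute Engine VM costs are billed separately):

```python
DATAPROC_RATE_PER_VCPU_HOUR = 0.01  # the 1-cent rate quoted above
MINIMUM_MINUTES = 10                # per-minute billing, 10-minute minimum

def dataproc_surcharge(total_vcpus, minutes_used):
    """Cloud Dataproc fee only; Compute Engine resources are extra."""
    billed_minutes = max(minutes_used, MINIMUM_MINUTES)
    return total_vcpus * DATAPROC_RATE_PER_VCPU_HOUR * billed_minutes / 60.0

# A 16-vCPU cluster (e.g. 4 nodes x 4 vCPUs) running for 45 minutes:
print(dataproc_surcharge(16, 45))   # 16 * $0.01 * 0.75h = $0.12
# A 5-minute job is billed at the 10-minute minimum:
print(dataproc_surcharge(16, 5))    # 16 * $0.01 * (10/60)h, about $0.027
```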




Cloud Dataproc makes two often problematic needs in data processing, scale and productivity, easy by being:





  • Modern. Cloud Dataproc is frequently updated with new image versions to support new software releases from the Spark and Hadoop ecosystem. This provides access to the latest stable releases while also ensuring backward compatibility. For general availability we're releasing image version 1.0.0 with support for Hadoop 2.7.2, Spark 1.6.0, Hive 1.2.1, and Pig 0.15.0. Support for other components, such as Apache Zeppelin (incubating), is provided in our GitHub repository for initialization actions.



  • Integrated. Cloud Dataproc has built-in integrations with other Cloud Platform services, such as BigQuery, Cloud Storage, Cloud Bigtable, and Google Cloud Logging so you have more than just a Spark or Hadoop cluster — you have a complete data platform. You can also use Cloud Dataproc initialization actions to extend the functionality of your clusters.




Our growing partner ecosystem offers certified support from several third-party tools and service partners. We're excited to collaborate with technology partners including Arimo, Attunity, Looker, WANdisco, and Zoomdata to make working in Cloud Dataproc even easier. Service providers like Moser, Pythian, and Tectonic are on standby to provide expert support during your Cloud Dataproc implementations. Reach out to any of our partners if you need help getting up and running.



To learn more about Cloud Dataproc, visit the Cloud Dataproc site, follow our getting started guide, take a look at a code example of how you can predict keno outcomes with Cloud Dataproc, or submit your questions and feedback on Stack Overflow.



- Posted by James Malone, Product Manager

Athletic gear, much like all apparel categories, is quickly shifting to an online sales business. Sports Authority, seeing the benefits that cloud could offer around agility and speed, turned to Google Cloud Platform to help it respond to its customers faster.



In 2014, Sports Authority’s technical team was asked to build a solution that would expose all in-store product inventory to its ecommerce site, sportsauthority.com, allowing customers to see local store availability of products as they were shopping online. That’s nearly half a million products to choose from in over 460 stores across the U.S. and Puerto Rico.



This use case posed a major challenge for the company. Its in-store inventory data was “locked” deep inside a mainframe. Exposing millions of products to thousands of customers, 24 hours a day, seven days a week would not be possible using this system.



The requirements for a new solution included finding the customer’s location, searching the 90-million-record inventory system and returning product availability for just the handful of stores nearest that customer. On top of that, the API would need to serve at least 50 customers per second while returning results in less than 200 milliseconds.
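Sports Authority hasn't published its implementation, but the "nearest stores" step can be sketched as a simple great-circle ranking over store coordinates. The store records and coordinates below are hypothetical, and a real service would join this ranking against live inventory:

```python
import math

def nearest_stores(stores, lat, lng, n=5):
    """Toy sketch: rank stores by great-circle distance to the shopper."""
    def haversine_miles(lat1, lng1, lat2, lng2):
        radius = 3959.0  # mean Earth radius in miles
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp = math.radians(lat2 - lat1)
        dl = math.radians(lng2 - lng1)
        h = (math.sin(dp / 2) ** 2
             + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
        return 2 * radius * math.asin(math.sqrt(h))
    return sorted(
        stores,
        key=lambda s: haversine_miles(s["lat"], s["lng"], lat, lng))[:n]

# Hypothetical store records; real data would come from the inventory system.
stores = [
    {"id": "denver", "lat": 39.74, "lng": -104.99},
    {"id": "boulder", "lat": 40.01, "lng": -105.27},
    {"id": "colorado-springs", "lat": 38.83, "lng": -104.82},
]
print([s["id"] for s in nearest_stores(stores, 39.70, -105.00, n=2)])
```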




Choosing the right cloud provider


At the time this project began, Sports Authority had already been a Google Apps for Work (Gmail, Google Sites, Docs) customer since 2011. However, it had never built any custom applications on Google Cloud Platform.



After a period of due diligence checking out competing cloud provider options, Sports Authority decided that Google App Engine and Google Cloud Datastore had the right combination of attributes — elastic scaling, resiliency and simplicity of deployment — to support this new solution.



Through the combined efforts of a dedicated project team, business partners and three or four talented developers, it was able to build a comprehensive solution on Cloud Platform in about five months. It consisted of multiple modules: 1) batch processes, using Informatica to push millions of product changes from its IBM mainframe to Google Cloud Storage each night, 2) load processes (Python code running on App Engine, which spawns task queue jobs to load Cloud Datastore), and 3) a series of SOAP and REST APIs to expose the search functionality to its ecommerce website.



Sports Authority used tools including SoapUI and LoadUI to simulate thousands of virtual users and measure the scalability of its SOAP and REST APIs. It found that as the number of transactions grew past 2,000 per second, App Engine and Cloud Datastore continued to scale seamlessly, easily meeting its target response times.



The company implemented the inventory locator solution just in time for the 2014 holiday season. It performed admirably during that peak selling period and continues to do so today.




This screenshot shows what customers see when they shop for products on the website — a list of local stores, showing the availability of any given product in each store







When a customer finds a product she's interested in buying, the website requests inventory availability from Sports Authority’s cloud API, which provides a list of stores and product availability to the customer, as exhibited in the running shoe example above.




In-store kiosk


As Sports Authority became comfortable building solutions on Cloud Platform, it began to see other possibilities for creating new solutions to better serve its customers.



For example, it recently developed an in-store kiosk, which allows customers to search for products that may not be available in that particular store. It also lets them enroll in the loyalty program and purchase gift cards. This kiosk is implemented on a Google Chromebox, connected to a web application running on App Engine.




This image shows the in-store kiosk that customers use to locate products available in other stores. 












Internal store portal


Additionally, it built a store portal and task management system, which facilitates communication between the corporate office and its stores. This helps the store team members plan and execute their work more efficiently, allowing them to serve customers better when needs arise. This solution utilizes App Engine, Cloud Datastore and Google Custom Search, and was built with the help of a local Google partner, Tempus Nova.




This screenshot shows the internal store portal that employees use to monitor daily tasks.









Learning how to build software in any new environment such as Cloud Platform takes time, dedication and a willingness to learn. Once up to speed, the productivity and power of Google Cloud Platform let the Sports Authority team operate like a software company, building quickly and at scale.



- Posted by Jon Byrum, Product Marketing Manager, Google Cloud Platform

Today, we're announcing the beta release of Google Cloud Vision API. Now anyone can submit their images to the Cloud Vision API to understand the contents of those images: from detecting everyday objects (for example, “sports car,” “sushi,” or “eagle”) to reading text within the image or identifying product logos.



With the beta release of Cloud Vision API, you can point the API at images stored in Google Cloud Storage, in addition to the existing support for embedding an image directly in the API request. We’re also announcing pricing for Cloud Vision API and have added the ability to identify the dominant colors of an image. For example, you can now apply Label Detection for as little as $2 per 1,000 images, or Optical Character Recognition (OCR) for $0.60 per 1,000 images. Pricing takes effect March 1st.







Cloud Vision API supports a broad set of scenarios:




  • Insights from your images: Powered by the same technologies behind Google Photos, Cloud Vision API detects broad sets of objects in your images, from flowers to popular landmarks.

  • Inappropriate content detection: Powered by Google SafeSearch, Cloud Vision API moderates content from your crowd-sourced images by detecting different types of inappropriate content.

  • Image sentiment analysis: Cloud Vision API can analyze emotional attributes of people in your images, like joy, sorrow and anger, along with detecting popular product logos.

  • Text extraction: Optical Character Recognition (OCR) enables you to detect text within your images, along with automatic language identification across a broad set of languages.
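To make these scenarios concrete, here's a minimal sketch of the JSON body for a Cloud Vision `images:annotate` request that reads an image from Cloud Storage and asks for labels and text. The bucket path is hypothetical, and the schema follows the v1 REST API as documented, so check the API reference for the authoritative shape:

```python
import json

def annotate_request(gcs_uri, max_labels=5):
    """Build the body for a POST to
    https://vision.googleapis.com/v1/images:annotate (API key or OAuth)."""
    return {"requests": [{
        "image": {"source": {"gcsImageUri": gcs_uri}},
        "features": [
            {"type": "LABEL_DETECTION", "maxResults": max_labels},
            {"type": "TEXT_DETECTION"},
        ],
    }]}

# Hypothetical bucket and object name for illustration only.
body = annotate_request("gs://my-bucket/photos/storefront.jpg")
print(json.dumps(body, indent=2))
```

The response pairs each feature with its annotations (for example, `labelAnnotations` with descriptions and confidence scores), which you can then filter in your application.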







Since we announced the limited preview of Google Cloud Vision API in early December, thousands of companies have used the API, generating millions of requests for image annotations. We're grateful for your feedback and comments and have been amazed by the breadth of applications using Cloud Vision API.



PhotoFy, a social photo editing and branding app, moderates over 150,000 photos a day created by a wide audience. Before the Cloud Vision API was available, CTO Chris Keenan said, protecting these branded photos from abuse was almost impossible. With the Cloud Vision API, PhotoFy can flag potentially violent and adult content in user-created photos, in line with its abuse policies.



Marsal Gavaldà, Director of Engineering for machine intelligence over at Yik Yak, a location-based social network, ran over a million images through the Cloud Vision API. The company was impressed with the accuracy of its feature detectors and content analyzers and the precision and recall of the text extraction in multiple languages. The number of objects that can be identified with the Cloud Vision API is an order of magnitude greater than comparable services from other cloud providers.



During the beta timeframe, each user will have a quota of 20 million images per month. As such, Cloud Vision API is not yet intended for real-time, mission-critical applications. You can access the documentation, with samples and tutorials showing usage of the API in Python and Java1, along with mobile app samples for Android and iOS.



Google Cloud Vision API is our first step on the journey to enable applications to see, hear and make information in the world more useful. We welcome customers to join us on the journey and start using the API today. You can reach us with questions or feedback here.




We couldn't resist showing you our favorite robot again.






- Posted by Ram Ramanathan, Product Manager, Google Cloud Platform






1 Java is the registered trademark of Oracle and/or its affiliates.

Google Cloud Platform brings you lots of ways to store, access and archive your data, including Google Cloud Storage, Datastore and BigQuery. In some cases, there's a need to access a POSIX-compatible shared file system across a fleet of your cloud instances. To support these use cases with a robust scale-out and scale-up solution, Google Cloud Platform and Red Hat are proud to announce the availability of Red Hat Gluster Storage on Google Compute Engine.



Red Hat Gluster Storage offers a highly available and fault tolerant shared file system that can scale vertically and horizontally. Red Hat Gluster Storage makes use of compute instances with disks attached in order to provide a distributed, scale-out file system. Users of Red Hat Gluster Storage on GCE can take advantage of the performance, scalability and flexibility of our Persistent Disks.



Disks used for your Red Hat Gluster Storage installation can be chosen based on various performance and cost tradeoffs. For example, you can choose to use standard Persistent Disks for data that does not require high I/O throughput or use the more performant SSD Persistent Disks for your IOPS hungry workloads, such as media rendering. Each node in your cluster can leverage disks of up to 64TB in size and with up to 15,000 IOPS.



In order to protect mission critical data, Red Hat Gluster Storage enables users to synchronously replicate their files across multiple zones in the same region while at the same time asynchronously replicating them to a separate region for disaster recovery. In the example architecture below, we're using us-east1-b as our primary zone with a hot standby in us-east1-c:





For more information on getting started with Red Hat Gluster Storage on GCE, click here.





- Posted by Vic Iglesias, Cloud Solutions Architect

Google Cloud Debugger, which lets you inspect the state of an application at any code location without stopping or slowing it down, now has an enhanced UI, expanded language support and debugging from more places.



It lets you view application state without adding logging statements, and we’ve made several important improvements that make production debugging more accessible, more intuitive and more fun.




1. Languages, runtimes and platforms


Cloud Debugger can help you quickly find and fix bugs in production applications. We started with support for Java 1 applications running in App Engine Managed VMs and have rapidly expanded support for more languages across Google Cloud Platform. With this release, Cloud Debugger is now available for the following languages and platforms:







2. UI enhanced for debugging


Cloud Debugger can be turned on without getting in your way. For example, debugger agents capture runtime information in a few milliseconds without user-perceptible delay to incoming requests. In production, when time is of the essence, Cloud Debugger is there when you need it and invisible when you don’t.



Additionally, Cloud Debugger is intuitive and easy to use. If you're familiar with setting breakpoints and inspecting applications in local debuggers, you'll be able to quickly transition to debugging in the cloud, using a familiar UI for taking snapshots, setting conditions and specifying watch expressions.



With this release, we've completely overhauled the Cloud Debugger section of the Cloud Console to make it easier to get started and simpler to navigate. For example, now you can quickly complete each of the following actions using the new debugger web UI:




  1. Take snapshots. Cloud Debugger is integrated into common workflows such as deployment.

  2. Set up and select the source code that matches the deployed application by choosing among a variety of source code repositories. Both local and cloud repositories are now supported; you can also use the debugger without source code.

  3. Traverse a complex source hierarchy using the familiar treeview layout.

  4. Share snapshots and collaborate with other project members, as easily as sharing a URL.















This is just the beginning of the UI enhancements to make Cloud Debugger easier to use and to make diagnosing production errors more productive.




3. Debug using your source code, or none at all


You can inspect your application state and link it back to source code regardless of where your source code is stored or how you access it.


  1. Debug with no access to source at all



    We recognize that in many cases, developers may not be able to provide access to their source code. Cloud Debugger now lets you enter just the filename and line number to take a snapshot at that location.

  2. Debug with a source capture



    Upload a capture of your source code to help debug your application over multiple sessions without having to connect to a source repository.

  3. Debug with a local source



    You can simply point Cloud Debugger to any local source file to take a snapshot. When debugging with local files, the source code is used for that debug session only. No source code is uploaded to Google servers.

  4. Debug with a cloud source repository



    As before, developers can use Cloud Debugger by providing access to their application's source code through Cloud Source Repositories. A source repository provides version control via git and can be managed using the Cloud Console and the new gcloud command-line tool. When a source control system is available, displaying accurate source information is simply a matter of pointing to the correct version of the source code in the repository using the Cloud Console.







4. Debug on your terms in your tools


Developers working with IntelliJ IDEA can debug live production applications without leaving the IDE using the familiar IDEA debugger interface.



If you’re unable to share your Java source code with us, that’s no problem: the Cloud Tools for IntelliJ plugin uses the code on your local machine during Cloud Debugger sessions.



Stay tuned for more Cloud Debugger improvements over the coming weeks. As always, we love direct feedback and will be monitoring Stack Overflow for issues and suggestions. If you haven’t tried Cloud Debugger to diagnose problems in your production applications, now is a perfect time to start!



  - Posted by Sharat Shroff, Product Manager, Google Cloud Platform





1 Java is the registered trademark of Oracle and/or its affiliates.

Today we’re announcing the general availability of Custom Machine Types for Google Compute Engine, which let you create virtual machines with vCPU and memory configurations that are perfect for your workloads.



Since our Beta launch, we've seen customers create virtual machines with novel vCPU and memory ratios that aren’t available from any of the major cloud providers. As a result, our customers have saved an average of 19% — and as much as 50% — on top of our already market-leading prices.




  • Wix has seen 18% savings in compute costs to power its media platform, which now serves over 75 million users.

  • Lytics is saving 20% to 50% by accurately matching resources to each compute workload it uses to unlock behavior-rich insights with its Customer Data Platform.

  • iRewind is seeing savings of up to 20% in processing costs to power the pipeline that produced more than 500,000 movies last year.




Custom Machine Types extend Google Compute Engine’s tradition of making IaaS truly flexible and ensuring you only pay for the resources you use. Per-minute billing freed you from imposed hourly charges. Sustained Use Discounts ensured you get automatic discounts based on usage, without upfront commitments or prepayments. Now, Custom Machine Types give you the option to configure your VMs to achieve the best price-performance for your specific workload.



You can create virtual machines with as few as 1 vCPU and as many as 32 vCPUs, with up to 6.5 GiB of memory per vCPU. You can use Custom Machine Types with CentOS, CoreOS, Debian, OpenSUSE, Ubuntu, and now with Red Hat and Windows operating systems. Or bring your own Linux variant to further customize your setup. Google Container Engine and Deployment Manager now also support Custom Machine Types.
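A quick sanity check of a desired shape against the limits quoted above (1 to 32 vCPUs, at most 6.5 GiB of memory per vCPU) might look like this. This is a minimal sketch based only on the limits mentioned in this post; the API enforces additional rules, such as memory granularity, that aren't modeled here:

```python
# Validate a candidate Custom Machine Type shape against the limits
# quoted in this post: 1-32 vCPUs, at most 6.5 GiB of memory per vCPU.
def is_valid_custom_shape(vcpus: int, memory_gib: float) -> bool:
    if not 1 <= vcpus <= 32:
        return False          # vCPU count out of the supported range
    if memory_gib > vcpus * 6.5:
        return False          # exceeds 6.5 GiB of memory per vCPU
    return memory_gib > 0

print(is_valid_custom_shape(6, 12))   # 6 vCPUs, 12 GiB -> True
print(is_valid_custom_shape(4, 30))   # 30 GiB > 4 * 6.5 GiB -> False
```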



Custom Machine Types have flat pricing based on vCPU and GiB of memory usage. A 4 vCPU, 10 GiB memory VM, for example, costs half as much as an 8 vCPU, 20 GiB memory VM. You also get our standard customer-friendly pricing features like per-minute billing and sustained use discounts.
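Because the pricing is flat and linear, total cost is simply vCPUs times a per-vCPU rate plus GiB times a per-GiB rate. With hypothetical unit rates (the real rates are on the pricing page), the doubling relationship described above falls out directly:

```python
# Illustrate the flat pricing model with made-up unit rates: cost scales
# linearly in vCPUs and GiB, so an 8 vCPU / 20 GiB VM costs exactly
# twice as much as a 4 vCPU / 10 GiB VM.
VCPU_RATE_PER_HOUR = 0.033   # hypothetical $/vCPU/hour
GIB_RATE_PER_HOUR = 0.0045   # hypothetical $/GiB/hour

def hourly_cost(vcpus, memory_gib):
    return vcpus * VCPU_RATE_PER_HOUR + memory_gib * GIB_RATE_PER_HOUR

small = hourly_cost(4, 10)
large = hourly_cost(8, 20)
print(f"small={small:.4f} large={large:.4f}")  # large is exactly 2x small
```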



Give Custom Machine Types a try today and see how much you could save! Visit the Compute Engine section of Google Cloud Platform Console and click Create Instance. In the Create instance page, you'll notice Machine Type now has a Basic and Customize view. Click Customize and build a virtual machine to fit your needs.









Custom Machine Types are supported by the gcloud command-line tool and through our API. Creating a VM is as easy as:



$ gcloud components update

$ gcloud compute instances create my-custom-vm \
    --custom-cpu 6 --custom-memory 12 --zone us-central1-f




For more info on Custom Machine Types, visit our website.



- Posted by Sami Iqram, Product Manager, Google Cloud Platform

Today’s guest blog comes from Kalev Leetaru, founder of The GDELT Project, which monitors the

world’s news media in nearly every country in over 100 languages to identify the events and narratives driving our global society.



This past September I published into Google BigQuery a massive new public dataset of metadata from 3.5 million digitized English-language books dating back more than two centuries (1800-2015), along with the full text of 1 million of these books. The archive, which draws from the English-language public domain book collections of the Internet Archive and HathiTrust, includes full publication details for every book, along with a wide array of computed content-based data. The entire archive is available as two public BigQuery datasets, and there’s a growing collection of sample queries to help users get started with the collection. You can even map two centuries of books with a single line of SQL.



What did it look like to process 3.5 million books? Data-mining and creating a public archive of 3.5 million books is an example of an application perfectly suited to the cloud, in which a large amount of specialized processing power is needed for only a brief period of time. Here are the five main steps that I took to make the invaluable learnings of millions of books more easily and speedily accessible in the cloud:


  1. The project began with a single 8-core Google Compute Engine (GCE) instance with a 2TB SSD persistent disk that was used to download the 3.5 million books. I downloaded the books to the instance’s local disk, unzipped them, converted them into a standardized file format, and then uploaded them to Google Cloud Storage (GCS) in large batches, using the composite objects and parallel upload capability of GCS. Unlike traditional UNIX file systems, GCS performance does not degrade with large numbers of small files in a single directory, so I could upload all 3.5 million files into a common set of directories.


    Figure 1: Visualization of two centuries of books




  2. Once all books had been downloaded and stored into GCS, I launched ten 16-core High Mem (100GB RAM) GCE instances (160 cores total) to process the books, each with a 50GB persistent SSD root disk to achieve faster IO over traditional persistent disks. To launch all ten instances quickly, I launched the first instance and configured that with all of the necessary software libraries and tools, then created and used a disk snapshot to rapidly clone the other nine with just a few clicks. Each of the ten compute instances would download a batch of 100 books at a time to process from GCS.

  3. Once the books had been processed, I uploaded back into GCS all of the computed metadata. In this way, GCS served as a central storage fabric connecting the compute nodes. Remarkably, even in worst-case scenarios when all 160 processors were either downloading new batches of books from GCS or uploading output files back to GCS in parallel, there was no measurable performance degradation.

  4. With the books processed, I deleted the ten compute instances and launched a single 32-core instance with 200GB of RAM, a 10TB persistent SSD disk, and four 375GB direct-attached Local SSD Disks. I used this to reassemble the 3.5 million per-book output files into single output files, tab-delimited with data available for each year, merging in publication metadata and other information about each book. Disk IO of more than 750MB/s was observed on this machine.

  5. I then uploaded the final per-year output files to a public GCS directory with web downloading enabled, allowing the public to download the files.
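The fan-out pattern in steps 2 and 3 — many workers pulling fixed-size batches from shared storage, processing them, and writing the results back — can be sketched as follows. This is a simplified simulation with an in-memory queue standing in for Cloud Storage; the names and batch contents are illustrative:

```python
import queue
import threading

BATCH_SIZE = 100  # each worker pulls 100 "books" at a time, as in step 2

def make_batches(items, size):
    """Split the full work list into fixed-size batches."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def worker(batches, results, lock):
    """Pull batches until none remain; append computed metadata to results."""
    while True:
        try:
            batch = batches.get_nowait()
        except queue.Empty:
            return  # no batches left: this worker is done
        processed = [f"metadata-for-{book}" for book in batch]  # stand-in for text mining
        with lock:
            results.extend(processed)

books = [f"book-{i}" for i in range(1000)]
batches = queue.Queue()
for b in make_batches(books, BATCH_SIZE):
    batches.put(b)

results, lock = [], threading.Lock()
threads = [threading.Thread(target=worker, args=(batches, results, lock))
           for _ in range(10)]  # ten workers, echoing the ten GCE instances
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))  # 1000: every book processed exactly once
```

Because the queue hands out each batch atomically, no two workers process the same book, which is the same property the GCS-based batching provided at scale.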


Since very few researchers have the bandwidth, local storage or computing power to process even just the metadata of 3.5 million books, the entire collection was uploaded into Google BigQuery as a public dataset. Using standard SQL queries, you can explore the entire collection in tens of seconds at speeds of up to 45.5GB/s and perform complex analyses entirely in-database.



The entire project, from start to finish, took less than two weeks, a good portion of which consisted of human verification for issues with the publication metadata. This is significant because previous attempts to process even a subset of the collection on a modern HPC supercluster had taken over one month and completed only a fraction of the number of books examined here. The limiting factor was always the movement of data: transferring terabytes of books and their computed metadata across hundreds of processors.



This is where Google’s cloud offerings shine, seemingly purpose-built for data-first computing. In just two weeks, I was able to process 3.5 million books, spinning up a cluster of 160 cores and 1TB of RAM, followed by a single machine with 32 cores, 200GB of RAM, 10TB of SSD disk and 1TB of direct-attached scratch SSD disk. I was able to make the final results publicly accessible through BigQuery at query speeds of over 45.5GB/s.



You can access the entire collection today in BigQuery, explore sample queries, and read more technical detail about the processing pipeline on the GDELT Blog.



I’d like to thank Google, Clemson University, the Internet Archive, HathiTrust, and OCLC for making this project possible, along with all of the contributing libraries and digitization sponsors that have made these digitized books available.



- Posted by Kalev Leetaru, founder of The GDELT Project

At some point in development, nearly every mobile app needs a backend service. With Google’s services you can rapidly build backend services that:




  • Scale automatically to meet demand

  • Automatically synchronize data across devices

  • Handle the offline case gracefully

  • Send notifications and messages




The following are design patterns you’ll find in Build mobile apps using Google Cloud Platform, which provides a side-by-side comparison of Google services, as well as links to tutorials and sample code. Click on a diagram for more information and links to sample code.




Real-time data synchronization with Firebase


Firebase is a fully managed platform for building iOS, Android and web apps that provides automatic data synchronization and authentication services.



To understand how using Firebase can simplify app development, consider a chat app. By storing the data in Firebase, you get the benefits of automatic synchronization of data across devices, minimal on-device storage, and an authentication service. All without having to write a backend service.






Add managed computation to Firebase apps with Google App Engine


If your app needs backend computation to process user data or orchestrate events, extending Firebase with App Engine gives you the benefit of automatic real-time data synchronization and an application platform that monitors, updates and scales the hosting environment.



An example of how you can use Firebase with App Engine is an app that implements a to-do list. Using Firebase to store the data ensures that the list is updated across devices. Connecting to your Firebase data from a backend service running on App Engine gives you the ability to process or act on that data; in the case of the to-do app, that means sending daily reminder emails.








Add flexible computation to Firebase with App Engine Managed VMs


If your mobile backend service needs to call native binaries, write to the file system and make other system calls, extending Firebase with App Engine Managed VMs gives you the benefit of automatic real-time data synchronization and an application platform, with the flexibility to run code outside of the standard App Engine runtime.



Using Firebase and App Engine Managed VMs is similar to using Firebase with App Engine and adds additional options. For example, consider an app that converts chat messages into haikus using a pre-existing native binary. You can use Firebase to store and synchronize the data and connect to that data from a backend service running on App Engine Managed VMs. Your backend service can then detect new messages, call the native binaries to translate them into poetry, and push the new versions back to Firebase.






Automatically generate client libraries with App Engine and Google Cloud Endpoints


Using Cloud Endpoints means you don’t have to write wrappers to handle communication with App Engine. With the client libraries generated by Cloud Endpoints, you can simply make direct API calls from your mobile app.



If you're building an app that does not require real-time data synchronization, or if messaging and synchronization are already part of your backend service, using App Engine with Cloud Endpoints speeds development time by automatically generating client libraries. An example of an app where real-time synchronization is not needed is one that looks up information about retail products and finds nearby store locations.




Have full control with Compute Engine and REST or gRPC


With Google Compute Engine, you create and run virtual machines on Google infrastructure. You have administrator rights to the server and full control over its configuration.



If you have an existing backend service running on a physical or virtual machine, and that service requires a custom server configuration, moving your service to Compute Engine is the fastest way to get your code running on Cloud Platform. Keep in mind that you will be responsible for maintaining and updating your virtual machine.



An example of an app you might run on Compute Engine is an app with a backend service that uses third-party libraries and a custom server configuration.





For more information about these designs, as well as information about building your service, testing and monitoring your service and connecting to your service from your mobile app — including sending push notifications — see How to build backend services for mobile apps.



- Posted by Syne Mitchell, Technical Writer, Google Cloud Platform


The JGroups messaging toolkit is a popular solution for clustering Java-based application servers in a reliable manner. This post describes how to store, host and manage your JGroups cluster member data using Google Cloud Storage. The configuration provided here is particularly well-suited for the discovery of Google Compute Engine nodes; however, for testing purposes, it can also be used with your current on-premises virtual machines.




Overview of JGroups clustering on Cloud Storage




JGroups versions 3.5 and later enable the discovery of clustered members, or nodes, on GCP via a JGroups protocol called GOOGLE_PING. GOOGLE_PING stores information about each member in flat files in a Cloud Storage bucket, and then uses these files to discover initial members in a cluster. When new members are added, they read the addresses of the other cluster members from the Cloud Storage bucket, and then ping each member to announce themselves.



By default, JGroups members use multicast communication over UDP to broadcast their presence to other instances on a network. Google Cloud Platform, like most cloud providers and enterprise networks, does not support multicast; however, both GCP and JGroups support unicast communication over TCP as a viable fallback. In the unicast-over-TCP model, a new instance instead announces its arrival by iterating over the list of nodes already joined to a cluster, individually notifying each node.
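Since multicast is unavailable, a new node announces itself by walking the membership list and notifying each existing node individually. The pattern can be sketched as follows; this is an illustration of unicast fan-out, not JGroups' actual implementation, and `send` stands in for a point-to-point TCP message:

```python
def announce_unicast(new_node, members, send):
    """Announce new_node to each existing member over point-to-point TCP.

    In GOOGLE_PING, the membership list comes from the flat files kept
    in the Cloud Storage bucket; `send` is a stand-in for a TCP unicast.
    """
    notified = []
    for member in members:
        if member != new_node:           # no need to ping ourselves
            send(member, f"JOIN {new_node}")
            notified.append(member)
    return notified

# Usage: collect outgoing notifications instead of opening real sockets.
sent = []
members = ["10.240.0.2:7800", "10.240.0.3:7800", "10.240.0.4:7800"]
announce_unicast("10.240.0.4:7800", members,
                 lambda dest, msg: sent.append((dest, msg)))
print(sent)  # one JOIN notification per existing member
```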




Configure Cloud Storage to store JGroups configuration files




To allow JGroups to use Cloud Storage for file storage, begin by creating a Cloud Storage bucket:


  1. In the Cloud Platform Console, go to the Cloud Storage browser.

  2. Click Create bucket.

  3. In the Create bucket dialog, specify the following:


    • A bucket name, subject to the bucket name requirements

    • The Standard storage class

    • A location where bucket data will be stored



Next, set up interoperability and create a new Cloud Storage developer key. You'll need the developer key for authentication: GOOGLE_PING sends an authenticated request via the Cloud Storage XML API, which uses keyed-hash message authentication code (HMAC) authentication with Cloud Storage developer keys. To generate a developer key:


  1. Open the Storage settings page in the Google Cloud Platform Console.

  2. Select the Interoperability tab.

  3. If you have not set up interoperability before, click Enable interoperability access. Note: Interoperability access allows Cloud Storage to interoperate with tools written for other cloud storage systems. Because GOOGLE_PING is based on the Amazon-oriented S3_PING class in JGroups, it requires interoperability access.

  4. Click Create a new key.

  5. Make note of the Access key and Secret values—you'll need them later.


Important: Keep your developer keys secret. Your developer keys are linked to your Google account, and you should treat them as you would treat any set of access credentials.
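To give a feel for the keyed-hash scheme mentioned above, here is roughly what signing an XML API request with a developer key looks like. This is a simplified sketch of HMAC-SHA1 request signing; the exact string-to-sign layout and header names are defined by the XML API documentation, and GOOGLE_PING handles all of this for you:

```python
import base64
import hashlib
import hmac

def sign_request(secret_key: str, string_to_sign: str) -> str:
    """Compute the base64-encoded HMAC-SHA1 signature of a request."""
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")

# Illustrative string-to-sign for a GET of a members file; the real
# format (verb, headers, date, resource path) is defined by the XML API.
string_to_sign = ("GET\n\n\nTue, 27 Mar 2007 19:36:42 +0000\n"
                  "/your-jgroups-bucket/JGROUPS_CLUSTER/members.list")
signature = sign_request("your-secret", string_to_sign)
print(signature)  # this value goes into the request's Authorization header
```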




Configure your clustered application to use GOOGLE_PING


Now that you've created your Cloud Storage bucket and developer keys, configure your application's JGroups configuration to use the GOOGLE_PING class. For most applications that use JGroups, you can do so as follows:


  1. Edit your JGroups XML configuration file (jgroups.xml in most cases).

  2. Modify the file to use TCP instead of UDP: <TCP bind_port="7800">

  3. Locate the PING section and replace it with GOOGLE_PING, as shown in the following example. Replace your-jgroups-bucket with the name of your Cloud Storage bucket, and replace your-access-key and your-secret with the values of your access key and secret:


<!-- PING timeout="2000" num_initial_members="3"/ -->

<GOOGLE_PING
location="your-jgroups-bucket"
access_key="your-access-key"
secret_access_key="your-secret"
timeout="2000" num_initial_members="3"/>



Now GOOGLE_PING will use your Cloud Storage bucket and automatically create a folder that's named to match the cluster name.



Warning: By default, your virtual machines will communicate with your bucket insecurely through port 80. To set up an encrypted connection between the instances and the bucket, add the following attribute to the GOOGLE_PING element:



      <GOOGLE_PING ... port="443">



If you use the JBoss WildFly application server, you can configure clustering by configuring the JGroups subsystem and adding the GOOGLE_PING protocol.




Demonstration


This section walks you through a concrete demonstration of GOOGLE_PING in action. This example sets up a cluster of Compute Engine instances that reside within the same Cloud Platform project, using their internal IPs as ping targets.



First, I start a sender application (using Vert.x) on a Compute Engine instance, making it the first member of my cluster:



$ java -Djava.net.preferIPv4Stack=true \
    -Djgroups.bind_addr=10.240.0.2 \
    -jar my-sender-fatjar-3.1.0-fat.jar -cluster -cluster-host 10.240.0.2




Note: In general, you should bind to your Compute Engine instances' internal IP addresses. If you would prefer to cluster your instances by using their externally routable IP addresses, add the following parameter to your java command, replacing <external_ip> with the external IP of the instance:



-Djgroups.external_addr=<external_ip>



When the application begins running, it displays "No reply," as no receiver nodes have been set up yet:







This sender node creates a folder and a .list file in my Cloud Storage bucket. My JGroups cluster is configured with the name JGROUPS_CLUSTER, so my Cloud Storage folder is also automatically named JGROUPS_CLUSTER:







The .list file lists all of the members in the JGROUPS_CLUSTER cluster. In JGroups, the first node to start is designated as the cluster coordinator; as such, the single node I've started has been marked with a T, meaning that the node's cluster-coordinator status is true.
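The coordinator flag makes the .list file easy to inspect by eye or by script. Assuming one whitespace-separated entry per line ending in T or F (an illustrative guess at the layout, not the exact GOOGLE_PING file format — check the actual file in your bucket), picking out the coordinator is trivial:

```python
def find_coordinator(list_file_text: str):
    """Return the address of the member whose coordinator flag is T.

    NOTE: the line layout here is an assumption for illustration only;
    the real .list file in your bucket may differ.
    """
    for line in list_file_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "T":
            return fields[0]   # first field: the member's address
    return None

sample = "10.240.0.2:7800 sender-node T\n10.240.0.3:7800 receiver-node F"
print(find_coordinator(sample))  # -> 10.240.0.2:7800
```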







Next, I start a receiver application, also using Vert.x, on a second Compute Engine instance:



$ java -Djava.net.preferIPv4Stack=true \
    -Djgroups.bind_addr=10.240.0.2 \
    -jar my-receiver-fatjar-3.1.0-fat.jar -cluster -cluster-host 10.240.0.2




This action adds an entry to the .list file for the new member node:





Once the node has been added to the .list file, the node begins receiving "ping!" messages from the first member node:



The second node responds to each "ping!" message with a "pong!" message. When the first node receives a "pong!" message, it displays "Received reply pong!" in the sender application's standard output:












Get started


You can give GOOGLE_PING a try by signing up for a free trial.



- Posted by Grace Mollison, Solutions Architect

In the time it took you to click on this post and start reading, Google Cloud Platform processed millions of big data analytics events and we’ll process billions more later today. We’re fans of distributed systems and large-scale data processing and we know many of you are too.



In almost every survey we’ve done you have told us you want to hear more about new features as well as what’s under the hood of our cloud services, in detail and in an ongoing way.



Today we’re taking a step in that direction with our first topic-focused blog. We’re starting with big data because we have a lot to share on this subject that we haven’t revealed yet and we know there’s tremendous interest in these technologies.



If you could easily find yourself debating the merits of the Spark and Dataflow programming models into the wee hours of the morning, if you get excited at the prospect of processing terabytes in seconds with zero setup for a few bucks, or if you simply want to learn how to use the infrastructure that powers Google for your own data processing work, this blog is for you.



The team contributing to it includes engineers, developer advocates, product managers, technical writers, technical program managers and support engineers at Google, all eager to share their excitement for these technologies with you. They also want to hear what you’re up to and what you need from us, so reach out on Twitter @GCPBigData.



We look forward to sharing stories!



Posted by Jo Maitland, Managing Editor, Google Cloud Platform

I recently joined the Google Cloud Platform team, but I’ve never really explained why I was attracted to Google in the first place. Before joining Google I’d been a strong advocate of two key technologies: the Go programming language and Kubernetes. Both just so happen to originate from Google, and I’m sure my investment in both technologies helped me land a job here. Like many, I was attracted to Google because of all the inspiring innovations that have helped shape the last decade of computing and have influenced a countless number of open source projects.



I’ve spent several years poring over Google white papers and stitching together information from across the web trying to stay up to speed, and I’ll tell you it’s pretty time consuming. This year I’ve got a better idea. I’ll be attending GCP NEXT 2016. Why? Because it’s the only conference where you can find complete coverage of Google Cloud Platform technologies and, more importantly, the people behind them.







Today we’re announcing the GCP NEXT conference program, featuring in-depth technical sessions led by Google and the Google Cloud Platform community: developers, customers and partners. Dive in with us for two full days and come away with practical expertise in Google Cloud Platform. Sample sessions include:




  • "From idea to market in less than 6 months: Creating a new product with GCP," presented by CI&T — App Developer Track

  • "Painless container management with Google Container Engine & Kubernetes," presented by Brendan Burns & Tim Hockin, Google — Infrastructure & Operations Track

  • "Cloud data warehousing with BigQuery featuring Dropbox Nighthawk," presented by Jordan Tigani, Google & Dropbox — Data & Analytics Track

  • "Security analytics for today's cloud-ready enterprise," presented by Matt O’Connor, Google & PwC — Solutions Showcase




Curated from our Call for Speakers — an internal and external search for the very best content, demos and presenters — NEXT technical tracks cover the most relevant topics in cloud, from machine learning to networking and IoT. They’ll also teach you best practices and how-tos directly from product leaders and developers who have implemented our platform, including speakers from Netflix, Atomic Fiction, FIS Global (Sungard), and many more to be announced.



If you want to know more about Google Cloud Platform, are thinking about moving to the cloud or want to sharpen your skills in compute, don’t miss GCP NEXT. Register today and get our early bird rate (available until February 5th).



To keep up to date on GCP NEXT 2016, follow us on Google+, Twitter, and LinkedIn.



- Posted by Kelsey Hightower, Developer Advocate, Google Cloud Platform