Google Cloud Platform Blog
Scaling with the Kindle Fire
Tuesday, November 29, 2011
Today’s blog post comes to us from Greg Bayer of
Pulse
, a popular news reading application for iPhone, iPad and Android devices. Pulse has used Google App Engine as a core part of their infrastructure for over a year and they recently celebrated a significant launch. We hope you find their experiences and tips on scaling useful.
As part of the much anticipated Kindle Fire launch,
Pulse
was
announced
as one of the only preloaded apps. When you first un-box the Fire, Pulse will be there waiting for you on the home row, next to Facebook and IMDB!
Scale
The Kindle Fire is projected to sell over
five million units
this quarter alone. This means that those of us who work on backend infrastructure at Pulse have had to prepare for nearly
doubling our user-base
in a very short period. We also need to be ready for spikes in load due to press events and the holiday season.
Architecture
As I’ve discussed previously on the
Pulse Engineering Blog
, Pulse’s infrastructure has been designed with scalability in mind from the beginning. We’ve built our web site and client APIs on top of Google App Engine, which has allowed us to grow steadily from 10s to many 1000s of requests per second, without needing to re-architect our systems.
While restrictive in some ways, we’ve found App Engine’s frontend serving instances (running Python in our case) to be extremely scalable, with minimal operational support from our team. We’ve also found the datastore, memcache, and task queue facilities to be equally scalable.
Pulse’s backend infrastructure provides many critical services to our native applications and web site. For example, we cache and serve optimized feed and image data for each source in our catalog. This allows us to minimize latency and data transfer and is especially important to providing an exceptional user experience on limited mobile connections. Providing this service for millions of users requires us to serve 100Ms of requests per day. As with any
well designed App Engine app
, the vast majority of these requests are served out of
memcache
and never hit the datastore. Another useful technique we use is to
set public cache control headers
wherever possible, to allow Google’s edge cache (shown as cached requests on the graph below) and ISP / mobile carrier caches to serve unchanged content directly to users.
Costs
Based on App Engine’s projected billing statements leading up to the recent
pricing changes
, we were concerned that our costs might increase significantly
. To prepare for these changes and the expected additional load from Kindle Fire users, we invested some time in diagnosing and reducing these costs. In most cases, the increases turned out to be an indicator of inefficiencies in our code and/or in the App Engine scheduler. With a little optimization, we have reduced these costs dramatically.
The new tuning sliders for the scheduler make it possible to rein in overly aggressive instance allocation. In the old pricing structure, idle instance time wasn’t charged for at all, so these inefficiencies were usually ignored. Now App Engine charges for all instance time by default. However, any time App Engine runs more idle instances than you’ve allowed, those hours are free. This acts as a hint to the scheduler, helping it reduce unneeded idle instances. By doing some testing to find the optimal cost vs spike latency tolerance and setting the sliders to those levels, we were able to reduce our frontend instance costs to near original levels. Our heavy usage of memcache (which is still free!) also helps keep our instance hours down.
Since datastore operations used to be charged under the umbrella of CPU hours, it was difficult to know the cost of these operations under the old pricing structure. This meant it was easy to miss application inefficiencies, especially for write-heavy workloads where additional indexes can have a
multiplicative effect
on costs. In our case, the new datastore write operations metric led us to notice some inefficiencies in our design and a tendency to overuse indexes. We are now working to minimize the number of indexes our queries rely on, and this has started to reduce our write costs.
Preparing for the Kindle Fire Launch
We took a few additional steps to prepare for the expected load increase and spikes associated with the Fire’s launch. First, we contacted App Engine’s support team to warn them of the expected increase. This is recommended for any app at or near 10,000 requests per second (to make sure your application is correctly provisioned). We also signed up for a
Premier
account which gets us additional support and simpler billing.
Architecturally, we decided to split our load across three primary applications, each serving different use cases. While this makes it harder to access data across these applications, those same boundaries serve to isolate potential load-related problems and make tuning simpler. In our case, we were able to divide certain parts of our infrastructure, where cross application data access was less important and load would be significant. Until App Engine provides more visibility into and control of memcache eviction policies, this approach also helps prevent lower priority data from evicting critical data.
I’m hopeful that in the near future such division of services will not be required. Individually tunable load isolation zones and memcache controls would certainly make it a lot more appealing to have everything in a single application. Until then, this technique works quite well, and helps to simplify how we think about scaling.
To learn more about Pulse, check out
our
website
! If you have comments or questions about this post or just want to reach out directly,
you can find me
@gregbayer
.
New Datastore client library for Python ready for a test drive
Wednesday, November 16, 2011
Last week we
announced
that App Engine has left preview and is now an officially supported product here at Google. And while the release (and the announcement) was chock-full of great features, one of the features that we’d like to call specific attention to is the new Datastore client library for Python (a.k.a “NDB”).
NDB
has been unde
r development for some time and this release marks its availability to a larger audience as an experimental feature. Some of the benefits of this new library include:
The StructuredProperty class, which allows entities to have nested structure
Integrated two-level caching, using both memcache and a per-request in-process cache
High-level asynchronous API using Python generators as coroutine
s (
PEP 342
)
New, cleaner implementations of Key, Model, Property and Query classes
The version of NDB contained in the 1.6.0 runtime and SDK corresponds to NDB 0.9.1, which is currently the latest NDB release.
Given that this feature is still experimental, it is subject to change, but that’s exactly why we encourage you to give it a test drive and send us any feedback that you might have.
The
NDB project
hosted
on Google
Code is the best place to send this feedback. Happy coding!
Posted by Guido van Rossum, Software Engineer on the App Engine Team
Google BigQuery Service: Big data analytics at Google speed
Monday, November 14, 2011
Our post today,
cross-posted with the
Google Enterprise Blog
,
comes from one of our sister projects, BigQuery. We know that many of you are interested in processing large volumes of data and we encourage you to try it out.
Rapidly crunching terabytes of big data can lead to better business decisions, but this has traditionally required tremendous IT investments. Imagine a large online retailer that wants to provide better product recommendations
by analyzing website usage and purchase patterns from millions of website visits. Or consider a car manufacturer that wants to maximize its advertising impact by learning how its last global campaign performed across billions of multimedia impressions. Fortune 500 companies struggle to unlock the potential of data, so it’s no surprise that it’s been even harder for smaller businesses.
We developed
Google BigQuery Service
for large-scale internal data analytics. At
Google I/O last year
, we opened a preview of the service to a limited number of enterprises and developers. Today we're releasing some big improvements, and putting one of Google's most powerful data analysis systems into the hands of more companies of all sizes.
We’ve added a graphical user interface for analysts and developers to rapidly explore massive data through a web application.
We’ve made big improvements for customers accessing the service programmatically through the API. The new REST API lets you run multiple jobs in the background and manage tables and permissions with more granularity.
Whether you use the BigQuery web application or API, you can now write even more powerful queries with JOIN statements. This lets you run queries across multiple data tables, linked by data that tables have in common.
It’s also now easy to manage, secure, and share access to your data tables in BigQuery, and export query results to the desktop or to
Google Cloud Storage
.
Michael J. Franklin, Professor of Computer Science at UC Berkeley,
remarked
that BigQuery (internally known as Dremel) leverages “thousands of machines to process data at a scale that is simply jaw-dropping given the current state of the art.” We’re looking forward to helping businesses innovate faster by harnessing their own large data sets. BigQuery is available free of charge for now, and we’ll let customers know at least 30 days before the free period ends. We’re bringing on a new batch of pilot customers, so
let us know
if your business wants to test-drive BigQuery Service.
Posted by Ju-Kay Kwek, Product Manager
App Engine 1.6.0 Out of Preview Release
Monday, November 7, 2011
Three and a half years after App Engine’s first
Campfire One
, App Engine has graduated from Preview and is now a fully supported Google product. We started out with the simple philosophy that App Engine should be ‘easy to use, easy to scale, and free to get started.’ And with 100 billion+ monthly hits, 300,000+ active apps, and 100,000+ developers using our product every month it’s clear that this philosophy resonates. Thanks to your support, Google is making a long term investment in App Engine!
When we announced our plans to leave preview earlier this year, we made a commitment to improving the service by adding support for
Python 2.7
,
Premier Accounts
and
Backends
as well as several changes launching today:
Pricing
: The new
pricing structure
announced in May (and updated based on feedback from the community) will be reflected in your bill starting on Nov 7th as
previously announced
.
Terms of Service
: We have a new business-friendly
terms of service
and
acceptable use policy
.
Service Level Agreement
: All paid applications on the High Replication Datastore are covered by our
99.95% SLA
.
We are also holding a series of App Engine Office hours via Google+ this week for any users who have questions about how these changes impact their applications. The list of times can be found on the
Google Developers events
page, with links to join the hangout while the office hours are scheduled. Also, please don’t hesitate to contact us at
appengine_updated_pricing@google.com
with any questions or concerns.
In addition to leaving Preview, we have several additional changes to announce today.
Production Changes
For billing enabled apps, we are offering two more scheduler controls and some additional changes:
Min Idle Instances
: You can now adjust the minimum number of Idle Instances for your application, from 1 to 100. Users who had previously signed up for “Always On” can now set the number of idle instances for their applications using this setting.
Max Pending Latency
: For applications that care about user facing latency, this slider allows you to set a limit to the amount of time a request spends in the pending queue before starting up a new instance.
Blobstore API
: You can now use the Blobstore API without signing up for billing.
Datastore Changes
High Replication Datastore Migration Tool
:
We are releasing an experimental tool that allows you to easily migrate your data from Master/Slave to High Replication Datastore, and seamlessly switch your application’s serving to the new HRD application.
Query Planning Improvements
:
We’ve
published an article
that details recent improvements to our query planner that eliminate the need for exploding indexes.
Python
MapReduce
:
We are releasing the full MapReduce framework in experimental for Python. The framework includes the Map, Shuffle, and Reduce phases.
Python 2.7 in the SDK:
The SDK now supports the Python 2.7 runtime, so you can test out your changes before uploading them to production.
Java
™
Memcache API Improvements
: The Memcache API for Java now supports asynchronous calls. Additionally, putIfUntouched() and getIdentifiable() now support batch operations.
Capability Testing
:
We’ve added the ability to simulate the capability state of local API implementations to test your application’s behavior if a service is unavailable.
Datastore Callbacks
:
You can now specify actions to perform before or after a put() or delete() call.
The full list of changes with this release can be found in the release notes (
Python
,
Java
). We’d love to hear your feedback about this release in the groups. And we’d like to thank you all for investing in our platform for the last three years. We’re excited for this milestone in App Engine history, and we look forward to what the future will bring.
Posted by The App Engine Team
Oracle and Java are registered trademarks of Oracle and/or its affiliates.
Don't Miss Next '17
Use promo code NEXT1720 to save $300 off general admission
REGISTER NOW
Free Trial
GCP Blogs
Big Data & Machine Learning
Kubernetes
GCP Japan Blog
Labels
Announcements
56
Big Data & Machine Learning
91
Compute
156
Containers & Kubernetes
36
CRE
7
Customers
90
Developer Tools & Insights
80
Events
34
Infrastructure
24
Management Tools
39
Networking
18
Open Source
105
Partners
63
Pricing
24
Security & Identity
23
Solutions
16
Stackdriver
19
Storage & Databases
111
Weekly Roundups
16
Archive
2017
Feb
Jan
2016
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2015
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2014
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2013
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2012
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2011
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2010
Dec
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2009
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2008
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Feed
Subscribe by email
Technical questions? Check us out on
Stack Overflow
.
Subscribe to
our monthly newsletter
.
Google
on
Follow @googlecloud
Follow
Follow