cloud-test: June 2009

Google Cloud Platform Blog

The new Task Queue API on Google App Engine

Thursday, June 18, 2009

With release 1.2.3 of the Python SDK, we are psyched to present an exciting new feature - the Task Queue API. You can now perform offline processing on App Engine by scheduling bundles of work (tasks) for automatic execution in the background. You don't need to worry about managing threads or polling - just write the task processing code, queue up some input data, and App Engine handles the rest. If desired, you can even organize and control task execution by defining custom queues. A quick example:


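   from google.appengine.api import mail
   from google.appengine.api.labs import taskqueue
   from google.appengine.ext import webapp

   # queue_emails is a hypothetical wrapper; in a real app this loop would
   # run inside one of your request handlers
   def queue_emails(users):
       # for each user, add a task to send a custom email message
       for u in users:
           taskqueue.add(url='/work/sendmail',
                         params=dict(to=u.email, subject='Hello ' + u.name,
                                     body='this is a message!'))
       # finished now; emails will be sent offline when the tasks execute

   # task handler at /work/sendmail, automatically called for each task created above
   class MailWorker(webapp.RequestHandler):
       def post(self):
           mail.send_mail(
               'from_me@example.com',
               self.request.get('to'),
               self.request.get('subject'),
               self.request.get('body'))

   # map the task URL to the handler so each task's POST reaches MailWorker
   application = webapp.WSGIApplication([('/work/sendmail', MailWorker)])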
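The handler reads its inputs with self.request.get(), which suggests each task's params arrive as an ordinary form-encoded POST body. The sketch below imitates that round trip using only the Python 3 standard library, so it runs outside App Engine. build_task_payload and read_param are hypothetical helpers, and the exact encoding the task queue uses is an assumption here rather than something this post specifies.

```python
from urllib.parse import urlencode, parse_qs


def build_task_payload(params):
    """Hypothetical helper: form-encode task params the way a POST body
    would carry them (assumed to mirror what the task queue sends)."""
    return urlencode(params)


def read_param(payload, name):
    """Hypothetical helper imitating self.request.get(name): return the
    first value for name, or '' when the parameter is missing."""
    values = parse_qs(payload).get(name)
    return values[0] if values else ''


if __name__ == '__main__':
    payload = build_task_payload(dict(to='ann@example.com',
                                      subject='Hello Ann',
                                      body='this is a message!'))
    print(payload)
    print(read_param(payload, 'subject'))  # Hello Ann
```

Because the values travel URL-encoded, spaces and characters such as '@' survive the trip and come back unchanged on the handler side.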

We're eager to help you learn and experiment with Task Queues. The team recently presented the feature at Google I/O and the video is now available (slides are here). We've also prepared a set of demos to help you get started. And of course, don't miss the feature documentation. The Task Queue API is Python-only for now; we'll have a Java language version available soon.

Please note that the Task Queue API is currently a Labs release - we want to get your feedback on its usability and functionality before finalizing the API. You'll notice that its Python import path currently includes the 'labs' module (google.appengine.api.labs.taskqueue). Before the feature is promoted out of Labs, we may need to:

Change the quotas and limits that apply to Task execution (we definitely hope to raise the number of Tasks you can use per day).

Change the API itself if there are usability or functionality issues.

Change how we bill for Task Queue usage.

Once we're ready to promote the feature out of Labs, we'll give weeks of notice and provide a transition path for our developers.

Last but not least, the 1.2.3 release is full of other new stuff as well! Stay tuned to the blog for more updates or check the release notes for exciting info on:

Asynchronous urlfetch support

Django 1.0 support

Visit the Downloads page to get SDK 1.2.3 now!

The Task Queue API is the first milestone of our plan to deliver rich support for offline processing. There's more to come, but we hope the simplicity and power of this first release opens a new range of possibilities for our developers. Try it out and let us know! We'll be watching the Group for your input.

Google

App Engine @ Google I/O goodness for all to enjoy

Thursday, June 11, 2009

Back in April when we launched Java support, we gave the first 10,000 developers who signed up access to the new runtime. In case you haven't heard, we recently announced at Google I/O that App Engine for Java signup is now open. We're excited to see more developers joining our community!

this post to the Google Code Blog

From Spark Plug to Drive Train: Life of an App Engine Request

Building Scalable, Complex Apps on App Engine

Offline Processing on App Engine: a Look Ahead

The Softer Side Of Schemas - Mapping Java Persistence Standards To the Google App Engine Datastore

Transactions Across Datacenters (and Other Weekend Projects)

App Engine Nitty-Gritty: Scalability, Fault Tolerance, and Integrating Amazon EC2

JRuby and Ioke on Google App Engine for Java

App Engine: Now Serving Java

A Design for a Distributed Transaction Layer for Google App Engine

Connecting The Clouds: Integrating Google App Engine for Java with Force.com

ThoughtWorks on App Engine for Java: An Enterprise Cumulonimbus?

Groovy and Grails in App Engine

Apart from the sessions, we also had a Developer Sandbox featuring App Engine partners and customers. We interviewed many of them and asked them to share their experience developing on App Engine. Check out these video interviews to get more insight into App Engine:

3Scale Networks

Best Buy

Caucho Technology

EZasset

Gigapan.org

LifeAware

LingoSpot

WalkScore

Last but not least, we've also made available the handy Python cheat sheet we gave out during Google I/O. You can download it here. We want to thank our partners and the App Engine developer community for a fantastic Google I/O 2009! It was great meeting those of you who were there. We look forward to next year!

Posted by Amanda Surya, App Engine Team

Java is a trademark or registered trademark of Sun Microsystems, Inc. in the United States and other countries.


Changing Quotas To Keep Most Apps Serving Free

Wednesday, June 10, 2009

Since App Engine launched, our goal has been to offer substantial free hosting resources to as many developers as possible. As previously announced, we are changing our free resource quota levels, effective on June 22nd. Our target level of free support has been 5 million page views per month for a reasonably efficient web application.

When we launched App Engine, we were intentionally generous in our free quotas, both because we didn't know the resource usage of a typical request, and because we didn't offer a billing feature to allow developers to buy more resources for a higher-traffic app. Since our billing feature launched in February, developers with high-traffic applications can purchase additional resources far beyond our original fixed free quotas. Now that App Engine has been live for more than a year, we have good empirical data on the average resource consumption per request, so we're able to set our quotas to more accurately support our 5 million page views target.

This change in the free quotas offered to every application is intended to allow us to continue to offer substantial free application hosting to any interested developer. We have grown a lot in the last year, with over 80,000 applications created, and with these changes to our free quotas, more than 90% of these applications will continue to serve completely free.

To empirically determine reasonable levels for our quotas, we measured resource usage for all applications running on App Engine over a recent 7-day period. For each of the quotas, we took the highest daily average resource usage per HTTP request out of the 7-day period:

CPU: 0.14 CPU-seconds/request

Outbound data transfer: 6,149 bytes/request

Inbound data transfer: 803 bytes/request
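Turning these per-request averages into the daily figures quoted in the next paragraph is simple arithmetic. The sketch below redoes it, assuming 5 million requests spread evenly over a 30-day month and decimal gigabytes (10^9 bytes), an assumption that reproduces the post's 1.02 GB figure. With the rounded 0.14 CPU-seconds value it comes to about 6.48 CPU-hours per day.

```python
# Per-request averages quoted in the post.
CPU_SECONDS_PER_REQUEST = 0.14
OUTBOUND_BYTES_PER_REQUEST = 6149
INBOUND_BYTES_PER_REQUEST = 803

MONTHLY_REQUESTS = 5_000_000  # target: 5 million page views per month
DAYS_PER_MONTH = 30


def daily_usage():
    """Return (cpu_hours, outbound_gb, inbound_gb) per day at the target load."""
    requests_per_day = MONTHLY_REQUESTS / DAYS_PER_MONTH
    cpu_hours = requests_per_day * CPU_SECONDS_PER_REQUEST / 3600
    # Decimal gigabytes (10**9 bytes), which matches the post's 1.02 figure.
    outbound_gb = requests_per_day * OUTBOUND_BYTES_PER_REQUEST / 1e9
    inbound_gb = requests_per_day * INBOUND_BYTES_PER_REQUEST / 1e9
    return cpu_hours, outbound_gb, inbound_gb


if __name__ == '__main__':
    cpu, out_gb, in_gb = daily_usage()
    print(f'CPU: {cpu:.2f} CPU-hours/day')         # CPU: 6.48 CPU-hours/day
    print(f'Outbound: {out_gb:.2f} GB/day')        # Outbound: 1.02 GB/day
    print(f'Inbound: {in_gb:.2f} GB/day')          # Inbound: 0.13 GB/day
```

Inbound transfer at this load is only about 0.13 GB per day, which is consistent with the post's remark that inbound is typically a small fraction of outbound.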
Multiplied by 5 million requests spread over a 30-day month, these per-request resource statistics translate to daily resource usage of 6.4 CPU-hours and 1.02 gigabytes of outbound data transfer. We top off the numbers by offering 6.5 CPU-hours and 1.07 gigabytes of outbound transfer. Though inbound data transfer is typically a small fraction of outbound data transfer, we made the inbound and outbound data transfer quotas symmetric to ease initial data uploads.

Finally, what do we mean by reasonably efficient applications? Simply put, efficient applications avoid unnecessary computation or data transfer, and two techniques common to efficient App Engine applications are the use of caching headers and memcache. Caching headers in an HTTP response prevent a user's browser from needlessly re-downloading information that hasn't changed, both speeding up the user experience and saving bandwidth. Similarly, memcache keeps frequently accessed data in a memory cache on App Engine servers rather than always reading it from disk in the Datastore, saving CPU usage and reducing Datastore load.

Again, these changes ensure we can keep our continuing promise to make it free to get started with App Engine.

Posted by Chris Beckmann, App Engine team
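The memcache advice in this post follows the common read-through (cache-aside) pattern: look in the cache first, and only on a miss do the expensive work and store the result with an expiry time. The sketch below is a self-contained, in-process stand-in so it runs anywhere; TTLCache and get_cached are hypothetical names, and on App Engine the memcache API would take the place of this class.

```python
import time


class TTLCache:
    """In-process stand-in for memcache: entries expire after ttl seconds."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock  # injectable so expiry can be tested

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # stale: drop it and report a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)


def get_cached(cache, key, load, ttl=60):
    """Read-through lookup: call load() only on a miss, then cache the result.

    A stored value of None is indistinguishable from a miss in this sketch.
    """
    value = cache.get(key)
    if value is None:
        value = load()  # e.g. the expensive Datastore read being avoided
        cache.set(key, value, ttl)
    return value
```

Two get_cached calls within the TTL run the loader once; a call after the TTL has passed runs it again, so the cache trades a bounded amount of staleness for fewer expensive reads.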


10 things you (probably) didn't know about App Engine

Friday, June 5, 2009

What could be better than nine nifty tips and tricks about App Engine? Why, ten of course. As we've been participating in the discussion groups, we've noticed that some features of App Engine often go unnoticed, so we've come up with just under eleven fun facts which might just change the way that you develop your app. Without further ado, bring on the first tip:

1. App Versions are strings, not numbers

Although most of the examples show the 'version' field in app.yaml and appengine-web.xml as a number, that's just a matter of convention. App versions can be any string that's allowed in a URL. For example, you could call your versions "live" and "dev", and they would be accessible at "live.latest.yourapp.appspot.com" and "dev.latest.yourapp.appspot.com".

2. You can have multiple versions of your app running simultaneously

As we alluded to in point 1, App Engine permits you to deploy multiple versions of your app and have them running side-by-side. All the versions share the same datastore and memcache, but they run in separate instances and have different URLs. Your 'live' (default) version is always served from yourapp.appspot.com as well as any domains you have mapped, but all your app's versions are accessible at version.latest.yourapp.appspot.com. Multiple versions are particularly useful for testing a new release in a production environment, on real data, before making it available to all your users.

Something that's less known is that the different app versions don't even have to have the same runtime! It's perfectly fine to have one version of an app using the Java runtime and another version of the same app using the Python runtime.

3. The Java runtime supports any language that compiles to Java bytecode

It's called the Java runtime, but in fact there's nothing stopping you from writing your App Engine app in any other language that compiles to JVM bytecode.
In fact, there are already people writing App Engine apps in JRuby, Groovy, Scala, Rhino (a JavaScript interpreter), Quercus (a PHP interpreter/compiler), and even Jython! Our community has shared notes on what they've found to work and not work on the following wiki page.

4. The 'IN' and '!=' operators generate multiple datastore queries 'under the hood'

The 'IN' and '!=' operators in the Python runtime are actually implemented in the SDK and translate to multiple queries 'under the hood'.

For example, the query "SELECT * FROM People WHERE name IN ('Bob', 'Jane')" gets translated into two queries, equivalent to running "SELECT * FROM People WHERE name = 'Bob'" and "SELECT * FROM People WHERE name = 'Jane'" and merging the results. Combining multiple disjunctions multiplies the number of queries needed, so the query "SELECT * FROM People WHERE name IN ('Bob', 'Jane') AND age != 25" generates a total of four queries, one for each combination of conditions (name is 'Bob' or 'Jane', and age is less than or greater than 25), then merges them together into a single result set.

The upshot of this is that you should avoid using excessively large disjunctions. If you're using a '!=' filter, for example, and you expect only a small number of records to exactly match the excluded value (e.g. in the above example, you know very few people will have an age of exactly 25), it may be more efficient to execute the query without the '!=' filter and exclude any returned records that don't match it yourself.

5. You can batch put, get and delete operations for efficiency

Every time you make a datastore request, such as a query or a get() operation, your app has to send the request off to the datastore, which processes the request and sends back a response. This request-response cycle takes time, and if you're doing a lot of operations one after the other, this can add up to a substantial delay in how long your users have to wait to see a result.

Fortunately, there's an easy way to reduce the number of round trips: batch operations. The db.put(), db.get(), and db.delete() functions all accept lists in addition to their more usual singular invocation. When passed a list, they perform the operation on all the items in the list in a single datastore round trip, and the operations are executed in parallel, saving you a lot of time. For example, take a look at this common pattern:

   for entity in MyModel.all().filter("color =", old_favorite).fetch(100):
       entity.color = new_favorite
       entity.put()

Doing the update this way requires one datastore round trip for the query, plus one additional round trip for each updated entity, for a total of up to 101 round trips! In comparison, take a look at this example:

   from google.appengine.ext import db

   updated = []
   for entity in MyModel.all().filter("color =", old_favorite).fetch(100):
       entity.color = new_favorite
       updated.append(entity)
   db.put(updated)

By adding two lines, we've reduced the number of round trips required from 101 to just 2!

6. Datastore performance doesn't depend on how many entities you have

Many people ask about how the datastore will perform once they've inserted 100,000, or a million, or ten million entities. One of the datastore's major strengths is that its performance is totally independent of the number of entities your app has. So much so, in fact, that every entity for every App Engine app is stored in a single BigTable table! Further, when it comes to queries, all the queries that you can execute natively (with the notable exception of those involving 'IN' and '!=' operators; see above) have equivalent execution cost: the cost of running a query is proportional to the number of results returned by that query.

7. The time it takes to build an index isn't entirely dependent on its size

When adding a new index to your app on App Engine, it sometimes takes a significant amount of time to build. People often inquire about this, citing the amount of data they have compared to the time taken. However, requests to build new indexes are actually added to a queue of indexes that need to be built, and processed by a centralized system that builds indexes for all App Engine apps. At peak times, there may be other index building jobs ahead of yours in the queue, delaying when we can start building your index.

8. The value for 'Stored Data' is updated once a day

Once a day, we run a task to recalculate the 'Stored Data' figure for your app based on your actual datastore usage at that time. In the intervening period, we update the figure with an estimate of your usage so we can give you immediate feedback on changes in your usage. This explains why many people have observed that after deleting a large number of entities, their datastore usage remains at previous levels for a while. For billing purposes, only the authoritative number is used, naturally.

9. The order in which handlers are specified in app.yaml, web.xml, and appengine-web.xml matters

One of the more common and subtle mistakes people make when configuring their app is to forget that handlers in the application configuration files are processed in order, from top to bottom. For example, when installing remote_api, many people do the following:

   handlers:
   - url: /.*
     script: request.py

   - url: /remote_api
     script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
     login: admin

The above looks fine at first glance, but because handlers are processed in order, the handler for request.py is encountered first, and all requests, even those for remote_api, get handled by request.py. Since request.py doesn't know about remote_api, it returns a 404 Not Found error. The solution is simple: make sure that the catchall handler comes after all other handlers.

The same is true for the Java runtime, with the additional constraint that all the static file handlers in appengine-web.xml are processed before any of the dynamic handlers in web.xml.

10. You don't need to construct GQL strings by hand

One anti-pattern that comes up a lot looks similar to this:

   q = db.GqlQuery("SELECT * FROM People "
                   "WHERE first_name = '" + first_name
                   + "' AND last_name = '" + last_name + "'")

As well as opening up your code to injection vulnerabilities, this practice introduces escaping issues (what if a user has an apostrophe in their name?) and potentially, encoding issues. Fortunately, GqlQuery has built-in support for parameter substitution, a common technique for avoiding the need to substitute in strings in the first place. Using parameter substitution, the above query can be rephrased like this:

   q = db.GqlQuery("SELECT * FROM People "
                   "WHERE first_name = :1 "
                   "AND last_name = :2", first_name, last_name)

GqlQuery also supports named instead of numbered parameters, with the values passed as keyword arguments (for example, from a dictionary unpacked with **):

   q = db.GqlQuery("SELECT * FROM People "
                   "WHERE first_name = :first_name "
                   "AND last_name = :last_name",
                   first_name=first_name, last_name=last_name)

Aside from cleaning up your code, this also allows for some neat optimizations. If you're going to execute the same query multiple times with different values, you can use GqlQuery.bind() to 'rebind' the values of the parameters for each query. This is faster than constructing a new query each time, because the query only has to be parsed once:

   q = db.GqlQuery("SELECT * FROM People "
                   "WHERE first_name = :first_name "
                   "AND last_name = :last_name")
   for first, last in people:
       # the parameters are named, so bind the values by keyword
       q.bind(first_name=first, last_name=last)
       person = q.get()
       print(person)

Posted by Nick Johnson, App Engine Team

Java is a trademark or registered trademark of Sun Microsystems, Inc. in the United States and other countries.
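Tip 4's query expansion can be modelled in a few lines. The toy below is not the SDK's implementation; it only illustrates the idea described in the tip: each IN filter becomes one equality filter per value, each != filter becomes a '<' filter and a '>' filter, and the sub-queries are every combination of those choices, run here over an in-memory list and merged with duplicates removed. The (property, operator, value) filter format and the function names are hypothetical.

```python
import operator
from itertools import product

# Comparisons available to a single "native" sub-query in this toy model.
OPS = {'=': operator.eq, '<': operator.lt, '>': operator.gt}


def expand(filters):
    """Split IN and != filters into simple (prop, op, value) filter sets."""
    choices = []
    for prop, op, value in filters:
        if op == 'IN':
            # One equality filter per listed value.
            choices.append([(prop, '=', v) for v in value])
        elif op == '!=':
            # "not equal" becomes "less than" OR "greater than".
            choices.append([(prop, '<', value), (prop, '>', value)])
        else:
            choices.append([(prop, op, value)])
    # Every combination of choices is one sub-query, so the counts multiply.
    return [list(combo) for combo in product(*choices)]


def run(entities, filters):
    """Run each sub-query over an in-memory list; merge without duplicates."""
    subqueries = expand(filters)
    merged, seen = [], set()
    for sub in subqueries:
        for i, entity in enumerate(entities):
            if i not in seen and all(OPS[op](entity[p], v) for p, op, v in sub):
                seen.add(i)
                merged.append(entity)
    return len(subqueries), merged


if __name__ == '__main__':
    people = [{'name': 'Bob', 'age': 25}, {'name': 'Bob', 'age': 30},
              {'name': 'Jane', 'age': 20}, {'name': 'Ann', 'age': 40}]
    # name IN ('Bob', 'Jane') AND age != 25  ->  2 x 2 = 4 sub-queries
    count, rows = run(people, [('name', 'IN', ['Bob', 'Jane']),
                               ('age', '!=', 25)])
    print(count, rows)
    # The tip's alternative: drop the != filter, exclude age 25 in code.
    count2, rows2 = run(people, [('name', 'IN', ['Bob', 'Jane'])])
    print(count2, [e for e in rows2 if e['age'] != 25])
```

Both approaches return Bob (30) and Jane (20), but the second issues two sub-queries instead of four. The toy simply keeps sub-query order when merging; it does not model how the real SDK orders merged results.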
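To make tip 5's round-trip arithmetic concrete, the toy below counts round trips against a fake datastore. CountingStore and its methods are hypothetical stand-ins for MyModel.all().filter(...).fetch() and db.put(); the only behaviour carried over from the tip is that a put() of a list costs one round trip instead of one per entity.

```python
class CountingStore:
    """Toy stand-in for the datastore that only counts round trips."""

    def __init__(self, entities):
        self.entities = entities
        self.round_trips = 0

    def query(self, color, limit):
        # Stands in for MyModel.all().filter("color =", color).fetch(limit).
        self.round_trips += 1
        return [e for e in self.entities if e['color'] == color][:limit]

    def put(self, entity_or_list):
        # Like db.put(), accepts one entity or a list: either way, one trip.
        # (Entities are dicts updated in place, so nothing else to store.)
        self.round_trips += 1


def update_one_at_a_time(store, old, new):
    for entity in store.query(old, 100):
        entity['color'] = new
        store.put(entity)


def update_batched(store, old, new):
    updated = []
    for entity in store.query(old, 100):
        entity['color'] = new
        updated.append(entity)
    store.put(updated)


if __name__ == '__main__':
    slow = CountingStore([{'color': 'red'} for _ in range(150)])
    update_one_at_a_time(slow, 'red', 'blue')
    fast = CountingStore([{'color': 'red'} for _ in range(150)])
    update_batched(fast, 'red', 'blue')
    print(slow.round_trips, fast.round_trips)  # 101 2
```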
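Tip 9's first-match rule can be sketched as a tiny router that walks the handler list from top to bottom and stops at the first pattern that matches. This is a simplified model, not App Engine's dispatcher: it assumes, as we understand app.yaml, that each url pattern is a regular expression that must match the whole request path, and route() is a hypothetical name.

```python
import re


def route(handlers, path):
    """Return the script of the first handler whose pattern matches the path.

    Assumes each url pattern must match the entire path (re.fullmatch).
    """
    for pattern, script in handlers:
        if re.fullmatch(pattern, path):
            return script
    return None  # nothing matched; treated as a 404 in this sketch


REMOTE_API = '$PYTHON_LIB/google/appengine/ext/remote_api/handler.py'

# Catch-all first: /remote_api never reaches its own handler.
wrong_order = [(r'/.*', 'request.py'), (r'/remote_api', REMOTE_API)]
# Catch-all last, as the tip recommends.
right_order = [(r'/remote_api', REMOTE_API), (r'/.*', 'request.py')]

if __name__ == '__main__':
    print(route(wrong_order, '/remote_api'))  # request.py
    print(route(right_order, '/remote_api'))  # prints the remote_api handler path
    print(route(right_order, '/home'))        # request.py
```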
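The escaping problem from tip 10 is easy to reproduce outside App Engine. The sketch below swaps in SQLite (Python's built-in sqlite3 module) as an analogous system, since GQL itself cannot run here: concatenating a name like O'Brien into the query text breaks the statement, while positional (?) and named (:name) placeholders pass the values separately and need no escaping. The table, data, and function names are hypothetical.

```python
import sqlite3


def make_db():
    """In-memory table standing in for the People kind (hypothetical data)."""
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE People (first_name TEXT, last_name TEXT)')
    conn.execute('INSERT INTO People VALUES (?, ?)', ('Conan', "O'Brien"))
    return conn


def find_by_concatenation(conn, first_name, last_name):
    # The anti-pattern: an apostrophe in a value ends the SQL string early.
    sql = ("SELECT * FROM People WHERE first_name = '" + first_name
           + "' AND last_name = '" + last_name + "'")
    return conn.execute(sql).fetchall()


def find_by_position(conn, first_name, last_name):
    # Values are passed separately from the query text; no escaping needed.
    return conn.execute(
        'SELECT * FROM People WHERE first_name = ? AND last_name = ?',
        (first_name, last_name)).fetchall()


def find_by_name(conn, first_name, last_name):
    # Named placeholders, with the values supplied as a dictionary.
    return conn.execute(
        'SELECT * FROM People '
        'WHERE first_name = :first_name AND last_name = :last_name',
        {'first_name': first_name, 'last_name': last_name}).fetchall()


if __name__ == '__main__':
    conn = make_db()
    print(find_by_position(conn, 'Conan', "O'Brien"))
    print(find_by_name(conn, 'Conan', "O'Brien"))
    try:
        find_by_concatenation(conn, 'Conan', "O'Brien")
    except sqlite3.OperationalError as err:
        print('concatenated query failed:', err)
```

Keeping the query text constant also mirrors the bind() optimization: sqlite3 keeps a cache of prepared statements keyed by the SQL text, so re-running the same parameterized query with new values avoids parsing it again.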