A picture is worth a thousand words, and as developers, you know that a working code snippet can be worth even more. The developers at scisurfer.com have agreed to share a few of their code snippets with you today. The snippets outline how they full text index their content and make it easily searchable for their users.



---



Many applications can benefit from full text search. Following the approach from Brett Slatkin's presentation at Google I/O, the implementation is pretty straightforward. This article gives you a practical introduction to implementing full text search on GAE. The code is GAE/J + JDO only, but the concepts can easily be converted to Python or JPA.




Goals



  • Develop a guestbook example (much like the one shipped with the SDK), but with searchable text

  • The full text search should be fuzzy, within some reasonable limits



Some things before we start:



  • Self merge-joins and list properties: You can query an entity efficiently based on list properties via self merge-joins. We will not talk about that in detail, but you should watch Brett Slatkin's excellent talk at Google I/O '09 about the topic. It should answer most of your questions: Google I/O 2009 - Building Scalable, Complex Apps on App Engine.

  • Full Text Search (FTS): FTS is a really huge topic, and it can be done in a myriad of different ways. Check out Wikipedia for a primer: http://en.wikipedia.org/wiki/Full_text_search

  • The art of stemming: One of the most basic techniques for enabling some form of inexact search is called "stemming": the reduction of words to their base form (for example, "Kids" to "kid"). http://en.wikipedia.org/wiki/Stemming



The project


The whole project is available on Google Code http://code.google.com/p/guestbook-example-appengine-full-text-search.



A live demo is available at http://guestbook-example-fts.appspot.com/




The project - walk-through: indexing



  • guestbook.jsp: This is the first page that is loaded. Here you can enter new guestbook entries (GuestBookEntry) or search all existing entries.

  • GuestBookEntry.java: A simple JDO class with persistent fields. There is one special field, however: fts, which is a Set of Strings. The set is filled with the terms that make a full text search possible. If you inspect the constructor, you will see the call that makes this GuestBookEntry searchable:



    SearchJanitor.updateFTSStuffForGuestBookEntry(this);




  • SearchJanitor.java: The updateFTSStuffForGuestBookEntry method gets a GuestBookEntry and chops it into single words using:



    SearchJanitorUtils.getTokensForIndexingOrQuery(...);




  • SearchJanitorUtils.java: The getTokensForIndexingOrQuery(...) method uses Apache Lucene and Lucene Snowball to extract words from the given string. But it does more: the Lucene Snowball stemmer reduces each word to its base form, which enables fuzzy search. A search for "Kids" or "Kid" returns the same results, and because the words are also lowercased, a search for "kid" does too.
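To make the tokenize-and-stem step concrete without pulling in Lucene, here is a toy stand-in in plain Java. It is not the project's SearchJanitorUtils code: the class name ToyTokenizer, the tiny stop-word list and the naive plural-stripping "stemmer" are assumptions made purely for illustration, and Lucene Snowball's English stemmer is far more thorough.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Toy stand-in for SearchJanitorUtils.getTokensForIndexingOrQuery(...).
 * Hypothetical: it does NOT use Lucene, and its "stemmer" only strips a plural "s".
 */
public class ToyTokenizer {

    // Tiny illustrative stop-word list (Lucene ships a much larger one).
    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("a", "an", "and", "the", "of", "to", "in", "is"));

    /** Lowercases, splits on non-letters, drops stop words, stems, and caps the count. */
    public static Set<String> tokenize(String text, int maxTokens) {
        Set<String> tokens = new LinkedHashSet<>(); // keeps first-seen order, no duplicates
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (word.isEmpty() || STOP_WORDS.contains(word)) {
                continue;
            }
            tokens.add(stem(word));
            if (tokens.size() >= maxTokens) {
                break;
            }
        }
        return tokens;
    }

    /** Naive plural stripping: "kids" -> "kid"; short words and "-ss" words are kept. */
    static String stem(String word) {
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("The Kids play in the garden", 10)); // [kid, play, garden]
        System.out.println(tokenize("kid", 10));                         // [kid]
    }
}
```

Whatever tokenizer you use, the same function has to run at indexing time and at query time; otherwise the stored terms and the query terms will not line up.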




Indexing summary: So far, we have an entity (GuestBookEntry) whose fts field is filled with a set of Strings generated from its content. So far, so good, but the actual search component is still missing.


The project - walk-through: searching



  • search.jsp: This file gets a parameter "search" and presents results for that search. It does that by consulting:



    List searchResults =
    SearchJanitor.searchGuestBookEntries(searchString, pm);




  • SearchJanitor.java: The searchGuestBookEntries method does all the magic. It chops the search string into single, stemmed words (again using SearchJanitorUtils) and constructs a query that looks for all of these Strings in the fts field of the GuestBookEntry entity:


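// Fragment of searchGuestBookEntries(...); assumes javax.jdo.Query and java.util.* are imported.
// Build a JDOQL query against the GuestBookEntry entity.
StringBuilder queryBuffer = new StringBuilder();
queryBuffer.append("SELECT FROM " + GuestBookEntry.class.getName() + " WHERE ");

// Tokenize and stem the search string exactly as at indexing time.
Set<String> queryTokens = SearchJanitorUtils
        .getTokensForIndexingOrQuery(queryString,
                MAXIMUM_NUMBER_OF_WORDS_TO_SEARCH);

// An empty token set would leave a dangling "WHERE", so bail out early.
if (queryTokens.isEmpty()) {
    return new ArrayList<GuestBookEntry>();
}

List<String> parametersForSearch = new ArrayList<String>(queryTokens);

StringBuilder declareParametersBuffer = new StringBuilder();
int parameterCounter = 0;

// One equality filter per token on the multi-valued fts field; the
// datastore resolves the conjunction with a self merge-join.
while (parameterCounter < queryTokens.size()) {
    queryBuffer.append("fts == param" + parameterCounter);
    declareParametersBuffer.append("String param" + parameterCounter);

    if (parameterCounter + 1 < queryTokens.size()) {
        queryBuffer.append(" && ");
        declareParametersBuffer.append(", ");
    }

    parameterCounter++;
}

Query query = pm.newQuery(queryBuffer.toString());
query.declareParameters(declareParametersBuffer.toString());
@SuppressWarnings("unchecked")
List<GuestBookEntry> result = (List<GuestBookEntry>) query
        .executeWithArray(parametersForSearch.toArray());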





Searching summary: search.jsp uses the same stemming as the indexing step to translate a string into a searchable set of strings. This set is then queried against the datastore (as a self merge-join on the single fts field). Mission accomplished: the guestbook application can now do full text search, including alternate "fuzzy" spellings of given words.
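To see what the chained "fts == paramN && ..." filters ask the datastore for, here is an in-memory sketch of the same semantics: an entry matches only if its fts set contains every query token. The datastore computes this with a self merge-join over its index; this sketch simply scans a list. The Entry class and its hand-written tokens (standing in for stemmer output) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** In-memory illustration of the AND-of-equalities query on a list property. */
public class MergeJoinSemantics {

    /** Hypothetical stand-in for a stored GuestBookEntry. */
    static final class Entry {
        final String content;
        final Set<String> fts; // pre-stemmed tokens, as stored at indexing time

        Entry(String content, String... tokens) {
            this.content = content;
            this.fts = new HashSet<>(Arrays.asList(tokens));
        }
    }

    /** Returns the content of every entry whose fts set contains ALL query tokens. */
    static List<String> search(List<Entry> entries, Set<String> queryTokens) {
        List<String> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.fts.containsAll(queryTokens)) { // conjunction across all tokens
                hits.add(e.content);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry("Kids love the guestbook", "kid", "love", "guestbook"),
                new Entry("A kid signed here", "kid", "sign", "here"),
                new Entry("Nice guestbook", "nice", "guestbook"));
        Set<String> query = new HashSet<>(Arrays.asList("kid", "guestbook"));
        System.out.println(search(entries, query)); // [Kids love the guestbook]
    }
}
```

Adding a search term can only shrink the result set, which is why a multi-word query narrows results rather than widening them.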




Limitations of the approach



  • 1MB limit on entities. You cannot store more than 1MB in one entity. You can work around this limitation by generating more than one entity or by creating relation index entities as described in Brett Slatkin's presentation.

  • Number of search terms is limited (to roughly 5). You cannot search for an unlimited number of query terms, because each term adds to a (potentially costly) self merge-join. In most cases, though, the results users get from a limited set of search terms will be fine.

  • If you get too many results, this approach will not work. You have to make sure you are searching a subset of the data with fewer than ~200 results; how you do that is up to you. If you are querying only a few entities, this problem will never show up. If you have thousands of entries, make sure you only retrieve subsets, e.g. only the "best" results (according to whatever your application's secret sauce is), or only results from a particular day or other small window.
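The two workarounds above can be combined when building the query. The following is a sketch under stated assumptions, not the project's code: it caps the number of fts clauses at maxTerms and optionally adds an equality filter on a hypothetical "day" String property (e.g. "2010-04-07") that GuestBookEntry would need to gain. Because the day filter is another equality filter, the query stays equality-only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical builder for a capped token filter plus an optional "day" window. */
public class SearchFilterBuilder {

    static final class Filter {
        final String filter;             // e.g. "fts == param0 && day == dayParam"
        final String declaredParameters; // e.g. "String param0, String dayParam"
        final List<Object> values;       // in the same order as the declared parameters

        Filter(String filter, String declaredParameters, List<Object> values) {
            this.filter = filter;
            this.declaredParameters = declaredParameters;
            this.values = Collections.unmodifiableList(values);
        }
    }

    static Filter build(Set<String> tokens, int maxTerms, String day) {
        List<String> clauses = new ArrayList<>();
        List<String> declarations = new ArrayList<>();
        List<Object> values = new ArrayList<>();
        int i = 0;
        for (String token : tokens) {
            if (i >= maxTerms) {
                break; // each extra term makes the self merge-join more expensive
            }
            clauses.add("fts == param" + i);
            declarations.add("String param" + i);
            values.add(token);
            i++;
        }
        if (day != null) {
            // Narrow the candidate set to one day's entries (hypothetical property).
            clauses.add("day == dayParam");
            declarations.add("String dayParam");
            values.add(day);
        }
        return new Filter(String.join(" && ", clauses),
                String.join(", ", declarations), values);
    }

    public static void main(String[] args) {
        Filter f = build(new LinkedHashSet<>(Arrays.asList("kid", "play", "garden")), 2, "2010-04-07");
        System.out.println(f.filter);             // fts == param0 && fts == param1 && day == dayParam
        System.out.println(f.declaredParameters); // String param0, String param1, String dayParam
        System.out.println(f.values);             // [kid, play, 2010-04-07]
    }
}
```

With JDO, the pieces would plug into pm.newQuery(GuestBookEntry.class, f.filter), query.declareParameters(f.declaredParameters) and query.executeWithArray(f.values.toArray()); JDO's Query.setRange(0, 200) could additionally cap how many entities are fetched. If there are no tokens and no day, the filter is empty and the caller should skip the query.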



Where to go from here




About us


This approach has been successfully applied in our GAE project http://scisurfer.com/ (scientific knowledge management). We added some secret sauce of our own for ranking, but in general, this method works (and scales) really well for us.



Please feel free to contact us about this post, or our project anytime at raphael.andre.bauer@gmail.com.



Thanks!

Nico Güttler, Dominic Jansen, Raphael Bauer (http://corporate.scisurfer.com/)





One of the many classes of application that App Engine facilitates building is online games, particularly collaborative ones. Games often start off small, but can see incredible growth rates as people convince their friends to join, and App Engine’s seamless scaling makes handling these sorts of traffic spikes a breeze. Recently, we got together with Jay Kyburz, developer for the game Neptune’s Pride, and asked him a few questions about his game, and how App Engine has worked out for him.





Q. Can you tell us a little about Neptune’s Pride?


Neptune's Pride is a real-time multi-player game of strategy, intrigue and galactic conquest. The game is played over several weeks, and players can log in at any time to upgrade their stars, dispatch their ships and most important of all, conspire with the other players.



Neptune's Pride is not like most web strategy games. The game itself is very simple, which allows players to focus on high level strategic decisions, teamwork and diplomacy.



Q. How long did it take to build your back-end on App Engine? Did you build it from scratch, or port it from another setup?


Neptune's Pride is my first App Engine game, and in fact my first web-based game. It's difficult to say exactly how long was spent developing the back-end as the game grew organically.



One thing I can say is that I spend much less time working on server code than I do working on the interface. Less than 5% of my time is spent writing code that runs on the server.



Q. Which runtime did you implement Neptune's Pride in?


Python, I love Python! Python is my language of choice for any project. Plus, the Java runtime didn't exist when I first started working on Neptune's Pride.



If there were a JavaScript runtime, I would consider using it, if only so that my client and server code were written in the same language. What I want to know is: why don't browsers have Python interpreters?



[Ed: There is a Java-based JavaScript runtime available at appenginejs.org.]



Q. What tip(s) would you offer new developers still exploring or just getting started with App Engine?


The power of App Engine is that you don't need to think about App Engine. "It just works!"



If you're a small developer like me, you don't really care much about what’s happening on the server. Some engineers want to tinker with things and know they have total control. My application’s interface is far more important.



I don't know if this is good advice, but one thing I wish I had done differently would be to have fewer, larger models.



There have been a few occasions where I thought it would be a good idea to break a model up because some of the time I only need a subset of the data. I use reference properties to connect my models together, so retrieving the extra data is no work; however, it introduces additional requests and additional points of failure. I think I would have preferred to just pull more data out of the database with fewer requests.



Q. Your core application is a game - how do you use App Engine to support what has traditionally been a client-centric type of app?


Actually, I didn't really tackle this problem very well. My game code was written as a standalone Python application with no knowledge at all of the underlying database. Every time the user logs in or gives an order to one of their units, I just pull the whole structure out of the database, unpickle it, process the order, and shove it back in.



This was fine for Neptune's Pride, where a game can have no more than 12 players and has a finite set of data.



I have already started planning a game, more like an MMO, where all players can interact with each other and play on a "global" playing field. This game will need to be much smarter about how data is stored and retrieved from the database.
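The load-everything loop Jay describes (read the whole game, unpickle it, apply one order, write it all back) is easy to picture in code. Since this document's code is Java, here is a rough Java analogue that uses java.io serialization in place of Python's pickle. GameState, handleOrder and the byte[] standing in for a stored datastore property are all hypothetical; this is not Neptune's Pride code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/** Whole-state round trip: every order deserializes and reserializes the entire game. */
public class WholeStateRoundTrip {

    /** Hypothetical game state; a real game would hold far more. */
    static final class GameState implements Serializable {
        private static final long serialVersionUID = 1L;
        final Map<String, Integer> shipsAtStar = new HashMap<>();
    }

    static byte[] serialize(GameState state) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        }
        return bytes.toByteArray();
    }

    // Only deserialize blobs the application wrote itself, never untrusted input.
    static GameState deserialize(byte[] blob) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return (GameState) in.readObject();
        }
    }

    /** One request: load everything, apply a tiny order, write everything back. */
    static byte[] handleOrder(byte[] storedBlob, String star, int ships)
            throws IOException, ClassNotFoundException {
        GameState state = deserialize(storedBlob);
        state.shipsAtStar.merge(star, ships, Integer::sum);
        return serialize(state);
    }

    public static void main(String[] args) throws Exception {
        byte[] blob = serialize(new GameState());
        blob = handleOrder(blob, "Sol", 3);
        blob = handleOrder(blob, "Sol", 2);
        System.out.println(deserialize(blob).shipsAtStar); // {Sol=5}
    }
}
```

The cost of each request grows with the size of the whole state rather than with the size of the order, which is fine for a bounded 12-player game but is exactly what a shared "global" playing field would need to avoid.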



Q. What App Engine tools or APIs have been particularly useful in developing Neptune's Pride?


I don't know that I can point to any one API or tool that has been particularly useful. Isn't the power of App Engine the fact that it's a one stop shop?



Q. What led you to choose App Engine as the platform to use?


Because it was easy. I know it's not a very exciting thing to say on the App Engine blog, but really I chose it because I don't want to be thinking about my server, I want to be thinking about my game. App Engine allows me to do that.



Q. Are there any features that you're looking forward to, or that will be particularly useful for you?


I like the sound of background servers that run longer than 30 seconds. I now have a couple of processes that prune data out of the database, and these operations are slow. Perhaps I shouldn't bother given how cheap storage is, but I like the idea of my application running forever with no human interaction.



These cleanup operations are the only things that occasionally throw Deadline Exceeded Errors.



Comet communication sounds like something I should be interested in!



Q. If you could change one aspect of App Engine, what would it be?


Another difficult question! Nothing is really getting in my way.



Google has so many great services and tools. I'd like it to be even easier to hook them all up together. If I were going to ask the App Engine team for any one feature it would be to have a look at all the other services out there and make sure it's easy to access them, then give me some code I can just copy and paste into my app :)



Q. Have you had any difficulty scaling your app as it increased in popularity?


Nope, again everything has just worked. As somebody who is new to all this I don't really even know what kinds of problems to expect as I scale, or what performance I should expect, or even the definition of a lot of users. All I can tell you is that I've seen no performance change between 2k requests and 200k requests a day.



I did hit my quota one day, and as a result the game was down for about 4-5 hours. Now I like to keep my usage under 10% of my quota so that if I do experience a spike in traffic there is plenty of room to grow.



I imagine if I grow much bigger I will need to change the application’s interface so that it’s easier for you to find a game, or join your friends in a game. I'm holding off making these changes because I'm also trying to work out the best way to build "social" tools into the game.



I want players to be able to build a network of friends they enjoy playing against, then provide some tools to help them stay in touch and coordinate the creation of new games or other events like competitions.



Q. Have you used Appstats? Did you learn anything surprising about your app, and did it have an impact on your app's performance?


I did implement the stats when they first came out but didn't find anything very exciting. I simply confirmed what I already knew, which was that my game code is the slowest part of the application.



Q. Your monetization strategy is quite novel for an online game. What has the uptake and user response been like? Would you recommend it to other developers?


Actually, I just thought I was doing what everybody else is doing these days, a virtual currency. I chose it over a subscription model so players have the freedom to spend it as fast or as slow as they like.



I'm not sure I will use it on the next game. I've been considering a non-renewing subscription rather than extracting credits from players every time they join a game. Pay once, forget about it, and enjoy all the site's features as a premium member.



Q. What peak QPS do you see currently? What is your latency for different types of requests?


The game peaks at about 6-8 requests/second. Average CPU time is about 280ms per request (about 130ms of that is API CPU).



Q. Any tips to help me overrun the interstellar empires of the rest of the App Engine team?


Know when to strike!



Today we released version 1.3.3 of the App Engine SDK for both Java and Python. This is a minor release that includes changes and a few issue fixes for the datastore, administration console, and when deploying applications. For more information on all the changes, please read the 1.3.3 release notes for Java and Python.



Additionally, the Python SDK has a new experimental feature that gives you the option to use SQLite as the datastore stub backend. Using SQLite within the dev_appserver should speed up performance of your local datastore when testing on large datasets. (Note that this feature does not add SQL support to the App Engine SDK or service.) If you try out this feature, please give us your feedback on the App Engine Python users group.


1.3.3 is now available on the App Engine download page. As always, we welcome your feedback on the App Engine group.



Today marks a special day for us on the App Engine team. It was just forty-six years ago today that IBM announced the IBM System/360. As Wikipedia puts it, “It was the first family of computers designed to cover the complete range of applications, from small to large, both commercial and scientific.” One of the unique aspects of the S/360 was that customers could start with a small system while being confident that they could upgrade their system to scale to larger workloads without having to rewrite their application code.







We’ve come a long way since the days of 7.2MB disk drives and when systems with 256KB of main memory were considered large. Customers haven’t changed all that much though. Developers still want a platform which makes it easy to build, easy to manage and easy to scale their applications. That’s exactly what inspired us to build Google App Engine.





It was two years ago today that we launched App Engine to the first 10,000 developers. Those developers formed the start of today’s vibrant community of over 250 thousand developers. Each day your apps collectively serve over 250 million pageviews. Since it’s our birthday, we thought we’d share our traffic graph with you.





It all started on April 7, 2008 with the Python runtime. And, after a somewhat false start with the FORTRAN 77 runtime, we were able to successfully launch App Engine for Java, along with a number of other exciting features, on our first anniversary.





Those of you who have followed along closely know that we didn’t stop there. We’ve kept up the pace, launching a significant new feature almost every month since then. The datastore team has added everything from key-only queries, kindless queries, ancestor queries inside of transactions and query cursors to configurable datastore deadlines, opt-in eventual consistency and a whole new way of replicating data across data centers. Meanwhile the rest of the team hasn’t skipped a beat, delivering a number of new platform capabilities including Task Queues, XMPP support, incoming email and blobstore. We even affixed our own shipment of delete buttons to the Admin Console.





This is probably a good time to call out one of our recent favorites though. It’s an instrumentation library which provides great data and insights. It comes bundled with the SDK and can be easily enabled in your application. Of course we’re talking about Appstats. If you do one thing this week to celebrate our birthday and improve the performance of your app while helping to make the web faster, you should enable Appstats. You might be surprised what you learn about your own app or you might even win a t-shirt.





Of course it’s often the little things that count: API fetch from blobstore, expanded URL fetch ports, DoS API, IPv6 support, removing the 1000 row result limit, Java unit testing framework, custom admin console pages, Java app pre-compilation, datastore stats, wildcard domains, per request statistics in HTTP response headers, SDK dataviewer and stable unique id for users to name a few. If you saw something in that list you didn’t know about, be sure to read 10 things you didn't know about App Engine, visit the ever growing list of great App Engine articles, and, while you prepare for Google I/O 2010, be sure to review the excellent and highly informative App Engine sessions from previous years. The Java developers among you are of course already reading the App Engine Persistence Blog.





We want to take this opportunity to thank you for your tremendous support. Hearing your feedback is really important to us. It helps us stay on course. Your feedback has also helped drive the list of things we’re working on.





We know many of you like the big features and the new APIs we’ve launched. We have graphs that show you’re using them. But, we often don’t get enough detailed feedback on how you’re using these APIs, whether they’re working really well, or whether there’s room for improvement. Let us know how we’re doing. We’d love to hear what your favorite APIs are and how you use them, especially if you’re doing something interesting you think others might like to hear about. Also, tell us about your favorite little feature. What’s that one thing that made your life easier?





Please don’t mind the crumbs as we enjoy some cupcakes.





I'm Reza, and I work in London, UK for a startup called TweetDeck. Our vision is to develop the best tools to manage and filter real-time information streams like Twitter, Facebook, LinkedIn and MySpace. We offer products like our TweetDeck desktop client built on Adobe AIR, and our iPhone application. We are happy to say that we use App Engine as a key part of our backend.


We're a small startup, so early on we started to look for tools that would give us the biggest bang for our buck. Combined with the fact we love Python, we thought it might be worth taking a look at what Google App Engine had to offer. The first feature we started playing with was the mail sending facility.

It was easy! Sending mail was a single call, with no messing around. Combined with the fact that mail was sent via tried-and-tested Google Mail servers, this meant that we had a simple mailing solution where we didn't have to deal with spam blacklists, mail retries or SPF records. We could really see App Engine being our sole email provider for transactional emails (new users and forgotten passwords), but also for newsletter-type mailing.

When we got started, our existing backend was hosted on Amazon EC2 and SimpleDB, and we knew that we needed a way for the two systems to communicate with each other. App Engine provides all the basic tools to define any sort of resources you want, but more importantly, it has the Python standard library. We implemented a small mail API, with authentication provided by an HMAC-SHA1 of the request information and a shared secret key. The API has been made extremely general: its JSON input format contains fields to send messages to blocks of email addresses that are either defined in the request itself, or which exist as a template in App Engine (templates are defined as strings in a Python module).
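HMAC request signing with a shared secret is straightforward to sketch. The following Java version uses only the JDK's javax.crypto.Mac. The post does not say exactly which "request information" TweetDeck signs, so this sketch assumes the signature covers the raw JSON request body and is sent as a lowercase hex string; the class name RequestSigner and the sample JSON fields are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical HMAC-SHA1 signer/verifier for a JSON mail-API request body. */
public class RequestSigner {

    private final SecretKeySpec key;

    public RequestSigner(String sharedSecret) {
        this.key = new SecretKeySpec(sharedSecret.getBytes(StandardCharsets.UTF_8), "HmacSHA1");
    }

    /** Sender side: lowercase hex HMAC-SHA1 of the request body. */
    public String sign(String requestBody) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(key);
            byte[] digest = mac.doFinal(requestBody.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException | InvalidKeyException e) {
            throw new IllegalStateException("HmacSHA1 unavailable", e);
        }
    }

    /** Receiver side: recompute the signature and compare without early exit. */
    public boolean verify(String requestBody, String signatureHex) {
        byte[] expected = sign(requestBody).getBytes(StandardCharsets.US_ASCII);
        byte[] actual = signatureHex.getBytes(StandardCharsets.US_ASCII);
        return MessageDigest.isEqual(expected, actual);
    }

    public static void main(String[] args) {
        RequestSigner signer = new RequestSigner("shared-secret");
        String body = "{\"template\":\"welcome\",\"to\":[\"user@example.com\"]}";
        String signature = signer.sign(body);
        System.out.println(signer.verify(body, signature));       // true
        System.out.println(signer.verify(body + " ", signature)); // false
    }
}
```

Signing only the body does not stop a captured request from being replayed; including a timestamp in the signed data and rejecting stale requests would be a natural extension.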

The whole setup currently works quite well. We're already extending our mailing system to use App Engine's task queues, exposing a number of queues to break large mailing jobs into a series of subtasks and thus spread mail sending over a long period of time. We have plans to make an even tighter bridge between our EC2 systems and App Engine, which involves keeping our subscriber list entirely in App Engine, and adding and removing from that list as appropriate.
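Breaking a large mailing into staggered subtasks comes down to a small planning step. This plain-Java sketch only computes the plan: each batch of recipients gets a delay, and in App Engine each batch would then be enqueued as one task with that countdown. MailingPlanner, MailBatch, the batch size and the spacing are hypothetical, and nothing here calls the Task Queue API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical planner that splits a mailing into delayed batches. */
public class MailingPlanner {

    static final class MailBatch {
        final List<String> recipients;
        final long countdownMillis; // delay before this batch's task should run

        MailBatch(List<String> recipients, long countdownMillis) {
            this.recipients = recipients;
            this.countdownMillis = countdownMillis;
        }
    }

    /** Splits recipients into batches of batchSize, each starting spacingMillis after the last. */
    static List<MailBatch> plan(List<String> recipients, int batchSize, long spacingMillis) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<MailBatch> batches = new ArrayList<>();
        for (int start = 0, n = 0; start < recipients.size(); start += batchSize, n++) {
            int end = Math.min(start + batchSize, recipients.size());
            // Copy the slice so each task owns an independent list.
            List<String> slice = new ArrayList<>(recipients.subList(start, end));
            batches.add(new MailBatch(slice, n * spacingMillis));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> all = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            all.add("user" + i + "@example.com");
        }
        for (MailBatch b : plan(all, 3, 60_000)) {
            System.out.println("countdown=" + b.countdownMillis + "ms, recipients=" + b.recipients.size());
        }
        // countdown=0ms, recipients=3
        // countdown=60000ms, recipients=3
        // countdown=120000ms, recipients=1
    }
}
```

Keeping each batch small also keeps each task well inside a single request's deadline, and a failed batch can be retried on its own.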

We also use App Engine for various other prototypes and smaller applications that are part of our product. We use it to serve our "TweetDeck Recommends" feed, and we've even developed small tools to apply TweetDeck fan badges on Twitter homepage backgrounds using the Imaging API! The lesson from us, of course, is that using something like App Engine doesn't have to mean everything runs on it. It's an extremely good platform for creating APIs or smaller parts of your application that do specific tasks, and do them well. Think of it as the UNIX metaphor applied to the Cloud.

We love that we've been able to grow the functionality that TweetDeck provides by progressively using more of the cloud. App Engine provides the perfect platform to compose new services quickly, iterate on them in production, and scale with demand as TweetDeck's install base grows. Thanks to App Engine and the cloud, there's nothing holding us back from tackling the needs of our user base.

Here is a video interview of Reza and the TweetDeck team.