Reducing ranking latency from one hour to five seconds using Cloud Datastore
Thursday, July 31, 2014
We recently published a case study, Fast and Reliable Ranking in Datastore, that describes how we helped one of our Google App Engine customers shorten their ranking latency from one hour to five seconds. They applied unique design patterns such as job aggregation to achieve over 300 updates per second with strong consistency on Cloud Datastore. The following are highlights from the article.
The problem of ranking
Tomoaki Suzuki, an App Engine lead engineer at Applibot, a major game studio in Japan, has been trying to solve the common, yet difficult problem faced by every large gaming service: ranking.
Tomoaki Suzuki, App Engine lead engineer at Applibot, Inc., and their game Legend of the Cryptids (#1 ranked game in the Apple App Store North America gaming category in October 2012)
The requirements are simple:
- Your game has hundreds of thousands (or more!) players.
- Whenever a player fights enemies (or performs other activities), their score changes.
- You want to show the latest ranking for the player on a web portal page.
Getting a rank is easy, if it's not expected to also be scalable and fast. For example, you could execute the following query:
SELECT count(key) FROM Players WHERE Score > YourScore
This query counts all the players who have a higher score than yours. But do you want to execute this query for every request from the portal page? How long would it take when you have a million players?
Tomoaki initially implemented this approach, but it took a few seconds to get each response. This was too slow, too expensive, and performed progressively worse as scale increased.
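For concreteness, here is a minimal sketch of that naive approach using the App Engine Python NDB client. The Player model and its score property are assumptions for illustration, not Applibot's actual schema:

    from google.appengine.ext import ndb

    class Player(ndb.Model):
        # Assumed schema: one entity per player with an indexed score.
        score = ndb.IntegerProperty(indexed=True)

    def get_rank(your_score):
        # Datastore walks the index over every player with a higher score,
        # so the cost grows linearly with the number of players: O(n).
        higher = Player.query(Player.score > your_score).count()
        return higher + 1  # rank 1 = highest score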
The easiest way: scan all players
Next, Tomoaki tried to maintain ranking data in Memcache. This was fast, but not reliable, because Memcache entries are just caches and can be evicted at any time. With a ranking service that depended solely on in-memory key-value data, it was difficult to maintain consistency and availability.
Looking for an O(log n) Algorithm
I was assigned to Applibot under a platinum support contract. I knew that ranking was a classic yet hard-to-solve problem for any scalable distributed service. The simple query solution requires scanning all players with a higher score to count the rank of one player. The time complexity of this algorithm is O(n); that is, the time required for query execution increases proportionally to the number of players. In practice, this means that the algorithm is not scalable. Instead, we need an O(log n) or faster algorithm, where the time will only increase logarithmically as the number of players grows.
If you ever took a computer science course, you may remember that tree algorithms, such as binary trees, red-black trees, or B-trees, can find an element in O(log n) time. Tree algorithms can also be used to calculate an aggregate value over a range of elements, such as count, max/min, or average, by holding the aggregated values on each branch node. Using this technique, it is possible to implement a ranking algorithm with O(log n) performance.
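To illustrate the idea, here is a minimal in-memory sketch of a binary count tree. It is not the Code Jam library itself (which stores its tree nodes as Datastore entities), and it assumes integer scores in a bounded range:

    class CountTree(object):
        """Binary tree over the integer score range [0, size); each node
        stores how many scores fall inside its sub-range."""

        def __init__(self, size):
            self.size = 1
            while self.size < size:  # round the range up to a power of two
                self.size *= 2
            self.counts = [0] * (2 * self.size)  # implicit tree stored in an array

        def add_score(self, score):
            # Increment the leaf for this score and every ancestor's count: O(log n).
            i = self.size + score
            while i >= 1:
                self.counts[i] += 1
                i //= 2

        def rank(self, score):
            # Count scores strictly greater than `score` by walking root to leaf,
            # adding a whole right subtree whenever we descend left: O(log n).
            node, lo, hi, higher = 1, 0, self.size, 0
            while hi - lo > 1:
                mid = (lo + hi) // 2
                if score < mid:
                    higher += self.counts[2 * node + 1]  # all of [mid, hi) is greater
                    node, hi = 2 * node, mid
                else:
                    node, lo = 2 * node + 1, mid
            return higher + 1  # rank 1 = highest score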
I found an open source implementation of a tree-based ranking algorithm for Datastore, written by a Google engineer: the Google Code Jam Ranking Library.
Getting the rank of a score in a tertiary tree with the Google Code Jam Ranking Library
Concurrent Updates Limit Scalability
However, during load testing, I found a critical limitation of the Code Jam ranking library: its update throughput was quite low. When I increased the load to three updates per second, the library started returning transaction retry errors. It was obvious that the library could not satisfy Applibot's requirement of 300 updates per second; it could handle only about 1% of that throughput.
Why is that? The reason is the cost of maintaining the consistency of the tree. In Datastore, you must use an entity group to ensure strong consistency when updating multiple entities in a transaction (see "Balancing Strong and Eventual Consistency with Google Cloud Datastore"). The Code Jam ranking library uses a single entity group to hold the entire tree, which keeps the counts in the tree elements consistent.
However, an entity group in Datastore has a performance limitation. Datastore only supports about one transaction per second on an entity group. Furthermore, if the same entity group is modified in concurrent transactions, they are likely to fail and must be retried. The Code Jam ranking library is strongly consistent, transactional, and fairly fast, but it does not support a high volume of concurrent updates.
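As a rough illustration of that constraint, consider a per-update transaction on a single entity group with the NDB client. The RankingRoot ancestor and Node model below are assumptions for illustration, not the library's actual entities:

    from google.appengine.ext import ndb

    # All tree nodes share one ancestor, so they form a single entity group
    # and can be read and updated with strong consistency.
    ROOT_KEY = ndb.Key('RankingRoot', 'tree')

    class Node(ndb.Model):
        count = ndb.IntegerProperty(default=0)

    @ndb.transactional(retries=3)
    def increment_node(node_id):
        # Each call opens its own transaction on the same entity group; when two
        # of these run concurrently, one hits contention and must be retried,
        # which caps sustainable throughput at roughly one update per second.
        node = Node.get_by_id(node_id, parent=ROOT_KEY) or Node(id=node_id, parent=ROOT_KEY)
        node.count += 1
        node.put()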
Datastore Team's Solution: Job Aggregation
I remembered that a software engineer on the Datastore team had mentioned a technique to obtain much higher throughput than one update per second on an entity group. This could be achieved by aggregating a batch of updates into one transaction, rather than executing each update as a separate transaction. So I asked the Datastore team for a solution to this problem.
In response to my request, the Datastore team discussed the issue and advised us to consider Job Aggregation, one of the design patterns used with Megastore, the storage layer underlying Datastore that manages the consistency and transactionality of entity groups. The basic idea of Job Aggregation is to use a single thread to process a batch of updates. Because there is only one thread and only one transaction open on the entity group, there are no transaction failures due to concurrent updates. You can find similar ideas in other storage products such as VoltDB and Redis.
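A minimal sketch of the aggregation idea, reusing the assumed Node model and ROOT_KEY from the previous snippet (the real implementation delegates the per-update work to the Code Jam ranking library):

    from google.appengine.ext import ndb

    @ndb.transactional()
    def apply_batch(node_ids):
        # One transaction on the entity group covers the whole batch, so a burst
        # of N updates costs one commit instead of N contending commits.
        nodes = []
        for node_id in node_ids:
            node = Node.get_by_id(node_id, parent=ROOT_KEY) or Node(id=node_id, parent=ROOT_KEY)
            node.count += 1
            nodes.append(node)
        ndb.put_multi(nodes)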
Proposed Solution Runs at 300 Updates per Second Sustained
Based on the advice from the Datastore team, I wrote Proof of Concept (PoC) code that combines the Job Aggregation pattern with the Code Jam ranking library. The PoC creates a pull queue, a kind of Task Queue in App Engine that lets developers implement one or more workers that consume the tasks added to the queue. A backend instance runs a single thread in an infinite loop that keeps pulling as many tasks as possible (up to 1,000) from the queue. The thread passes each update request to the Code Jam ranking library, which executes them as a batch in a single transaction. The transaction may be open for a second or more, but because a single thread drives the library and Datastore, there is no contention and no concurrent-modification problem.
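A minimal sketch of that worker loop with the App Engine pull queue API; the queue name is an assumption, apply_batch stands in for the single-transaction call into the Code Jam ranking library (see the sketch above), and payload decoding is omitted:

    from google.appengine.api import taskqueue

    def worker_loop():
        # Single thread on a backend instance: the only writer to the entity group.
        queue = taskqueue.Queue('ranking-updates')  # assumed pull queue name
        while True:
            # Lease up to 1,000 pending score updates at once.
            tasks = queue.lease_tasks(lease_seconds=60, max_tasks=1000)
            if not tasks:
                continue
            updates = [task.payload for task in tasks]  # serialized score updates
            apply_batch(updates)  # one transaction for the whole batch
            queue.delete_tasks(tasks)  # remove tasks only after the batch committed

Producers would enqueue each score update as a pull task, for example with queue.add(taskqueue.Task(payload=..., method='PULL')).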
The following figure shows the load-testing result of the final PoC implementation. I also incorporated another design pattern, Queue Sharding, to minimize the performance fluctuations in each task queue. The final proposed solution sustains 300 updates per second over several hours. Under usual load, each update is applied to Datastore within a few seconds of receiving the request.
Performance graph of the solution
With the load testing results and the PoC code, I presented the solution to Tomoaki and other Applibot engineers. Tomoaki plans to incorporate the solution in their production system, expects to reduce the latency of updating the ranking info from one hour to five seconds, and hopes to dramatically improve the user experience.
-Posted by Kazunori Sato, Solutions Architect
Notes
Any performance figures described in this article are sampled values for reference and do not guarantee any absolute performance of App Engine, Datastore, or other services.