Google Cloud Platform Blog
We think Germany will win. But don’t take our word for it...
Friday, July 11, 2014
We’ve had a great time giving you our predictions for the World Cup (check out our post before the
quarter-finals
and
semi-finals
). So far, we’ve gotten 13 of 14 games correct. But this isn't about us picking winners in World Cup soccer - it’s about what you can do with Google Cloud Platform. Now, we are open-sourcing our prediction model and packaging it up so you can do your own analysis and predictions.
We used
Google Cloud Dataflow
to ingest raw, touch-by-touch gameplay data from
Opta
for thousands of soccer matches. This data goes back to the 2006 World Cup, three years of English Barclays Premier League, two seasons of Spanish La Liga, and two seasons of U.S. MLS. We then polished the raw data into predictive statistics using
Google BigQuery
.
You can see BigQuery engineer Jordan Tigani (
+JordanTigani
) and developer advocate Felipe Hoffa (
@felipehoffa
) talk about how we did it in
this video from Google I/O
.
Our prediction for the final
It’s a narrow call, but Germany has the edge: our model gives them a 55% chance of defeating Argentina due to a number of factors. Thus far in the tournament, they’ve had better passing in the attacking half of their field, a higher number of shots (64 vs. 61) and a higher number of goals scored (17 vs. 8).
But, 55% is only a small edge. And, although we've been trumpeting our 13 of 14 record, picking winners isn't exactly the same as predicting outcomes. If you'd asked us which scenario was more likely, a 7 to 1 win for Germany against Brazil or a 0 to 1 defeat of Germany by Brazil,
we wouldn't have gotten that one quite right
.
(Oh, and we think Brazil has a tiny advantage in the third place game. They may have had a disappointing defeat on Tuesday, but the numbers still look good.)
But don’t take our word for it...
Now it’s your turn to take a stab at predicting. We have provided an
IPython notebook
that shows exactly how we built our model and used it to predict matches. We had to aggregate the data that we used, so you can't compute additional statistics from the raw data. However, for the real data geeks, you could try to see how well neural networks can predict the same data or try advanced techniques like principal components analysis. Alternatively, you can try adding your own features like player salaries or team travel distance. We've only scratched the surface, and there are lots of other approaches you can take.
You might also try simulating how the USA would have done if they had beat Belgium. Or how Germany in 2014 would fare against the unstoppable Spanish team of 2010. Or you could figure out whether the USA team is getting better by simulating the 2006 team against the 2010 and 2014 teams.
Here’s how you can do it
We’ve put everything on GitHub
. You’ll find the
IPython notebook
containing all of the code (using pandas and statsmodels) to build the same machine learning models that we've used to predict the games so far. We've packaged it all up in a Docker container so that you can run your own
Google Compute Engine
instance to crunch the data. For the most up-to-date step-by-step instructions, check out the
readme on GitHub
.
-Posted by Benjamin Bechtolsheim, Product Marketing Manager
No comments :
Post a Comment
Don't Miss Next '17
Use promo code NEXT1720 to save $300 off general admission
REGISTER NOW
Free Trial
GCP Blogs
Big Data & Machine Learning
Kubernetes
GCP Japan Blog
Labels
Announcements
56
Big Data & Machine Learning
91
Compute
156
Containers & Kubernetes
36
CRE
7
Customers
90
Developer Tools & Insights
80
Events
34
Infrastructure
24
Management Tools
39
Networking
18
Open Source
105
Partners
63
Pricing
24
Security & Identity
23
Solutions
16
Stackdriver
19
Storage & Databases
111
Weekly Roundups
16
Archive
2017
Feb
Jan
2016
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2015
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2014
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2013
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2012
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2011
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2010
Dec
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2009
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2008
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Feed
Subscribe by email
Technical questions? Check us out on
Stack Overflow
.
Subscribe to
our monthly newsletter
.
Google
on
Follow @googlecloud
Follow
Follow
No comments :
Post a Comment