Google Cloud Platform Blog
Google Announces Open-Source Cloud Dataflow SDK for Java
Thursday, December 18, 2014
The value of data lies in analysis, and in the intelligence one generates from it. Turning data into intelligence can be very challenging as data sets become large and distributed across disparate storage systems. Add to that the increasing demand for real-time analytics, and the barriers to extracting value from data sets become a huge challenge for developers.
In June 2014, we announced a significant step toward a managed service model for data processing: Google Cloud Dataflow, aimed at relieving operational burden and enabling developers to focus on development. We created Cloud Dataflow, which is currently in alpha, as a platform to democratize large-scale data processing by enabling easier and more scalable access to data for data scientists, data analysts and data-centric developers. Regardless of role or goal, users can discover meaningful results from their data via simple and intuitive programming concepts, without the extra noise of managing distributed systems.
Today, we are announcing the availability of the Cloud Dataflow SDK as open source. This will make it easier for developers to integrate with our managed service while also forming the basis for porting Cloud Dataflow to other languages and execution environments.
We’ve learned a lot about how to turn data into intelligence as the original FlumeJava programming models (the basis for Cloud Dataflow) have continued to evolve internally at Google. Why share this via open source? It’s so that the developer community can:
Spur future innovation in combining stream- and batch-based processing models:
Reusable programming patterns are a key enabler of developer efficiency. The Cloud Dataflow SDK introduces a unified model for batch and stream data processing. Our approach to time-based aggregations provides a rich set of windowing primitives, allowing the same computations to be used with batch- or stream-based data sources. We will continue to innovate on new programming primitives and welcome the community to participate in this process.
Adapt the Dataflow programming model to other languages:
As data proliferates, so do programming languages and patterns. We are currently building a Python 3 version of the SDK, to give developers even more choice and to make Dataflow accessible to more applications.
Execute Dataflow on other service environments:
Modern development, especially in the cloud, is about heterogeneous services and composition. Although we are building a massively scalable, highly reliable, strongly consistent managed service for Dataflow execution, we also embrace portability. As Storm, Spark, and the greater Hadoop family continue to mature, developers are challenged with bifurcated programming models. We hope to relieve developer fatigue and enable choice in deployment platforms by supporting execution and service portability.
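To make the windowing idea from the first point above more concrete, here is a minimal sketch in plain Java. It does not use the Cloud Dataflow SDK; the class and method names (`FixedWindowSketch`, `windowStart`, `countPerWindow`) are hypothetical and chosen only for illustration. It assumes each element carries a non-negative event timestamp in milliseconds and groups elements into fixed-size windows, so the same counting logic can run over a complete batch or over elements as they arrive.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a hand-rolled fixed-window count, not the Dataflow SDK API.
class FixedWindowSketch {

    // Returns the start of the fixed window containing the timestamp.
    // Assumes non-negative timestamps and a positive window size.
    public static long windowStart(long timestampMillis, long windowSizeMillis) {
        return timestampMillis - (timestampMillis % windowSizeMillis);
    }

    // Counts elements per window. The input is any Iterable of event
    // timestamps, so it could be a bounded list (batch) or an iterator
    // over elements consumed as they arrive (stream).
    public static Map<Long, Integer> countPerWindow(Iterable<Long> timestamps,
                                                    long windowSizeMillis) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestamps) {
            counts.merge(windowStart(ts, windowSizeMillis), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Four events, grouped into one-minute windows.
        List<Long> batch = Arrays.asList(1_000L, 59_000L, 61_000L, 125_000L);
        System.out.println(countPerWindow(batch, 60_000L));
        // prints {0=2, 60000=1, 120000=1}
    }
}
```

Because the grouping depends only on each element's own timestamp, the result does not depend on whether the data is read all at once or incrementally. A real streaming system also has to decide when a window is complete, which this sketch leaves out.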
We look forward to collaboratively building a system that enables distributed data processing for users from all backgrounds. We encourage developers to check out the Dataflow SDK for Java on GitHub and contribute to the community.
Interested in adding to the Cloud Dataflow conversation? Here’s how:
Apply for access to Cloud Dataflow's managed service
Learn more through the documentation
Take part in the conversation at Stack Overflow [tag: google-cloud-dataflow]
- Posted by Sam McVeety, Software Engineer