
Google Cloud Platform Blog

Dataflow and open source - proposal to join the Apache Incubator

Wednesday, January 20, 2016

Editor's update February 9, 2016: The Dataflow submission to the Apache Incubator was accepted on February 1, 2016, and the resulting project is now called Apache Beam.

Pipeline first, runtime second – With the Dataflow model and SDKs, you focus first on defining your data pipelines, not how they'll run or the characteristics of the particular runner executing them.

Portability – Data pipelines are portable across a number of runtime engines. You can choose a runtime based on any number of considerations, such as performance, cost or scalability.

Unified model – Batch and streaming are integrated into a unified model with powerful semantics, such as windowing, ordering and triggering.

Development tooling – The Dataflow SDK contains the tools you need to create portable data pipelines quickly and easily using open-source languages, libraries and tools.
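To make the windowing idea behind the unified model more concrete, here is a minimal, stdlib-only Python sketch of fixed (tumbling) event-time windows. It is not the Dataflow SDK or Apache Beam API. The names `fixed_window` and `windowed_sum` and the `(event_time, key, value)` element shape are hypothetical and used only for illustration. Each element is assigned to a window by its own timestamp, not by when it arrives. As a result, the same code produces the same per-window totals for a bounded batch and for the same records arriving out of order from a stream. Triggering and late data are not modeled here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical element shape: (event_time_in_seconds, key, value).
Event = Tuple[int, str, int]
Window = Tuple[int, int]  # half-open interval [start, end)


def fixed_window(event_time: int, size: int) -> Window:
    """Assign an event timestamp to its fixed window [start, start + size)."""
    start = event_time - (event_time % size)
    return (start, start + size)


def windowed_sum(events: Iterable[Event], size: int) -> Dict[Tuple[Window, str], int]:
    """Sum values per key within each fixed event-time window.

    Windows are chosen from each element's event time, so the result does
    not depend on the order in which elements are consumed.
    """
    totals: Dict[Tuple[Window, str], int] = defaultdict(int)
    for event_time, key, value in events:
        totals[(fixed_window(event_time, size), key)] += value
    return dict(totals)


if __name__ == "__main__":
    # A bounded "batch" of events and the same events arriving out of order.
    batch = [(5, "a", 1), (61, "a", 2), (12, "a", 3), (70, "b", 4)]
    stream_order = reversed(batch)
    print(windowed_sum(batch, 60))
    print(windowed_sum(stream_order, 60) == windowed_sum(batch, 60))
```

Summing per window is order-independent, which is why both orderings agree. A real streaming runner must also decide *when* to emit each window's result, which is the role that triggering plays in the model described above.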