Google Cloud Platform Blog

Google Cloud Dataproc managed Spark and Hadoop service now GA

Monday, February 22, 2016

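Google Cloud Dataproc is Google's managed Apache Hadoop and Apache Spark service. It entered beta last year and is now generally available. You can manage clusters through the Developers Console and the Google Cloud SDK, and clusters work alongside Google BigQuery, Google Cloud Bigtable, Google Cloud Storage, and Google Cloud Dataflow. The service also supports property tuning, cluster versioning, and custom machine types.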

Cloud Dataproc tips the scale of running Spark and Hadoop in your favor. It lowers cost and complexity while increasing scalability and productivity.

Lower cost and complexity

Low cost. We believe two things: using Spark and Hadoop should not break the bank, and you should pay only for what you actually use. As a result, Cloud Dataproc is priced at only 1 cent per virtual CPU in your cluster per hour, on top of the other Cloud Platform resources you use. Billing is per minute with a low 10-minute minimum, so your bill reflects the time your cluster actually runs rather than a rounded-up approximation.
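
The Python sketch below illustrates the pricing rule above by computing only the Cloud Dataproc fee for one cluster run. It assumes that per-minute billing rounds a partial minute up to the next whole minute and that the 10-minute minimum applies to each run. It deliberately excludes the Compute Engine and other Cloud Platform charges that are billed on top of this fee. The function and constant names are hypothetical.

```python
import math
from fractions import Fraction

# Rate quoted in the post: 1 cent per virtual CPU per hour (Dataproc fee only).
RATE_USD_PER_VCPU_HOUR = Fraction(1, 100)
# Minimum billed duration quoted in the post.
MINIMUM_BILLED_MINUTES = 10


def dataproc_fee_usd(vcpus: int, runtime_seconds: float) -> Fraction:
    """Hypothetical helper: Dataproc fee in USD for one cluster run.

    Assumes partial minutes are rounded up and that the 10-minute minimum
    applies per run. Compute Engine and other resource charges are excluded.
    """
    if vcpus < 0 or runtime_seconds < 0:
        raise ValueError("vcpus and runtime_seconds must be non-negative")
    # Round the runtime up to whole minutes, then apply the minimum.
    billed_minutes = max(MINIMUM_BILLED_MINUTES, math.ceil(runtime_seconds / 60))
    return vcpus * RATE_USD_PER_VCPU_HOUR * Fraction(billed_minutes, 60)


if __name__ == "__main__":
    # A 24-vCPU cluster that runs for 7 minutes is billed for the 10-minute minimum.
    print(float(dataproc_fee_usd(24, 7 * 60)))   # 0.04
    # The same cluster running for 30 minutes is billed for exactly 30 minutes.
    print(float(dataproc_fee_usd(24, 30 * 60)))  # 0.12
```

The function uses `Fraction` instead of floats so that cent-level amounts stay exact.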

Speed. With Cloud Dataproc, clusters do not take 10, 15, or more minutes to start or stop. On average, Cloud Dataproc start and stop operations take 90 seconds or less. This can be a 2-10x improvement over other on-premises and IaaS solutions. As a result, you spend less time waiting on clusters and more time hands-on with data.

Management. Cloud Dataproc clusters don't require specialized administrators or software products. They are built on proven Cloud Platform services, such as Google Compute Engine, Google Cloud Networking, and Google Cloud Logging. This increases availability and eliminates the need for complicated hands-on cluster administration. Cloud Dataproc also supports cluster versioning, which gives you access to modern, tested, and stable versions of Spark and Hadoop.

Increased scale and productivity

Easy. You can create, monitor, and delete Cloud Dataproc clusters and jobs directly through the Google Developers Console and the Cloud SDK. For more advanced use cases, you can call the Cloud Dataproc REST API from a programming language such as Python to manage clusters and jobs programmatically.
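
The Python sketch below shows one way to take the programmatic route using only the standard library. It builds and sends a `clusters.create` request to the Cloud Dataproc v1 REST API. The endpoint path and field names (`projectId`, `clusterName`, `masterConfig`, `workerConfig`, `numInstances`, `machineTypeUri`) follow the v1 `Cluster` resource. The project ID, region, cluster name, and machine type are placeholders, and the helper names are hypothetical. Sending the request assumes you already have an OAuth 2.0 access token, for example from `gcloud auth print-access-token`.

```python
import json
import urllib.request
from typing import Dict, Tuple

DATAPROC_API = "https://dataproc.googleapis.com/v1"


def build_create_cluster_request(project_id: str, region: str, cluster_name: str,
                                 workers: int = 2,
                                 machine_type: str = "n1-standard-4") -> Tuple[str, Dict]:
    """Hypothetical helper: return the URL and JSON body for clusters.create."""
    url = f"{DATAPROC_API}/projects/{project_id}/regions/{region}/clusters"
    body = {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            # One master node plus `workers` worker nodes, all the same machine type.
            "masterConfig": {"numInstances": 1, "machineTypeUri": machine_type},
            "workerConfig": {"numInstances": workers, "machineTypeUri": machine_type},
        },
    }
    return url, body


def send_create_cluster(url: str, body: Dict, access_token: str) -> Dict:
    """POST the request and return the parsed JSON response (an operation resource)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

`clusters.create` returns a long-running operation rather than a ready cluster. A real client would poll that operation until it completes before submitting jobs.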

Modern. Cloud Dataproc is frequently updated with new image versions that support new software releases from the Spark and Hadoop ecosystem. This provides access to the latest stable releases while also ensuring backward compatibility. For general availability, we're releasing image version 1.0.0 with support for Hadoop 2.7.2, Spark 1.6.0, Hive 1.2.1, and Pig 0.15.0. Support for other components, such as Apache Zeppelin (incubating), is provided through initialization actions in our GitHub repository.
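
Clusters are created from versioned images, so a cluster request can pin the image it uses. The sketch below builds the `softwareConfig` portion of a v1 `clusters.create` body. It sets an `imageVersion` and optional `properties`, and each property key is written as `prefix:property` (for example `spark:spark.executor.memory`). The version string `"1.0"` is a placeholder to check against the supported image versions. The allowed-prefix set is an assumption limited to the components named in this post plus core Hadoop configuration files, and the helper name is hypothetical.

```python
from typing import Dict, Optional

# Assumed prefixes for the config files of Hadoop core, HDFS, MapReduce, YARN,
# Hive, Pig, and Spark. This is an illustration, not an exhaustive list.
ASSUMED_PROPERTY_PREFIXES = {"core", "hdfs", "mapred", "yarn", "hive", "pig", "spark"}


def build_software_config(image_version: str,
                          properties: Optional[Dict[str, str]] = None) -> Dict:
    """Hypothetical helper: the softwareConfig block of a clusters.create body."""
    properties = properties or {}
    for key in properties:
        # Each key must look like "prefix:property.name".
        prefix, sep, name = key.partition(":")
        if not sep or not name or prefix not in ASSUMED_PROPERTY_PREFIXES:
            raise ValueError(f"expected 'prefix:property', got {key!r}")
    config = {"imageVersion": image_version}
    if properties:
        config["properties"] = dict(properties)
    return config
```

For example, `build_software_config("1.0", {"spark:spark.executor.memory": "4g"})` pins the image version and overrides a single Spark setting. Validating keys locally catches a missing prefix before the request is sent.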

Integrated. Cloud Dataproc has built-in integrations with other Cloud Platform services, such as BigQuery, Cloud Storage, Cloud Bigtable, and Google Cloud Logging. This means you get more than just a Spark or Hadoop cluster: you get a complete data platform. You can also use Cloud Dataproc initialization actions to extend the functionality of your clusters.
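
Initialization actions are executables or scripts that run on each node when the cluster is set up. The helper below is a minimal sketch that assumes the v1 REST field names `initializationActions` and `executableFile`. It adds Cloud Storage script paths to a `ClusterConfig` dictionary. The bucket and script name in the usage note are hypothetical placeholders, not the location of the Zeppelin action in the GitHub repository.

```python
import copy
from typing import Dict, List


def with_initialization_actions(cluster_config: Dict, script_uris: List[str]) -> Dict:
    """Hypothetical helper: return a copy of a ClusterConfig dict with init actions.

    Each URI must point to a script staged in Cloud Storage (gs://...).
    The input dictionary is not modified.
    """
    for uri in script_uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"initialization action must be a gs:// URI: {uri!r}")
    updated = copy.deepcopy(cluster_config)
    # Append to any actions already present, preserving their order.
    actions = updated.setdefault("initializationActions", [])
    actions.extend({"executableFile": uri} for uri in script_uris)
    return updated
```

For example, `with_initialization_actions(config, ["gs://example-bucket/install-zeppelin.sh"])` returns a new configuration whose nodes would run that script during setup.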

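Cloud Dataproc is also supported by a partner ecosystem that includes Arimo, Attunity, Looker, WANdisco, Zoomdata, Moser, Pythian, and Tectonic. To learn more, visit the Cloud Dataproc site, follow the getting started guide, see how to predict keno outcomes with Cloud Dataproc, or ask questions on Stack Overflow.

Posted by James Malone, Product Manager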