Google Cloud Platform Blog
Google Cloud Dataproc: Making Spark and Hadoop Easier, Faster, and Cheaper
Wednesday, September 23, 2015
Working with large datasets requires powerful tools, but too often those tools add new layers of complexity. To use your data efficiently, you need to minimize the time from data capture to insight. But concerns about deployment, scaling, monitoring, utilization, and cost can get in the way of what matters most: your data. With more data being generated each day, you have less time to peel back the layers of complexity around the tools you rely on for success. We think using powerful data tools should be as easy as 1-2-3.
Cloud Dataproc is a managed Spark and Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning. Cloud Dataproc automation helps you create clusters quickly, manage them easily, and save money by turning clusters off when you don't need them. With less time and money spent on administration, you can focus on your jobs and your data. In the time it takes you to read this blog post, you can have a Spark or Hadoop cluster created, configured, and ready to work for you.
Cloud Dataproc minimizes the time you spend on administration and management
When compared to traditional, on-premises products and competing cloud services, Cloud Dataproc has a number of unique advantages for clusters of 3 to hundreds of nodes:
Low-cost. Cloud Dataproc is priced at only 1 cent per virtual CPU in your cluster per hour, on top of the other Cloud Platform resources you use. In addition to this low price, Cloud Dataproc clusters can include preemptible instances that have lower compute prices, reducing your costs even further. Instead of rounding your usage up to the nearest hour, Cloud Dataproc charges you only for what you really use, with minute-by-minute billing and a low ten-minute minimum billing period.
Super fast. Without Cloud Dataproc, it can take anywhere from 5 to 30 minutes to create Spark and Hadoop clusters on-premises or through IaaS providers. By comparison, Cloud Dataproc clusters are quick to start, scale, and shut down, with each of these operations taking 90 seconds or less on average. This means you can spend less time waiting for clusters and more hands-on time working with your data.
Integrated. Cloud Dataproc has built-in integration with other Google Cloud Platform services, such as BigQuery, Google Cloud Storage, Google Cloud Bigtable, Google Cloud Logging, and Google Cloud Monitoring, so you have more than just a Spark or Hadoop cluster: you have a complete data platform. For example, you can use Cloud Dataproc to effortlessly ETL terabytes of raw log data directly into BigQuery for business reporting.
Managed. Use Spark and Hadoop clusters without the assistance of an administrator or special software. You can easily interact with clusters and Spark or Hadoop jobs through the Google Developers Console, the Google Cloud SDK, or the Cloud Dataproc REST API. When you're done with a cluster, you can simply turn it off so you don’t spend money on an idle cluster. You won’t need to worry about losing data, because Cloud Dataproc is integrated with Cloud Storage, BigQuery, and Cloud Bigtable.
Simple and familiar. You don’t need to learn new tools or APIs to use Cloud Dataproc, making it easy to move existing projects into Cloud Dataproc without redevelopment. Spark, Hadoop, Pig, and Hive are frequently updated, so you can be productive faster. Today, we are launching with clusters that have Spark 1.5 and Hadoop 2.7.1.
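To make the pricing in the Low-cost point concrete, here is a minimal sketch of the arithmetic. It covers only the Cloud Dataproc fee of 1 cent per virtual CPU per hour with the ten-minute minimum described above; the Compute Engine and other Cloud Platform charges that come on top are not modeled. The function names are hypothetical, and rounding a partial minute up to the next whole minute is our assumption, not something stated in this post.

```python
import math

# Figures taken from the post.
RATE_PER_VCPU_HOUR = 0.01   # USD: 1 cent per virtual CPU per hour
MINIMUM_BILLED_MINUTES = 10  # ten-minute minimum billing period


def billed_minutes(runtime_minutes: float) -> int:
    """Minutes charged for a cluster that ran for runtime_minutes.

    Assumption: partial minutes are rounded up to the next whole minute.
    """
    return max(MINIMUM_BILLED_MINUTES, math.ceil(runtime_minutes))


def dataproc_fee(total_vcpus: int, runtime_minutes: float) -> float:
    """Cloud Dataproc fee in USD with minute-by-minute billing."""
    return total_vcpus * RATE_PER_VCPU_HOUR * billed_minutes(runtime_minutes) / 60


def hourly_rounded_fee(total_vcpus: int, runtime_minutes: float) -> float:
    """Same rate, but with usage rounded up to whole hours, for comparison."""
    hours = max(1, math.ceil(runtime_minutes / 60))
    return total_vcpus * RATE_PER_VCPU_HOUR * hours


if __name__ == "__main__":
    # Hypothetical cluster: 24 virtual CPUs in total, running for 25 minutes.
    print(f"per-minute billing: ${dataproc_fee(24, 25):.2f}")        # $0.10
    print(f"hour-rounded billing: ${hourly_rounded_fee(24, 25):.2f}")  # $0.24
```

In this example, a short job on a 24-vCPU cluster incurs less than half the fee it would under hour-rounded billing, which is where turning clusters off as soon as you are done pays off.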
Cloud Dataproc joins a rich set of cloud technologies focused on faster speed, robust features, and lower costs. With Cloud Platform you have access to:
Awesome infrastructure including Google Compute Engine, Cloud Storage, and Google Cloud Networking.
Cloud Dataproc builds on this infrastructure to let you use Spark and Hadoop more easily, faster, and at a lower cost. Since Cloud Dataproc is built on Cloud Platform, you have instant access to solid-state drives (SSD) and preemptible virtual machines.
You can combine Cloud Dataproc with next-generation data processing and analytics services in Google Cloud Platform powered by Google-native technologies, including BigQuery, Google Cloud Dataflow, and Google Cloud Pub/Sub.
Today we’re releasing Google Cloud Dataproc as a beta service. Cloud Dataproc gives you anytime access to super-fast, simple yet powerful, managed Spark and Hadoop clusters. Since you only pay for what you use with minute-by-minute billing, you won’t break the bank in the process. We look forward to seeing how you find creative, innovative, and productive ways to use Cloud Dataproc. To learn more about Cloud Dataproc, visit the Cloud Dataproc site, review our getting started guide, or submit your questions and feedback on Stack Overflow.
- Posted by James Malone, Product Manager