cloud-test: August 2015

Understanding Cloud Pricing Part 5 - NoSQL Databases

Monday, August 31, 2015

We’ve had a lot of great responses and feedback (keep ‘em coming!) about our cloud pricing posts (Local SSDs, Virtual Machines, Data Warehouses) and today we’re back to talk about running NoSQL databases in the cloud. Specifically, we want to give you the information you need to understand how to estimate the cost of running NoSQL workloads on Google Cloud Platform.

NoSQL Databases
The NoSQL database market has experienced massive growth for the last few years and NoSQL databases have been instrumental in solving many distributed data and scaling challenges, which have opened the door for new and innovative applications and solutions. “NoSQL” is an umbrella term that encompasses any data store that fits the notion of “not only SQL” and many products offer a high degree of tunability around the standard relational database concepts of atomicity, consistency, isolation, and durability (see ACID for more information) and the distributed systems concepts of consistency, availability, and partition tolerance (see CAP theorem for more information). And every NoSQL database offers something different when it comes to how data is modeled and stored - including, but not limited to - JSON document, key-value, wide-column, and blob storage.

As expected, there are several different self-managed options available, such as MongoDB, Apache Cassandra, Riak, Apache CouchDB, Couchbase and many more. Today we’re going to focus on how to estimate pricing when running MongoDB. MongoDB is a document-based, highly scalable NoSQL database that provides dynamic JSON schemas along with a powerful query language. There are a variety of use cases for MongoDB, such as a 360-degree view of the customer, real-time analytics, internet of things applications, and content management (to name a few).

However, when looking at the pricing data for MongoDB, we noticed something interesting. We had planned a separate blog post to talk about pricing Cassandra on Google Cloud Platform as well. But the hardware (virtual or real) requirements are very similar and neither requires a license to be purchased, so the costs are very similar. It didn’t make sense to have another post stating more or less the same thing, just replacing the name of the database, so we are going to include Cassandra here as well.

Cassandra, unlike MongoDB, is a wide-column store (often described as a partitioned key-value store). Cassandra was written at Facebook with much of the data model inspired by Google's Bigtable white paper and the availability design inspired by Amazon's Dynamo white paper. Cassandra was designed for high availability, performance, and tunable consistency. Cassandra has no leader or master node, but rather all the nodes in a cluster exist in a ring, where data is replicated a configurable number of times. Availability comes from having a headless cluster storing your data; tunable consistency comes from how much effort you want your cluster to spend to return your queries. Cassandra and MongoDB are two of the most used NoSQL databases that we see our customers using.

Starting Point
So how do you estimate pricing given multiple use cases and different possible query and traffic patterns? To get started with MongoDB, we’re going to narrow the scope a bit and estimate the costs of the resources used in existing benchmarks. There are several benchmarks that have been published about MongoDB performance and we’ll focus in on two of them, one published by MongoDB and another from United Software Associates. Both benchmarks reach roughly the same throughput and latency conclusions so this is a reasonable model to build upon.

While the benchmarks from United Software Associates used a single MongoDB node for testing, the benchmarks published by MongoDB used a 3-node replica set. Replica sets are a redundant, highly-available deployment of MongoDB and they are strongly recommended for all production workloads (at a minimum). The smallest possible replica set is comprised of three nodes, each configured with matching specifications so we’ll include that configuration in our pricing breakdown below. The on-prem reference hardware specs used in the benchmarks were as follows (MongoDB, like most databases, tends to favor more RAM and storage IOPS where possible):

Benchmark	MongoDB	United Software Associates
CPU	Dual 10-core Xeon 3.0 GHz	Dual 6-core Xeon 3.06 GHz
RAM	128 GB	96 GB
Storage	2 x 960 GB SSD	2 x 960 GB SSD
Monthly Price (single node)	$1,525.00* (estimate)	Unavailable**
Monthly Price (3-node replica set)	$4,575.00* (estimate)	Unavailable**

Now if we map that back to Google Compute Engine instances and storage offerings we would have the following 2 closely matching configurations along with pricing:

Instance Type	n1-highmem-16	n1-standard-32
CPU	16 Xeon vCPU	32 Xeon vCPU
RAM	104 GB	120 GB
Storage	4 x 375 GB Local SSD	4 x 375 GB Local SSD
Monthly Price (single node)	$843.60	$1,146.10
Monthly Price (3-node replica set)	$2,530.76 (estimate)	$3,438.30 (estimate)
Monthly Price Difference	44%	24%
Annual Savings vs. On-Premise	$24,530.88	$13,640.40

The cost breakdown above shows the pricing for a single node and for a 3-node replica set, which is a typical production deployment of MongoDB as stated above. We selected Local SSD for the storage layer in order to support the IOPS required for the throughput metrics achieved in the benchmark reports. As shown in this disk type comparison, Local SSD can support up to 280,000 write IOPS per instance. We know that Local SSD is ephemeral storage, meaning that its lifecycle is tied to the virtual machine to which it is mounted, which is another reason why we chose to estimate pricing for the highly available MongoDB 3-node replica set option. Finally, the prices shown above include Google Cloud Platform sustained use discounts, which total about a 30% discount over the course of the month.
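To illustrate how a discount of about 30% over a full month can arise, here is a minimal Python sketch of a tiered sustained use calculation. The tier schedule is an assumption, not something stated in this post: each successive quarter of the month is billed at 100%, 80%, 60% and 40% of the base rate, following the incremental scheme documented for Compute Engine at the time. The function names are hypothetical.

```python
# Assumed tier schedule (not taken from this post): each entry is
# (fraction of the month covered by the tier, multiplier on the base rate).
SUSTAINED_USE_TIERS = [
    (0.25, 1.00),
    (0.25, 0.80),
    (0.25, 0.60),
    (0.25, 0.40),
]


def billed_fraction(usage_fraction, tiers=SUSTAINED_USE_TIERS):
    """Return the billed amount as a fraction of the undiscounted full-month price."""
    if not 0.0 <= usage_fraction <= 1.0:
        raise ValueError("usage_fraction must be between 0 and 1")
    billed = 0.0
    remaining = usage_fraction
    for width, multiplier in tiers:
        used = min(remaining, width)  # portion of usage that falls in this tier
        billed += used * multiplier
        remaining -= used
        if remaining <= 0:
            break
    return billed


def effective_discount(usage_fraction):
    """Return the discount relative to paying the base rate for the same usage."""
    if usage_fraction == 0:
        return 0.0
    return 1.0 - billed_fraction(usage_fraction) / usage_fraction


for usage in (0.25, 0.50, 1.00):
    print(f"{usage:.0%} of the month -> {effective_discount(usage):.0%} discount")
```

Under these assumed tiers, an instance that runs for the whole month is billed 70% of the base price, which matches the roughly 30% figure quoted above. An instance that runs for only part of the month gets a smaller discount.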

The pricing for Cassandra is pretty similar to MongoDB. They both benefit from Local SSD in terms of performance. And the trade-off between more memory (n1-highmem-16) and more compute (n1-standard-32) is the type of choice that DBAs will have to make when designing a typical Cassandra cluster. Of course, this is just guidance on pricing to get you started; you won't know what's best for your application until you actually run tests yourself.
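To make the comparison between the two configurations concrete, the savings rows of the table can be re-derived from the 3-node monthly prices. The sketch below uses only the figures from the two tables above. The variable and function names are illustrative.

```python
# 3-node replica set monthly prices, copied from the tables above.
ON_PREM_MONTHLY = 4575.00  # on-prem estimate (MongoDB benchmark hardware)

GCE_MONTHLY = {
    "n1-highmem-16": 2530.76,
    "n1-standard-32": 3438.30,
}


def savings(cloud_monthly, on_prem_monthly=ON_PREM_MONTHLY):
    """Return the monthly price difference as a percentage and the annual savings."""
    monthly_diff = on_prem_monthly - cloud_monthly
    return {
        "percent": monthly_diff / on_prem_monthly * 100,
        "annual": monthly_diff * 12,
    }


for instance_type, price in GCE_MONTHLY.items():
    result = savings(price)
    print(
        f"{instance_type}: {result['percent']:.1f}% lower, "
        f"${result['annual']:,.2f} saved per year"
    )
```

The annual figures reproduce the table ($24,530.88 and $13,640.40). The percentages come out at roughly 44.7% and 24.8%, which the table appears to show rounded down to 44% and 24%.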

Running Your Own Tests
As with any benchmarks, your mileage may vary when testing your particular workloads. Isolated tests run during benchmarks don’t always equate to real-world performance, so it is important that you run your own tests and assess read-write performance for a workload that closely matches your usage. Take a look at PerfKit and use it to profile your own proposed deployments, including mixing and matching workloads or worker counts.

Pricing NoSQL workloads can be somewhat challenging but hopefully we’ve given you a way to get started in estimating your costs. If you’re interested in learning more about compute and storage on Google Cloud Platform, check out Google Compute Engine or take a look at the documentation. Feedback is always welcome so if you’ve got comments or questions, don’t hesitate to let us know in the comments.

We’ve gotten a lot of great feedback about this post, and we wanted to let you know that we will also be posting about cloud pricing for Google Cloud Platform's managed NoSQL options in the near future. In forthcoming blog posts, we’ll talk about how to understand the pricing around Google Cloud Bigtable and Google Cloud Datastore and compare those to other popular managed offerings. Thanks for the questions and comments, keep ‘em coming!

- Posted by Sandeep Parikh and Peter-Mark Verwoerd, Solutions Architects

* - Price was taken from a configure-to-order bare metal server at Softlayer

** - Configuration was unavailable to estimate the monthly price

Google Cloud Storage now available through VMware vCloud Air

Monday, August 31, 2015

Earlier this year, we teamed up with VMware to offer enterprise grade Google Cloud Platform services to VMware customers through VMware vCloud Air. Today we are excited to announce that vCloud Air Object Storage Service, powered by Google Cloud Platform, is generally available to all customers.

With the availability of Google Cloud Storage through vCloud Air, VMware customers will have access to a durable and highly available object storage service powered by Google Cloud Platform. Google Cloud Storage enables enterprises to store data on Google's infrastructure with very high reliability, performance and availability. It provides a simple HTTP-based API accessible from applications written in any modern programming language, which enables customers to take advantage of Google's own reliable and fast networking infrastructure to perform data operations in a cost effective manner. When you need to expand, you benefit from the scalability provided by Google's infrastructure.

VMware customers will have access to all three classes of object storage offered by Google:

Standard storage offers our highest performance storage, with very high availability.

Durable Reduced Availability storage provides a lower cost option that doesn’t require immediate and uninterrupted access to storage. Cost savings are made by reducing replicas. Durable Reduced Availability storage offers the same durability as Standard storage.

Nearline storage, our newest storage service, offers customers a simple, low-cost, fast-response storage service with quick data backup and access, for storage charges of 1 cent per GB of data.

Today’s announcement marks the launch of the first of many Google Cloud Platform services that will be offered to VMware customers through vCloud Air. We’re excited to extend Google Cloud Platform to the VMware vCloud Air customer base.

To learn more, contact your VMware sales team or Google Cloud Platform Sales.

- Posted by Adam Massey - Director, Global Partner Business

Help us build a better Google Cloud Platform

Friday, August 28, 2015

Google Cloud Platform improves as a result of extensive collaboration--including collaboration with users. In particular, user research studies help us improve our cloud platform by allowing us to get feedback directly from cloud and IT administrators around the world.

We’d like to invite you today to join our growing pool of critical contributors. Simply fill out our form and we’ll get in touch as user research study opportunities arise.

During a study, we may present you with and gather your feedback on Google Cloud Platform, a new feature we’re developing, or even prototypes. We may also interview you about particular daily habits or ask you to keep a log of certain activity types over a given period of time. Study sessions can happen at a Google office, in your home or business, or online through your computer or mobile device:

Usability study at a Google office: for those that live local to one of our offices. Typically, you’ll come visit us and meet 1-on-1 with a Google researcher. They’ll ask you some questions, have you use a product, and then gather your feedback on it. The product could be something you’re rather familiar with or some never-before-seen prototype.

Remote usability study: Rather than have you visit our offices, a Google researcher will harness the power of the Internet to conduct the study. Basically, they’ll call you on the phone and set up a screen sharing session with you on your own computer. You can be almost anywhere in the world, but need to have a high-speed Internet connection.

Field study: Google researchers hit the road and come visit you. We won't just show up at your door though – we’ll always check in with you first, talk to you about the details of the study and make a proper appointment.

Experiential sampling study: These studies would require a small amount of activity every day over the course of several days or weeks. Google researchers will ask you to respond to questions about a product, or make entries in a diary document about your use of a product, using your mobile phone, tablet, or laptop to complete the study questions or activities.

After the study, you'll receive a token of our appreciation for your cooperation, such as a gift card. Sharing your experiences with us helps inform our product planning and moves us closer to our goal of building a cloud platform that you'll love.

More questions? Check out our FAQs page to learn more about our user research studies.

- Posted by Google UX Research Infrastructure Team

Stress Testing with Energyworx

Friday, August 28, 2015

Founded in 2012, Energyworx offers big data aggregation and analytics cloud-software services for the energy and utilities industry. Their products and services include grid optimization and reliability, meter-data management, consumer engagement, energy trading and environmental-impact reduction. They are based in the Netherlands. To learn more, visit www.energyworx.org

Getting all cloudy gives you a tremendous amount: agility, scalability, cost savings and more. The scales weigh heavily in favor of embracing cloud goodness. However, on the other side of that scale, getting all cloudy means giving up a degree of control. You don’t control the infrastructure and, in certain cases, you don’t know the implementation behind APIs you rely on. This is especially true of managed services such as databases and message queues, and those APIs and associated SLAs are central to the operation of your systems. There’s nothing surprising, bad or wrong about this situation; as stated previously, there are far more pros than cons with the cloud. But as engineers whose reputations (and need for a night’s sleep uninterrupted by a 3am wake-up call) rely on the stability and scalability of the systems we build, what do we do? We follow the age-old maxim, trust but verify, and verify by testing!

Testing comes in many forms but broadly there are two types: functional and stress testing. Functional tests check for correctness. When I register for your service, does my email address get encrypted and correctly persisted? Stress tests check for robustness. Does your service handle 100,000 users registering in the fifteen minutes after it’s mentioned in the news? As an aside, I was tempted as I wrote this post to phrase everything in terms of “we all know this…” and “of course we all do that…” when it comes to testing, because we do all know it’s a good thing to do and we all do it to one extent or another. But the number of scalability issues good engineers face is proof that the importance of stress testing isn’t a universally held truth, or at least not a universally practiced one. The remainder of this post focuses on a set of best practices we distilled from a stress testing exercise we did on Google Cloud Platform with Energyworx as part of their go-live.

Energyworx and Google Cloud Platform leveraged existing Energyworx REST APIs together with Grinder to stress test the system. Grinder allows the calls to the REST APIs to be scaled up and down as required depending on the type and degree of stress to be applied. Test scenarios were based around scaling the number of smart meters uploading data, the amount of work performed by the meters and physical locations of the meters. For example, we knew a single meter worked correctly, so let’s try several hundred thousand meters working at the same time, or let’s have meters running in Europe access the system in the US, or let’s have thousands of meters do an end-of-day upload at the same time. Following these best practices, Energyworx ran extended 200-core tests for approximately $10 a time and proved that their system was ready for millions of meters flooding the grid daily with billions of values. We were right and the Energyworx launch went off without a hitch. Stress testing is a blast…

First best practice is to leverage Google Cloud Platform to provide the resources to stress test. To simulate hundreds of thousands of smart meters (or users, or game sessions, or other stimuli) takes resources and Google Cloud Platform allows you to spin these up on demand, in very little time and pay by the minute for them. That’s a great deal for stress testing.

Second best practice is that systems are often complex, with different tiers and services interacting and it can be tough to predict how they will behave under stress, so use stress testing to probe the behavior of your system and the infrastructure and services your system relies upon. Be creative with your scenarios and you’ll learn a lot about your system’s behavior.

Third best practice is that you should test the rate of change of the load you apply as well as the maximum load. It’s great to know your system can handle a load of 100K transactions per second, but it’s still not a useful system if it can only reach that load in increments of 10K per minute over 10 minutes, when a single news article from the right expert can bring you that much traffic in the web equivalent of the blink of an eye.

Fourth best practice is that you should test regularly. If you release each Friday and bugfix on demand, you don’t need to stress test every time you release but you should stress test the entire system every 2-4 weeks to ensure that performance is not degrading over time.
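The third best practice above, testing the rate of change of load and not just the peak, can be sketched as two target-load schedules that reach the same maximum but stress the system very differently. This is a minimal illustration, not the setup used in the Energyworx tests: the function names are hypothetical, and the schedules are plain lists of target transactions per second that a load generator could be driven from.

```python
def stepped_profile(step_tps, step_seconds, steps):
    """Target load per second that rises by step_tps every step_seconds."""
    profile = []
    for step in range(1, steps + 1):
        profile.extend([step_tps * step] * step_seconds)
    return profile


def spike_profile(peak_tps, hold_seconds):
    """Target load per second that jumps straight to peak_tps and holds."""
    return [peak_tps] * hold_seconds


def max_rate_of_change(profile):
    """Largest second-over-second increase in target load, starting from zero."""
    previous = 0
    worst = 0
    for tps in profile:
        worst = max(worst, tps - previous)
        previous = tps
    return worst


# 10K more transactions per second each minute for 10 minutes...
gradual = stepped_profile(step_tps=10_000, step_seconds=60, steps=10)
# ...versus the full 100K arriving at once.
spike = spike_profile(peak_tps=100_000, hold_seconds=60)

for name, profile in (("gradual", gradual), ("spike", spike)):
    print(f"{name}: peak={max(profile)} tps, "
          f"largest jump={max_rate_of_change(profile)} tps")
```

Both schedules peak at 100K transactions per second, but the largest single jump is 10K in the stepped case and 100K in the spike case. A system that passes only the first schedule has not been shown to survive the second.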

- Posted by Corrie Elston, Solutions Architect

Reselling Option now available for Google Cloud Platform Partners

Wednesday, August 26, 2015

From bringing people together at the World Cup, to improving the way employees talk to each other, Google Cloud Platform Services Partners help customers unlock the full potential of our products.

To help our partners focus more on their customers’ experiences, we are pleased to announce that we’re now accepting applications for a reselling option from eligible, existing Google Cloud Platform services partners and we anticipate expanding to new partner program applicants in early fall.

As a reseller of Google Cloud Platform, partners will be able to provision and manage their customers via the new Cloud Platform reseller console. Google Cloud Platform resellers will:

Fully manage their customers’ Google Cloud Platform experience, from onboarding through implementation

Provide the first line of support and be responsible for customer problem resolution

Provide customers with a billing service that matches their specific requirements and in local currency

The ability to resell will be especially beneficial to partners aiming to bundle multiple Cloud Platform services and present one consolidated bill to their customers.

“The reseller console showcases deep insights into our customers' engagement with the platform, allowing us to make informed recommendations in terms of best practices and opportunities available to our customers. As a trusted solutions partner, it's paramount for us to provide white glove services to make their transition to the cloud as seamless as possible."

-- Tony Safoian, Sada Systems CEO

If you’re an existing services partner and want to learn more about your organization's eligibility for reselling, visit our application page on Google for Work Connect. And if you’re new to Google Cloud Platform and interested in becoming a services partner, visit our site at cloud.google.com/partners.

- Posted by Adam Massey - Director, Global Partner Business

Google Cloud Platform Blog
