Google Cloud Platform Blog
Connection Refused!
Friday, July 31, 2015
A fellow Technical Solutions Engineer recently found their Google Cloud Platform project in an interesting state: they could create Compute Engine VM instances that would boot, but could not connect to any of them remotely over SSH.
While this problem is often due to a misconfigured firewall rule, a quick check of the rules showed this was not the case: an SSH rule existed, and its SRC_RANGES value allowed connections from any source:
$ gcloud compute firewall-rules list -r .*ssh.*
NAME               NETWORK  SRC_RANGES  RULES   SRC_TAGS  TARGET_TAGS
default-allow-ssh  default  0.0.0.0/0   tcp:22
We ruled out a system-level firewall misconfiguration, as new systems built from default images would not share that issue. As a sanity check, we used tcptraceroute to ensure traffic was reaching the instance:
$ sudo tcptraceroute -P 22 130.211.181.201 22
Selected device en0, address 172.31.130.174, port 22 for outgoing packets
Tracing the path to 130.211.181.201 on TCP port 22 (ssh), 30 hops max
1 172.31.131.252 1.247 ms 0.256 ms 0.250 ms
2 * * *
...
10 * * *
11 201.181.211.130.bc.googleusercontent.com (130.211.181.201) [closed] 38.175 ms 38.918 ms 38.072 ms
We would expect the last hop to report open, not closed. Typically, closed means that the instance responded but the port wasn't open for communication. With no firewall interference, we knew it had to be something else. The next step was to grep through the serial port output to see if sshd had started:
$ gcloud compute instances get-serial-port-output gcp-rge0-blog --zone us-central1-a | grep Starting.*sshd
[....] Starting OpenBSD Secure Shell server: sshd
Jan 14 23:19:19 gcp-rge0-blog sshd[1911]: Server listening on 0.0.0.0 port 22.
[ ok ] Starting OpenBSD Secure Shell server: sshd.
Okay, that looked fine. With the most obvious points of interference ruled out, the network routes were the next best bet:
$ gcloud compute routes list
NAME                            NETWORK  DEST_RANGE     NEXT_HOP  PRIORITY
default-route-31a84e4cfff40b29  default  10.240.0.0/16            1000
Now we’ve found the root cause. The default route for non-local traffic (0.0.0.0/0) had been inadvertently deleted, which caused all external traffic to be lost on the return path. Recreating the missing route solved the issue:
$ gcloud compute routes create default-internet --destination-range 0.0.0.0/0 --next-hop-gateway default-internet-gateway
Created [https://www.googleapis.com/compute/v1/projects/PROJECTID/global/routes/default-internet].
$ gcloud compute routes list
NAME                            NETWORK  DEST_RANGE     NEXT_HOP                  PRIORITY
default-route-31a84e4cfff40b29  default  10.240.0.0/16                            1000
default-internet                default  0.0.0.0/0      default-internet-gateway  1000
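With the route restored, a direct SSH attempt (reusing the instance name and zone from the serial-port command above) is a quick way to confirm the fix:
$ gcloud compute ssh gcp-rge0-blog --zone us-central1-a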
Now, the instances are once again reachable by SSH and any other external method. Case closed!
You can find a lot of help and information in the Google Cloud Platform documentation, and more information on troubleshooting Compute Engine specifically here.
- Posted by Josh Moore, Technical Solutions Engineer
Multi-million operations per second on a single Google Compute Engine instance
Thursday, July 30, 2015
The emergence of affordable high-IOPS storage, such as Google Compute Engine local SSDs, enables a new generation of technologies to re-invent storage. Helium, an embedded key-value store from Levyx, is one such example -- designed to scale with multi-core CPUs, SSDs, and memory-efficient indexing.
At Levyx, we believe in a “scale-in before you scale-out” mantra. Technology vendors often advertise scale-out as the way to achieve high performance. It is a proven approach, but it is just as often used to mask single-node inefficiencies. Without a system in which CPU, memory, network, and local storage are properly balanced, scaling out is simply what we call “throwing hardware at the problem” -- hardware that, virtual or not, customers pay for.
To demonstrate this, we decided to check Helium’s performance on a single node on Google Cloud Platform with a workload similar to the one previously used to showcase Aerospike and Cassandra (200-byte objects and 100 million operations). With Cassandra, the data store contained 3 billion indices; Helium starts with an empty data store. The setup consists of:
A single n1-highcpu-32 instance -- 32 virtual CPUs and 28.8 GB of memory.
Four local SSDs (4 x 375 GB) for the Helium datastore. (Note: local SSDs are more limited than persistent disks in terms of creation-time flexibility and reliability, but the goal of this blog post is to test with the highest-performing GCP I/O devices.)
OS: Debian 7.7 (kernel 3.16-0.bpo.4-amd64, NVMe drivers).
The gists and tests are on GitHub.
Scaling and Performance with CPUs
The test first populates an empty datastore, then reads the entire datastore sequentially and then randomly, and finally deletes all objects. The 100 million objects are in memory, with persistence on SSD, which acts as the local storage every replicated system requires. The total datastore size is kept fixed.
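To make the shape of the benchmark concrete, here is a minimal Python sketch. The plain dict is a stand-in for a Helium datastore handle (Levyx's actual bindings are not shown in this post), and N is scaled down from the published run:

import os
import random
import time

N = 1_000_000            # scaled down from the 100 million objects in the published run
VALUE = os.urandom(200)  # 200-byte objects, matching the Aerospike/Cassandra workload
kv = {}                  # stand-in for an SSD-backed Helium datastore handle

def timed(label, fn):
    start = time.time()
    fn()
    print(f"{label}: {N / (time.time() - start):,.0f} ops/sec")

def insert():
    for i in range(N):
        kv[i] = VALUE

def sequential_get():
    for i in range(N):
        kv[i]

keys = list(range(N))
random.shuffle(keys)  # precomputed so the shuffle itself isn't timed

def random_get():
    for k in keys:
        kv[k]

def delete():
    for i in range(N):
        del kv[i]

timed("insert", insert)
timed("sequential get", sequential_get)
timed("random get", random_get)
timed("delete", delete)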
Takeaways
Single-node performance of over 4 million inserts/sec (write path) and over 9 million gets/sec (read path), with persistence as durable as the local SSDs.
99th-percentile (in-memory) latency under 15 usec for updates and under 5 usec for gets.
Almost linear scaling, which simplifies the math of provisioning instances.
Scaling with SSDs and Pure SSD Performance
Compute Engine provides high-IOPS, low-latency local SSDs. To demonstrate a case where data is read purely from SSDs (rather than taking advantage of memory), let’s run the same benchmark with 4K objects x 5 million objects and reduce Helium’s cache to a minimal 2% (400 MB) of the total data size (20 GB). Only random-get performance is shown below, because it is a better stress test than sequential gets.
Takeaways:
Single-node SSDs capable of updates at 1.6 GB/sec (400K IOPS) and random gets at 1.9 GB/sec (480K IOPS).
IOPS scaling with the number of SSDs.
Numbers comparable to fio, a pure I/O benchmark.
With four SSDs and 256 threads, median latency under 600 usec and 95th-percentile latency under 2 msec.
Deterministic memory usage (< 1 GB) by not relying on OS page caches.
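For reference, a raw-device fio run along the following lines is the kind of pure-I/O baseline those numbers are compared against (the flags and the NVMe device path are illustrative, and note that writing to a raw device destroys its data):
$ sudo fio --name=randread --filename=/dev/nvme0n1 --direct=1 --rw=randread \
      --bs=4k --ioengine=libaio --iodepth=64 --numjobs=4 --runtime=60 --group_reporting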
Cost Analysis
The cost of this Google Compute Engine instance for one hour is $1.22 (n1-highcpu-32) + $0.452 (4 x local SSD) = $1.67.
Based on 200-byte objects, this boils down to:
2.5 million updates per second per dollar
4.6 million gets per second per dollar
To put this in perspective, New York’s population is ~8.4 million; you could therefore scan through a Helium datastore containing a record for every New Yorker (assuming each record is under 200 bytes, e.g., name, address, and phone) in one second, on a single Google Cloud Platform instance, for under $2 per hour.
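The arithmetic behind the per-dollar figures, assuming they normalize the sustained per-second rates by the hourly instance price:

# Back-of-envelope check of the per-dollar figures (the rate below is an
# assumption consistent with the throughput numbers quoted above).
hourly_cost = 1.22 + 4 * 0.113        # n1-highcpu-32 + 4 local SSDs = $1.672/hour
updates_per_sec = 4.2e6               # "over 4 million" inserts/sec
print(updates_per_sec / hourly_cost)  # ~2.5 million updates/sec per dollar-hour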
Summary
Helium running on commodity Google Compute Engine VMs enables processing data at near-memory speeds using SSDs. The combination of Cloud Platform and Helium makes high-throughput, low-latency data processing affordable for everyone. Welcome to the era of dollar-store-priced datastores with enterprise-grade reliability!
For details about running Helium on Google Cloud Platform, contact info@levyx.com.
- Posted by Siddharth Choudhuri, Principal Engineer at Levyx
Tableau, Google BigQuery, & Twitter - Visualization of Streamed Tweets at #GCPNext
Wednesday, July 29, 2015
Today's guest post comes from our friends at Tableau: Jeff Feng, Product Manager, and Ellie Fields, Vice President of Product Marketing. Tableau, a Google Cloud Platform partner, is a leader in interactive data visualization software.
“It’s a beautiful thing when best-of-breed technologies -- Tableau, Google BigQuery and Twitter -- come together to operate seamlessly in concert with one another.” - Jeff Feng
Next, a Google Cloud Platform Series
Over the month of June, the Tableau team traveled around the world with the Google Cloud Platform team as a proud sponsor of Next, a Google Cloud Platform event series. The teams made stops in New York, San Francisco, Tokyo, London, and Amsterdam, where attendees learned about the latest services and features on the platform, and fellow developers and IT professionals shared how they are using Google Cloud Platform to move quickly from idea to application or decision.
Ellie presented a joint demo on Twitter during the Data & Analytics Talk at Next, New York City (Left). Jeff discussed the activity of Tweets around #GCPNext in Amsterdam (Right).
Visualizing Streamed Tweets with Tableau, Google BigQuery & Twitter
As part of our presence at the events, we wanted to develop a live demo that showcased our technologies working together. Google BigQuery can process petabytes of data within seconds and ingest data rapidly. Tableau’s live connectivity to BigQuery enables users to create stunning dashboards within minutes using our drag-and-drop interface, extending the usefulness of BigQuery to all users. For this demo, we decided to visualize real-time Tweets about the #GCPNext conference series.
Overall architecture for visualizing streamed Tweets in BigQuery using Tableau.
We worked together with our friends at Twitter (@TwitterDev), who developed an open-source connector called Twitter-for-BigQuery that streams Tweets directly into BigQuery. Additionally, the connector can retrieve the last 30 days of data for the defined Tweet stream. The APIs for the connector are provided by Gnip, which offers enterprise-grade access and filtering for the full Twitter stream. The connector enables users to define filters for certain hashtags and usernames, and it then streams Tweets matching those filters in real time directly into BigQuery using the Tabledata.insertAll method. For the purposes of our demo, our Tweet stream included hashtags such as #bigdata, #IoT, and #GCPNext, as well as usernames such as @Google.
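A minimal sketch of that ingestion step, written in Python against the 2015-era google-api-python-client; the project, dataset, table, and row schema here are placeholders rather than the connector's actual code:

from googleapiclient import discovery

# Build a BigQuery v2 client using application-default credentials.
bq = discovery.build("bigquery", "v2")

# One row per Tweet; the field names are illustrative, not the connector's schema.
rows = [{"json": {"created_at": "2015-06-10T12:00:00Z",
                  "screen_name": "googlecloud",
                  "text": "Hello #GCPNext!"}}]

bq.tabledata().insertAll(
    projectId="my-project",  # placeholder project ID
    datasetId="tweets",      # placeholder dataset
    tableId="stream",        # placeholder table
    body={"rows": rows},
).execute()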
Once the data lands in BigQuery’s tables, it may be accessed using super-fast, SQL-like queries backed by the processing power of Google’s infrastructure. Google provides a console and a command-line interface that are great for analysts and developers who know how to write SQL. Tableau enhances the joint solution by providing a drag-and-drop visual interface to the data, so that anybody can use it. And because our live native connector uses the BigQuery REST API, users get Tableau’s interface running directly against Google’s massive infrastructure. Additionally, Tableau and the Google BigQuery team have co-published a best-practices whitepaper to help you maximize the value of our joint solution.
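On the analyst side, the same client can run a query over the streamed rows. This sketch reuses the bq client and placeholder table from the snippet above, with the legacy BigQuery SQL of that era:

# Counts Tweets mentioning #GCPNext in the placeholder table (legacy SQL).
query = ("SELECT COUNT(*) AS tweets "
         "FROM [my-project:tweets.stream] "
         "WHERE text CONTAINS '#GCPNext'")
result = bq.jobs().query(projectId="my-project", body={"query": query}).execute()
print(result["rows"][0]["f"][0]["v"])  # total matching Tweets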
Using Tableau Desktop, we connected to the data and built the dashboard below, enabling users to search for keywords within the filtered Tweet stream. Then we published the live data connection to BigQuery and the dashboard to Tableau Online, our hosted analytics platform. Tableau Online is the perfect complement to BigQuery because the solution is completely no-Ops and maintenance-free. It also supports a live connection to Google BigQuery.
Not only does the dashboard show the overall number of Tweets in the stream and the percentage occurrence of the keyword by date, but you can also visualize the actual Tweet itself by hovering over the marks in the scatter plot below.
Interactive Tableau Online dashboard visualizing live streamed Tweets in Google BigQuery.
In the video below, Ellie shares how you can interact with the Tableau Online visualization we created as well as build a new visualization using the live data connection to BigQuery directly from Tableau Online.
Demo video featuring Tableau Online visualizing live streamed Tweets in Google BigQuery.
What’s Up Next?
At Tableau, we believe that the future of data is in the cloud. We love how Google is innovating on cloud infrastructure and building the cloud services of tomorrow, today. That’s why we recently announced a new named connector to Google Cloud SQL. The connector moves Google Cloud Platform and Tableau Online customers one step closer to being able to both host and analyze data completely in the cloud. It also complements our existing native connectors to Google BigQuery and Google Analytics. Going forward, we are committed to building broader and deeper integrations with Google to delight our users.
Try It For Yourself!
The beautiful thing about this demo is that the technologies used in the solution are easy to use. To learn more and try it for yourself, see the links below:
Tableau - Learn More and Free Trial
Google BigQuery
Google BigQuery & Tableau Best Practices Whitepaper
Twitter-for-BigQuery on GitHub
- Posted by Jeff Feng, Product Manager, and Ellie Fields, VP of Product Marketing, both at Tableau.