Google Cloud Platform Blog
Alerting on a metric’s rate of change
Wednesday, October 14, 2015
Part of your daily routine managing your cloud services is to make sure bad things don’t happen. And when they do, you want to be notified as soon as possible. For that, Google Cloud Monitoring provides you with powerful alerting capabilities that notify you whenever conditions you’ve defined are triggered. For example, you can configure Cloud Monitoring to notify you when CPU utilization crosses a certain threshold, or when your application serving latency is too high.
But what if you don’t know exactly which thresholds to use, or what “too high” really means? What if, instead, you just want to be notified when things change from their norm?
For that, we’re introducing the ability to alert on a metric’s rate of change. Instead of choosing a threshold for a certain metric (e.g. “notify me when it crosses 20”), you can now specify a percentage change (e.g. “notify me when it changes by more than 30% compared to its previous values”). This new alert condition type is especially useful for “stable” metrics that are expected to remain within a certain range for extended periods of time and not change too abruptly, such as latency, QPS and error ratios.
Understanding how metric rate of change conditions are evaluated
Suppose you want to be alerted when disk usage increases by over 30% in a time period of 15 minutes or less (details of how to configure this in the next section). Here’s how this condition is evaluated:
Every minute at time T, we compute a simple 10 minute moving average for the metric in question (in this case, disk usage), at times T and T minus 15 minutes.
We then check the difference between the two, and if it exceeds the percentage specified, the alert condition is triggered.
Note that the 15 minute period is used as an example and is configurable. However, the 10 minute moving average is fixed, and is used to smooth the metric and prevent false positives. Without this smoothing function, a quick spike in the metric would trigger the alert condition even if it’s very short lived and resolves itself instantly.
You can configure the increase and decrease of a metric separately, so in the above example a drop of 30% or more would NOT trigger this alert condition.
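The evaluation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cloud Monitoring's implementation: it assumes one sample per minute held in a plain list, that the moving average covers the 10 samples ending at the evaluated minute, and that the change is measured relative to the older smoothed value. The function names and the sample data are hypothetical.

```python
from statistics import mean

SMOOTHING_MINUTES = 10  # fixed moving-average width described in the post


def moving_average(samples, end_minute, width=SMOOTHING_MINUTES):
    """Mean of the `width` one-minute samples ending at `end_minute` (inclusive)."""
    start = end_minute - width + 1
    if start < 0:
        raise ValueError("not enough history for the moving average")
    return mean(samples[start:end_minute + 1])


def percent_change(samples, now, window_minutes):
    """Relative change, in percent, between the smoothed values at now and now - window."""
    current = moving_average(samples, now)
    previous = moving_average(samples, now - window_minutes)
    if previous == 0:
        raise ValueError("previous average is zero; percent change is undefined")
    return (current - previous) / previous * 100.0


def increase_triggered(samples, now, window_minutes=15, threshold_percent=30.0):
    """True when the smoothed metric rose by more than threshold_percent."""
    return percent_change(samples, now, window_minutes) > threshold_percent


# Hypothetical per-minute disk usage: flat at 40 for 25 minutes, then flat at 60.
usage = [40.0] * 25 + [60.0] * 10
print(increase_triggered(usage, now=len(usage) - 1))  # True (a 50% rise)

# A one-minute spike is smoothed away: the average only moves from 40 to 46.
spike = [40.0] * 34 + [100.0]
print(increase_triggered(spike, now=len(spike) - 1))  # False (a 15% rise)
```

The second call shows why the smoothing matters: a single sample of 100 raises the 10 minute average by only 15%, so the condition stays quiet. Because only increases are checked, a drop of 30% or more would also return False.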
Let’s look at the visual example in the graph, using the same condition previously described. As you can see, disk usage starts rising a little after 9:30 (where the gray vertical line is), but the alert condition isn’t triggered yet since it only increases by 20%. Around 9:45 it increases some more and triggers the condition (the red horizontal line at the top of the graph).
The condition is auto-resolved at around 10:10, since looking back 15 minutes no longer yields an increase of 30% or greater. In other words, the metric has stabilized for 15 minutes. The condition is triggered again a little after 10:30, when disk usage increases from about 40% to over 80%.
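Because disk usage is itself expressed as a percentage, it helps to separate percentage points from relative change. The short sketch below assumes the condition compares relative change, which is how the post phrases it ("changes by more than 30% compared to its previous values"), and it ignores the smoothing step for simplicity. The function name is hypothetical.

```python
def relative_increase_percent(previous, current):
    """Relative change from previous to current, in percent of previous."""
    return (current - previous) / previous * 100.0


# Figures quoted in the example: disk usage goes from about 40% to over 80%.
points = 80.0 - 40.0                               # 40 percentage points
relative = relative_increase_percent(40.0, 80.0)   # 100 percent relative increase
print(points, relative, relative > 30.0)  # 40.0 100.0 True
```

Under that assumption, the last jump in the example is a 100% relative increase, well above the 30% configured for the condition.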
Configuring metric rate of change alerting conditions
To create a rate of change alerting condition, simply choose “Metric Rate of Change” from the “Add Condition” menu when creating or editing an alerting policy. Like other condition types, Metric Rate of Change can be combined with other conditions to create a multi-condition alerting policy.
When configuring this condition, you can choose the metric, whether to trigger on an increase or decrease, the percent of increase or decrease that would trigger the condition, and the time window to examine for change. The screenshot below shows the condition we examined in the previous example.
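To make the four settings concrete, here is a minimal sketch that writes them down as a small Python data structure. It is not the Cloud Monitoring API: the class name, field names, and the metric name `disk_usage` are all hypothetical, and the check assumes the two values passed in have already been smoothed.

```python
from dataclasses import dataclass

DIRECTIONS = ("increase", "decrease")


@dataclass(frozen=True)
class RateOfChangeCondition:
    """Hypothetical container for the four settings the post lists."""
    metric: str           # which metric to watch
    direction: str        # "increase" or "decrease"
    percent: float        # percent change that triggers the condition
    window_minutes: int   # time window to examine for change

    def __post_init__(self):
        if self.direction not in DIRECTIONS:
            raise ValueError(f"direction must be one of {DIRECTIONS}")
        if self.percent <= 0:
            raise ValueError("percent must be positive")
        if self.window_minutes <= 0:
            raise ValueError("window_minutes must be positive")

    def is_met(self, previous_avg, current_avg):
        """Compare two already-smoothed values against this condition."""
        change = (current_avg - previous_avg) / previous_avg * 100.0
        if self.direction == "increase":
            return change > self.percent
        return -change > self.percent


# The condition from the earlier example: a 30% increase within 15 minutes.
disk_condition = RateOfChangeCondition("disk_usage", "increase", 30.0, 15)
print(disk_condition.is_met(40.0, 60.0))  # True
print(disk_condition.is_met(60.0, 40.0))  # False: a drop does not trigger an increase condition
```

Keeping the direction as an explicit field mirrors the point that increases and decreases are configured separately.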
Note: you should take into account metric changes related to time of day, to avoid false notifications. So for example, if average QPS increases tenfold in the morning, this might unnecessarily trigger alert conditions that check for increased QPS activity.
Conclusion
Alerting on a metric’s rate of change gives you more flexibility when monitoring dynamic systems, without the need to explicitly define and maintain static thresholds. These conditions adapt to the way your system behaves, part of our ongoing effort to provide you with a monitoring system that requires less maintenance, so you have more time to innovate. Give it a try: look for a stable metric that should remain within a given range, and configure a percentage increase or decrease that would signal a change that’s worth investigating. You can read more about setting up alerts here.
Please send us an email and let us know what you think. Your feedback helps us improve.
Posted by Amir Hermelin, Product Manager