Why APM Synthetic Monitoring Isn’t Enough to Protect Your Revenue
When synthetic monitoring was first invented in the mid-1990s, it was a revolutionary way to ensure that your digital properties were performing as expected for your end users. Back then, synthetic testing was mostly used for availability testing of your external websites and services. The majority of the monitoring budget was spent on infrastructure, network, and applications. End-user monitoring was merely a luxury and an afterthought.
This worked well at the time because the majority of your digital architecture was managed in-house and the customer-facing product was far simpler. Synthetic tests of first-party components could therefore uncover almost every issue, and teams could act on those findings immediately to repair the problem.
That, however, is no longer the case.
Complexity within our digital architecture has exploded in recent years with the advent of cloud infrastructure, content delivery networks (CDNs), DNS providers, third-party API services, traffic steering services, third-party tags and tag management systems, and a number of other component types that can represent performance bottlenecks or single points of failure.
As a result, a tool like an Application Performance Monitoring (APM) platform, which provides code-level monitoring and tracing, cannot by itself provide data from all the different layers of your digital delivery chain; there are simply too many other places where the problem may lie. While many APM vendors offer a “synthetic” solution, the telemetry it provides covers only a small percentage of the potential root causes.
Additionally, these APM synthetic solutions increasingly host their agents only on cloud providers like AWS, Azure, and Google Cloud, rather than on the Internet backbone infrastructure and consumer ISP networks that end users actually use to access your sites, applications, and services. As a result, these solutions leave you completely blind to issues that originate outside of the cloud providers, which not only hampers your ability to respond to and repair an issue in a timely manner, but also leaves you without any data to enforce your Service Level Agreements (SLAs).
What’s needed is a unified platform that combines a TRUE synthetic monitoring tool (i.e. one that actually emulates the end-user experience) with real user monitoring (RUM), internet intelligence, and SaaS monitoring. Together, these tools create a comprehensive digital experience monitoring (DEM) platform that covers nearly every aspect of your digital footprint and can be deployed alongside your code-based APM solution.
While APM is a valuable tool for discovering issues within your own code, it does not test from the end user’s perspective, i.e. directly from backbone, broadband, last mile, and wireless networks. That blind spot guarantees false negatives: critical issues impacting your end users’ experiences go undetected, costing you both revenue and brand credibility when those users inevitably grow frustrated with unacknowledged service disruptions.
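To make the idea concrete, here is a deliberately simplified sketch of what a network-aware check looks like in principle: the same script runs on probes placed in different networks, tags each measurement with a vantage-point label, and flags responses that exceed an assumed threshold. This is an illustration, not how Catchpoint’s platform is built, and the URL, threshold, and label names are all placeholders.

```python
# Illustrative sketch only: run the same timed check from probes on different
# networks and label each result with its vantage point, so slow or failed
# responses can be traced back to where real users are.
# ENDPOINT, THRESHOLD_SECONDS, and VANTAGE_POINT are hypothetical placeholders.

import os
import time
import urllib.request

ENDPOINT = "https://www.example.com/"                 # hypothetical customer-facing URL
THRESHOLD_SECONDS = 2.0                               # hypothetical alert threshold
VANTAGE = os.environ.get("VANTAGE_POINT", "unknown")  # e.g. "nyc-backbone", "mumbai-wireless"

start = time.monotonic()
try:
    with urllib.request.urlopen(ENDPOINT, timeout=10) as resp:
        status, size = resp.status, len(resp.read())
    elapsed = time.monotonic() - start
    verdict = "SLOW" if elapsed > THRESHOLD_SECONDS else "ok"
    print(f"{VANTAGE}\t{ENDPOINT}\t{status}\t{size}B\t{elapsed:.3f}s\t{verdict}")
except OSError as exc:  # DNS failures, timeouts, TLS errors, connection resets
    print(f"{VANTAGE}\t{ENDPOINT}\tERROR\t{exc}")
```

Run the same check from a cloud node, a backbone node, and a last-mile node and you will often get three very different answers; that difference is exactly what cloud-only agents never see.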
Given that a true synthetic solution is intended to emulate the experience of actual end users, the inability of APM vendors like Dynatrace, New Relic, and AppDynamics to test from the networks and geographies where those users are actually located, or from the devices and browsers they actually use, means that they really don’t have a synthetic solution at all. Fantasy monitoring might be a more appropriate term.
The reason these vendors have taken this approach is pretty simple: running a global infrastructure like Catchpoint’s, one that supports true synthetic monitoring from locations and networks all over the world, is an expensive and challenging undertaking. Placing a few dozen nodes on cloud providers is far cheaper and easier.
To sell this solution, these vendors will tell you that the data collected from cloud nodes is “cleaner” or “more stable,” meaning fewer performance spikes and alerts for you to worry about. That is like saying you never have to buy an umbrella as long as you measure rainfall from inside your house. Meanwhile, you remain blind to all of the issues that lie outside of APM’s limited purview.
Here at Catchpoint, we’ve seen the problems caused by APM-based synthetics first-hand. At one point, for example, global weather service AccuWeather moved away from Catchpoint in order to consolidate their synthetic monitoring under their APM vendor. They quickly realized, however, that the limited geographic locations on offer were woefully insufficient to serve their global user base, and they switched back to Catchpoint for a more complete monitoring strategy.
“Catchpoint has the infrastructure set up where we can protect our global brand with global monitoring,” says Stephen Savitski, Sr. Director of Enterprise Monitoring at AccuWeather. “The ability to get a test up and running in five seconds from the exact location where we’re seeing issues – as opposed to trying to configure a VM somewhere in the cloud – is huge for us.”
In addition to the geographic testing limitations, the APM-based synthetic tool that AccuWeather used before switching back to Catchpoint did not provide the analysis capabilities that are necessary for quick detection and repair times. “Most of our problems tend to be with client-side issues like third-party JavaScript, which could be video, which could be advertising, which could be content,” said Savitski. “Catchpoint’s synthetic monitoring helps us narrow our focus, analyze, and isolate specific waterfalls versus specific sessions and immediately see the problem. Then we can share that data with those partners, or wherever the problem may be, and rectify that quickly.”
This ability to isolate components of your digital architecture is also vital to enforcing the various SLAs that you have with your third-party vendors. Trying to do this by monitoring from cloud providers is a complete waste of time and resources because most of your vendors are also hosted on these same cloud providers; their vantage point is not that of the end user. The only way to truly hold them accountable is by monitoring from multiple vantage points outside of the cloud to shine a light on actual issues that your end users experience.
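As a rough illustration of the kind of third-party isolation Savitski describes, the sketch below walks a standard HAR waterfall export and totals requests, bytes, and load time per third-party host, so a misbehaving tag, ad, or content provider stands out. It is a simplified example rather than Catchpoint’s tooling, and the file name and first-party domain are placeholders.

```python
# Illustrative sketch only: group the entries of an exported HAR waterfall by
# third-party host so slow tag, ad, or content providers stand out.
# The file name and first-party domain below are hypothetical.

import json
from collections import defaultdict
from urllib.parse import urlparse

FIRST_PARTY = "example.com"       # assumption: your own domain

with open("waterfall.har") as f:  # hypothetical HAR export from a synthetic test or browser
    har = json.load(f)

by_host = defaultdict(lambda: {"requests": 0, "bytes": 0, "time_ms": 0.0})
for entry in har["log"]["entries"]:
    host = urlparse(entry["request"]["url"]).hostname or "unknown"
    if host.endswith(FIRST_PARTY):
        continue  # keep only third-party hosts
    stats = by_host[host]
    stats["requests"] += 1
    stats["bytes"] += max(entry["response"].get("bodySize", 0), 0)
    stats["time_ms"] += entry.get("time", 0.0)

# Print the worst offenders first, by total time spent on their resources.
for host, s in sorted(by_host.items(), key=lambda kv: kv[1]["time_ms"], reverse=True):
    print(f'{host}: {s["requests"]} requests, {s["bytes"]} bytes, {s["time_ms"]:.0f} ms total')
```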
We saw this exact issue recently when Google Cloud Platform experienced a massive outage, disrupting not only Google’s own services like G Suite and YouTube, but also any third-party service hosted on its infrastructure. For example, if a tag management platform was hosted on GCP when it went down, the ripple effect could extend to any site that utilized that tag management platform as well.
AccuWeather experienced a similar problem when one of their ad serving networks made an unannounced change to its backend that resulted in ads loading slowly, or not at all, on AccuWeather’s digital properties. Without Catchpoint’s ability to isolate the performance of that ad code bundle, AccuWeather would not have been able to pinpoint the exact file that changed, and therefore would have been unable to prove to the provider that its change was the cause of the problem.
“That really helped us identify where a problem happened that was actually tied to a revenue loss,” says Savitski. “It was literally only Catchpoint that caught it because we were hitting those tests, and I could look at the files and the third-party hosts and zones and look at the sizes of the files necessary to both initialize and complete the ad request. We could then easily see that an old file that we were expecting wasn’t being served anymore, and a new one took its place with different functionality.
“So that’s money, that’s revenue. There are revenue implications for detecting a file change of a third-party partner that we weren’t expecting. Changes can and will happen, but Catchpoint helps us react quickly and positively to those changes.”
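To give a sense of what that kind of change detection involves at its most basic, the sketch below fetches a third-party asset, records its size and content hash, and compares both against a stored baseline so an unannounced swap of the file gets flagged. It is an illustrative example, not how Catchpoint implements it, and the URL and baseline path are placeholders.

```python
# Illustrative sketch only: detect an unannounced change to a third-party file
# by comparing its size and content hash against a stored baseline.
# ASSET_URL and BASELINE_PATH are hypothetical placeholders.

import hashlib
import json
import os
import urllib.request

ASSET_URL = "https://ads.example-network.com/bundle.js"  # hypothetical ad code bundle
BASELINE_PATH = "baseline.json"

with urllib.request.urlopen(ASSET_URL, timeout=10) as resp:
    body = resp.read()

current = {"size": len(body), "sha256": hashlib.sha256(body).hexdigest()}

if os.path.exists(BASELINE_PATH):
    with open(BASELINE_PATH) as f:
        baseline = json.load(f)
    if current != baseline:
        print(f"CHANGE DETECTED for {ASSET_URL}: "
              f"{baseline['size']}B/{baseline['sha256'][:8]} -> "
              f"{current['size']}B/{current['sha256'][:8]}")
else:
    print(f"No baseline yet for {ASSET_URL}; recording one.")

# Store the latest observation as the new baseline for the next run.
with open(BASELINE_PATH, "w") as f:
    json.dump(current, f)
```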
Delivering great user experiences is no longer the relatively easy task it once was. To face the challenges of increased complexity and higher end-user expectations for performance, companies must implement a monitoring strategy that reduces mean time to repair (MTTR), decreases false positives and false negatives, and identifies areas for performance optimization. To do this, you must deploy tools that complement each other to capture the necessary telemetry; relying on a single monitoring vendor simply isn’t enough. APM is a critical component of a complete monitoring strategy, but it’s still just a small piece of the puzzle.
A transformation of your technology and culture requires a transformation of your monitoring strategy. To be a company that truly puts customer experience at the top of your priorities, a synthetic strategy that puts the end user at the forefront is an absolute must. And by doing this with a DEM vendor that offers RUM, network, and SaaS monitoring in addition to synthetics, you’ll be able to rest easy knowing that you won’t be woken up in the middle of the night for no reason.