Adobe Experience Cloud Outage: The Impact of Relying on Third-party Services
On December 8, 2023, Adobe's extensive customer base was impacted by a series of outages in the Adobe Experience Cloud, starting from 8:00 AM EST and continuing until 1:45 AM EST on December 9.
We haven't seen a third-party outage of this magnitude since the DoubleClick outage of 2018.
According to Adobe, Data Collection (Segment Publishing), Data Processing (Cross-Device Analytics, Analytics Data Processing), and Reporting Applications (Analysis Workspace, Legacy Report Builder, Data Connectors, Data Feeds, Data Warehouse, Web Services API) were all affected by the outage.
Adobe Analytics reports that within the Experience Cloud, multiple services were down for several hours. The outages in various services started and ended at different times, with varying outage durations. Note that these times do not reflect when Adobe updated its status page or informed its customers about the outage.
The cost of such a service being down for 18 hours adds up quickly – for Adobe and its customers. Both are impacted by lost revenue due to service disruptions and damaged brand reputation. On top of that, Adobe risks incurring SLA violations for millions of customers.
Catchpoint's Internet Sonar was the first and only tool to detect this outage, significantly outperforming others like ThousandEyes and Downdetector. This incident not only validates our claims about Internet Sonar but also underscores the importance of Catchpoint’s Internet Performance Monitoring (IPM) Platform to navigate the growing complexity and fragility of the Internet.
Now that we understand the 'what' and the 'how much,' let's dive into the incident review.
How we detected the outage
Catchpoint’s IPM Platform spotted the outage in at least three different ways:
- Internet Sonar (service outage detection and correlation)
- Catchpoint Synthetic tests run by customers who rely upon Adobe
- Catchpoint Professional Services running analysis on behalf of major retail and e-commerce customers
Here are our observations for each of the three areas above.
Internet Sonar (service outage detection and correlation)
Internet Sonar monitors Adobe Tag Manager, a service in the Adobe Experience Cloud. On Friday, December 8, at 8:03 AM EST, Internet Sonar detected timeout errors from a large number of cities globally. Internet Sonar alerts notified customers at 8:20 AM EST, once the failures were confirmed as a significant incident rather than short-term outliers.
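To illustrate the idea of confirming an incident before alerting, here is a minimal sketch in Python. It is not Internet Sonar's actual logic, which this post does not describe; the function name, the tuple layout, and the thresholds (`min_cities`, `min_consecutive`) are all assumptions made for the example.

```python
from collections import defaultdict

def confirm_incident(results, min_cities=3, min_consecutive=2):
    """Return True when enough cities show sustained failures.

    results: iterable of (run_index, city, ok) tuples, one per test run.
    Thresholds are illustrative assumptions, not Catchpoint defaults.
    """
    # Group outcomes per city, ordered by run index.
    by_city = defaultdict(list)
    for run_index, city, ok in sorted(results):
        by_city[city].append(ok)

    failing_cities = 0
    for outcomes in by_city.values():
        # Count the trailing run of failures for this city.
        streak = 0
        for ok in reversed(outcomes):
            if ok:
                break
            streak += 1
        if streak >= min_consecutive:
            failing_cities += 1

    return failing_cities >= min_cities


# A single failed run in one city is treated as an outlier.
blip = [(0, "NYC", True), (1, "NYC", False), (2, "NYC", True)]
# Repeated failures across several cities are treated as an incident.
outage = [(r, c, False) for r in range(3) for c in ("NYC", "London", "Tokyo")]

print(confirm_incident(blip))    # False
print(confirm_incident(outage))  # True
```

Requiring both a streak and a minimum number of cities trades a few minutes of alert latency for fewer false alarms, which matches the gap between first detection and notification described above.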
Internet Sonar quickly detected the Adobe Tag Manager outage, as shown above. Sites with this tag were painfully slow, with some taking 100-200 seconds to load.
Internet Sonar also performed intelligent correlation for failing Synthetic tests run by customers. The screenshot below shows the records page of a Catchpoint Synthetic test for a customer's service using Adobe Tag Manager, where Internet Sonar correlated the test failure with the Adobe Tag Manager outage.
Internet Sonar enables users to answer the question, "Is my service experiencing issues due to a problem in my application or infrastructure, or is it one of the third-party services in the Internet Stack I rely upon to deliver my service?"
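The "is it me or a third party?" question can be sketched as a simple lookup of failed request hosts against known incidents. This is only an illustration of the concept, not how Internet Sonar performs correlation; `attribute_failure`, the `own_domains` list, the `example.com` URL, and the incident mapping are hypothetical.

```python
from urllib.parse import urlparse

def attribute_failure(failed_urls, own_domains, third_party_incidents):
    """Label each failed request as first-party, a known third-party outage, or unknown."""
    labels = {}
    for url in failed_urls:
        host = urlparse(url).hostname or ""
        # A host counts as first-party if it is one of our domains or a subdomain of one.
        if any(host == d or host.endswith("." + d) for d in own_domains):
            labels[url] = "first-party"
        elif host in third_party_incidents:
            labels[url] = "third-party: " + third_party_incidents[host]
        else:
            labels[url] = "unknown"
    return labels


# Hypothetical inputs: one of our own URLs plus the Adobe asset request seen in this incident.
failed = [
    "https://www.example.com/checkout",
    "https://assets.adobedtm.com/a7d65461e54e/6e9802a06173/launch-43baf8381f4b.min.js",
]
incidents = {"assets.adobedtm.com": "Adobe Tag Manager"}

for url, label in attribute_failure(failed, ["example.com"], incidents).items():
    print(label, "<-", url)
```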
Proactive Monitoring of the Adobe Experience Platform
Many of Catchpoint's e-commerce customers who rely on Adobe also began experiencing multiple failures in the synthetic tests they run on the Catchpoint platform.
Test failure #1: HTTP 404 Not Found
Root cause of the Journey Optimizer failures: the request to https://auth.services.adobe.com/signin returned an HTTP 404, so the login did not go through.
An e-commerce customer dependent on Adobe extensively monitors the Adobe Experience platform. Their synthetic test for "Adobe Journey Optimizer," part of Experience Cloud, showed significant impact:
HTTP Response: {"errorCode":"invalid_resource_id","errorMessage":"Could not find resource for id v:2,s,f,bg:eclogin,..."}
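A synthetic login check can turn a response like this into a short, alertable summary. The sketch below assumes only what is shown above (an HTTP 404 with a JSON body containing `errorCode`); the function name and the output format are hypothetical, and the truncated `errorMessage` is kept exactly as it appears in the test record.

```python
import json

def classify_login_check(status_code, body):
    """Summarize a synthetic login check result from its status code and body."""
    if 200 <= status_code < 400:
        return "ok"
    # The body may not be JSON at all, so parse defensively.
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        payload = None
    if isinstance(payload, dict):
        error_code = payload.get("errorCode", "unknown")
    else:
        error_code = "unknown"
    return f"failed: HTTP {status_code} ({error_code})"


body = '{"errorCode":"invalid_resource_id","errorMessage":"Could not find resource for id v:2,s,f,bg:eclogin,..."}'
print(classify_login_check(404, body))  # failed: HTTP 404 (invalid_resource_id)
```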
Test failure #2: TCP Timeout Errors
TCP Timeout errors for Launch.js JavaScript Request; URL initiated by DTM.js
HTTP Request: https://assets.adobedtm.com/a7d65461e54e/6e9802a06173/launch-43baf8381f4b.min.js
Incident Start time: Dec 08, 2023 - 05:04:37 PT
Status: Ongoing
Regions Impacted: Global
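The incident start time above is given in Pacific time, while the Internet Sonar timeline earlier in this post uses EST. As a quick cross-check, the following sketch converts between the two. It assumes "PT" means Pacific Standard Time (UTC-8), which applies in December, and uses fixed offsets rather than a time zone database.

```python
from datetime import datetime, timedelta, timezone

# Fixed standard-time offsets (assumption: no daylight saving in December).
PST = timezone(timedelta(hours=-8), "PST")
EST = timezone(timedelta(hours=-5), "EST")

test_failure = datetime(2023, 12, 8, 5, 4, 37, tzinfo=PST)  # customer test incident start
sonar_detection = datetime(2023, 12, 8, 8, 3, tzinfo=EST)   # first Internet Sonar detection

print(test_failure.astimezone(EST).strftime("%H:%M:%S %Z"))  # 08:04:37 EST
print(test_failure - sonar_detection)                        # 0:01:37
```

Under these assumptions, the customer's test began failing about a minute and a half after Internet Sonar first detected timeouts.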
Catchpoint Professional Services running analysis on behalf of our customers
Catchpoint Professional Services, which monitors and analyzes websites for major retail and e-commerce customers, noticed several failures attributed to the Adobe outage.
Test failure #1: Test timeout caused by high connect time
We observed failures across multiple tests, where tests timed out because of high connect time for assets.adobedtm.com.
We also observed no response from servers:
Test failure #2: Increase in wait time
We noticed the host chart showing an increase in wait time for requests from "assets.adobedtm.com"
Waterfall data also showed 503 – Service Unavailable error for a specific request from Adobe:
Test failure #3: Test failure and performance degradation
We also noticed test failures and performance degradation due to Adobe request failures.
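The failure modes described in this section (timeouts, no response from servers, and HTTP 503 errors) can be tallied per host to see which dependency dominates. This is a rough sketch, not the analysis Professional Services performed; the record layout and the sample records are made up for illustration.

```python
from collections import Counter

def summarize_failures(requests):
    """Count failure types per host from (host, status, timed_out) records.

    status is the HTTP status code, or None when no response was received.
    """
    summary = Counter()
    for host, status, timed_out in requests:
        if timed_out:
            summary[(host, "timeout")] += 1
        elif status is None:
            summary[(host, "no response")] += 1
        elif status >= 500:
            summary[(host, f"HTTP {status}")] += 1
    return summary


# Made-up sample records, not measurements from the incident.
sample = [
    ("assets.adobedtm.com", None, True),
    ("assets.adobedtm.com", None, True),
    ("assets.adobedtm.com", 503, False),
    ("assets.adobedtm.com", None, False),
    ("www.example.com", 200, False),
]

for (host, kind), count in summarize_failures(sample).most_common():
    print(host, kind, count)
```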
We also used WebPageTest (WPT) results. Note that the page content is displayed to users only after the requests for Adobe assets time out.
Real user monitoring (RUM) data revealed the impact on end users:
Why you need Internet Sonar
Imagine waiting to find out your service or site is down through negative posts on social media. Now, you don't have to.
When it comes to outages like these, it is extremely important to have a tool that helps you answer the question, "Is it me or something else?" You need a tool capable of pinpointing the source of Internet disruptions at a glance. That means no finger-pointing and no war rooms, just intelligent, trustworthy Internet health information to accelerate incident detection. That's the core concept behind Internet Sonar.
What sets Internet Sonar apart:
- Unparalleled worldwide and regional visibility leveraging Catchpoint's Global Observability Network with over 2600 nodes from more than 300 providers in 94 countries – with more being added all the time.
- Hundreds of the most popular Internet services monitored, including Internet Infrastructure (CDN, DNS, Cloud), SaaS (email, SaaS, UCaaS, SECaaS), and MarTech (Ad serving, Analytics, Video).
- Real-time email alerts as well as webhook or API access for easy integration into any application.
- Automatic, AI-powered data correlation with active monitoring for simple, real-time status information.
As Steve McGhee, Reliability Advocate, SRE, Google Cloud, highlighted in his Conclusion for Catchpoint's 2023 SRE Report, there is a reason why experts never depend on a single solution, tool, or platform to accomplish their tasks in the best possible manner. "When it comes to skilled labor, or 'operations' perhaps," writes Steve, "you want teams to be able to reach for the right tool at the right time, not to be impeded by earlier decisions about what they think they might need in the future."
Watch this on-demand product demo video to learn more about Internet Sonar and look out for an upcoming blog post discussing best practices for monitoring the Adobe Product suite with Catchpoint.