DoubleClick Outage: Another Lesson in Third-Party Optimization

Published
March 14, 2018

Hundreds of websites and their user experience were impacted yesterday when Google’s DoubleClick suffered a major outage that lasted for hours. Catchpoint first reported the outage at 10:00 EDT on Tuesday, March 13th in Europe and at 15:00 EDT in the US. As soon as the issue was identified, we had a team investigating and analyzing the impact and extent of the outage. You can read our initial analysis of the incident here.

The data Catchpoint aggregated from our synthetic monitoring confirmed a drastic drop in performance for all ad requests served from two domains: doubleclick.net and google.adservices.com. Our customers were alerted within minutes, and we suggested temporarily removing these requests to limit the impact of the outage on their websites and prevent a negative user experience.

Outage Timeline

The day unfolded with Catchpoint picking up a sudden performance degradation on some of our customers' websites. We continued to monitor major websites globally. Drilling into the data, we identified the requests causing the delay.

The Google Ads status page reported the issue at around 13:00 EDT. A bug in DoubleClick was identified as the cause of the performance degradation. The bug was fixed, and the issue was resolved at 19:30 EDT.

Catchpoint was able to “catch” the issue approximately 3 hours before the Google Ads status page posted its first update.

Customers who had specific zone-based alerts configured in their accounts received notifications of a third-party issue impacting user experience.

These alerts helped us detect the outage as soon as it happened.
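To make the idea of a zone-based alert concrete, here is a minimal sketch. It assumes a "zone" is simply a named group of hosts and that an alert fires when a zone's total response time exceeds a multiple of its baseline. The zone definitions, the `factor` threshold, and all function names are hypothetical; the post does not describe how Catchpoint implements zones.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical zone definitions: zone name -> host suffixes that belong to it.
ZONES = {"ads": ("doubleclick.net",)}


def zone_for(url, zones=ZONES):
    """Return the zone whose suffix matches the URL's host, else "other"."""
    host = urlsplit(url).hostname or ""
    for zone, suffixes in zones.items():
        if any(host == s or host.endswith("." + s) for s in suffixes):
            return zone
    return "other"


def zone_totals(requests, zones=ZONES):
    """Sum response time in milliseconds per zone for one page load.

    `requests` is an iterable of (url, response_time_ms) pairs.
    """
    totals = defaultdict(float)
    for url, ms in requests:
        totals[zone_for(url, zones)] += ms
    return dict(totals)


def breached_zones(totals, baselines, factor=2.0):
    """Return the zones whose total time exceeds factor times their baseline."""
    return sorted(
        zone
        for zone, ms in totals.items()
        if zone in baselines and ms > factor * baselines[zone]
    )
```

Grouping requests this way is what lets an alert say "the ad zone is slow" rather than only "the page is slow", which points directly at the third party responsible.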

We can see the disruption caused by the third-party requests on several websites in the table below. In the US, multiple retail sites using DoubleClick to serve ads slowed down by almost 400%:

The impact echoed across different verticals as seen from the data below:

Catchpoint also collected data using Real User Monitoring (RUM). There was a noticeable difference in the number of page views.

The chart above shows the impact on one of our customers' websites. We can see a drop of approximately 30% in page views when performance degraded by more than 100%.
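The percentages quoted here are easier to interpret with the arithmetic spelled out. The sketch below assumes that "slowed down by X%" means load time increased by X% relative to a baseline, which is our reading rather than a definition given in the data. The input figures are hypothetical and chosen only to illustrate the calculation.

```python
def percent_slowdown(baseline_s: float, observed_s: float) -> float:
    """Percentage increase in load time relative to the baseline."""
    if baseline_s <= 0:
        raise ValueError("baseline must be positive")
    return (observed_s - baseline_s) * 100.0 / baseline_s


def percent_drop(before: float, after: float) -> float:
    """Percentage decrease relative to the earlier value."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) * 100.0 / before


# Hypothetical figures, not measurements from the outage.
print(percent_slowdown(2.0, 4.0))   # load time doubles
print(percent_slowdown(2.0, 10.0))  # load time is five times the baseline
print(percent_drop(1000, 700))      # page views fall from 1000 to 700
```

Under this reading, a slowdown of more than 100% means the page took more than twice as long to load, and a slowdown of almost 400% means it took almost five times as long.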

If we extrapolate the data to include the 500+ major websites that were affected by the performance degradation, the impact on user experience would have been staggering and would have translated into a major loss of revenue.

Meanwhile in Europe, major websites saw a similar drop in performance.

The Aftermath

This incident was a classic example of how third-party services can disrupt performance and bring down major websites. We can see the difference in performance before and after the issue began in the chart below.

The host doubleclick.net was experiencing high latency; TCP connections took longer to establish, and this generated HTTP 503 errors. The response time of other page requests was impacted, which eventually delayed the onload event. This resulted in higher document complete and webpage response times.
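The symptoms described above (slow TCP connects and HTTP 503 responses) can be checked with a very small probe. The following is a rough sketch, not how Catchpoint's synthetic monitoring works: `tcp_connect_ms` times a TCP handshake using the standard `socket` module, and `classify` labels a result. The 500 ms threshold and the label names are illustrative assumptions.

```python
import socket
import time
from typing import Optional


def tcp_connect_ms(host: str, port: int = 443, timeout_s: float = 5.0) -> Optional[float]:
    """Return the TCP connect time in milliseconds, or None on failure or timeout."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return None


def classify(connect_ms: Optional[float], status: Optional[int],
             slow_ms: float = 500.0) -> str:
    """Label one probe result; slow_ms is an illustrative threshold."""
    if connect_ms is None:
        return "unreachable"
    if status is not None and status >= 500:
        return "server_error"
    if connect_ms > slow_ms:
        return "slow_connect"
    return "ok"
```

A caller would pass the measured connect time together with the HTTP status code from a separate request, for example `classify(tcp_connect_ms("doubleclick.net"), 503)`. Keeping the classification separate from the network call makes the thresholds easy to tune and test.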

Lessons Learned

Catchpoint has always stressed the importance of optimizing and monitoring third-party performance. We have published several blog posts detailing such incidents and how they contribute to a negative user experience.

There are a few key points to remember when integrating third-party services on your website:

  • Always configure third-party tags to load later in the page load process, so they don’t impact document complete (that is, when the user is able to interact with the page).
  • Set scripts to load asynchronously; this will minimize bottlenecks caused by unresponsive scripts.
  • Avoid cluttering the page with multiple third-party scripts and implement them only where necessary.
  • Ensure third-party scripts are not outdated; third-party services tend to update their code versions.
  • If you are using different third-party services on the same page, check for conflicting scripts in the code during implementation to avoid bugs and errors during code execution.
  • Always monitor third-party services proactively to ensure there is no performance degradation.
  • Finally, use a tag manager to bring all the third-party services under one window. This makes it easier to manage scripts. It also allows you to configure when you want the scripts to load (as illustrated below):
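Separately from tag manager configuration, the advice in the list above about asynchronous loading can be checked automatically. The sketch below scans a page's HTML and reports third-party `<script src>` tags that carry neither `async` nor `defer`. It uses Python's standard `html.parser`; the class and function names are hypothetical. It also treats any host other than the one you pass in as third-party and ignores `type="module"` scripts (which are deferred by default), so treat it as a starting point rather than a complete audit.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class BlockingScriptAudit(HTMLParser):
    """Collect third-party <script src> tags that lack both async and defer."""

    def __init__(self, first_party_host):
        super().__init__()
        self.first_party_host = first_party_host
        self.blocking = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attr_map = dict(attrs)  # boolean attributes appear with value None
        src = attr_map.get("src")
        if not src:
            return  # inline script, nothing to fetch
        host = urlsplit(src).hostname
        if host is None or host == self.first_party_host:
            return  # relative or same-host URL
        if "async" not in attr_map and "defer" not in attr_map:
            self.blocking.append(src)


def find_blocking_third_party(html, first_party_host):
    """Return the src of each parser-blocking third-party script in the HTML."""
    audit = BlockingScriptAudit(first_party_host)
    audit.feed(html)
    return audit.blocking
```

Running `find_blocking_third_party(page_html, "www.example.com")` in a build or CI step is one way to catch a newly added tag that would hold up the page if its host became unresponsive.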

The outage was significant and impacted user experience across a large number of sites, so it’s a given that Google would have had all hands on deck as soon as the alarm bells rang. Kudos to the team at DoubleClick that handled the incident, from the status updates to issue resolution; they managed to fix the bug within a few hours and prevent a potential catastrophe for ad publishers.
