Preparing for the unexpected: Lessons from the AJIO and Jio Outage
Just in the past few months, we've seen high-profile outages happen for all sorts of reasons—server glitches, network issues, and even that now-infamous configuration update that nearly broke the Internet. These incidents remind us how fragile our digital world can be. But it's not just software-related issues causing these disruptions. In the last week alone, two unfortunate incidents involving fires have taken down major websites in the Asia-Pacific region.
On September 10, 2024, a fire at a data center in Singapore caused a significant outage for Alibaba Cloud, affecting services for major companies like TikTok, ByteDance, and Lazada. This time, the Reliance-owned AJIO and Jio websites have been hit, leaving millions of users unable to shop, pay bills, or access services. Let’s get into what happened.
What happened?
On September 17, 2024, Reliance Jio experienced a major network outage affecting customers across multiple regions in India and around the globe. The outage was first noticed when users began encountering connection timeouts while attempting to access both the AJIO and Jio websites. It was resolved around 05:42 EDT.
Traceroutes conducted from various locations indicated multiple hop failures, suggesting possible network issues along the route.
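As a rough illustration of how hop failures can be spotted in traceroute results, here is a minimal Python sketch. It assumes the classic Unix `traceroute` text format, in which a hop that answers no probes is printed as a row of asterisks. The function name `unresponsive_hops` and the sample output are hypothetical; the sample uses documentation-only IP ranges and is not data from this incident.

```python
import re

# Matches a hop line such as " 3  * * *" or " 1  192.0.2.1  1.123 ms ...".
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")


def unresponsive_hops(traceroute_output):
    """Return the hop numbers where every probe timed out (shown as '*')."""
    hops = []
    for line in traceroute_output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header line or anything that is not a hop row
        hop_number = int(match.group(1))
        tokens = match.group(2).split()
        if tokens and all(token == "*" for token in tokens):
            hops.append(hop_number)
    return hops


# Illustrative output only (assumed format, documentation IP ranges).
SAMPLE = """traceroute to example.com (203.0.113.10), 30 hops max, 60 byte packets
 1  192.0.2.1  1.123 ms  1.045 ms  0.998 ms
 2  198.51.100.1  8.210 ms  8.377 ms  8.102 ms
 3  * * *
 4  * * *
 5  203.0.113.10  42.500 ms  41.900 ms  43.100 ms
"""

if __name__ == "__main__":
    print(unresponsive_hops(SAMPLE))  # [3, 4]
```

A silent hop is not always a failure, since some routers rate-limit or ignore probe traffic. Comparing results from several vantage points, as was done here, helps separate real path problems from routers that simply do not reply.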
Additionally, it wasn’t just these two sites that were affected. The Reliance Digital website also went down, displaying an error message processed through Akamai Edgesuite:
“An error occurred while processing your request.”
Broader impact
This incident did not affect only Jio's network. Our data reveals this outage also had a wider impact on other ISPs.
The scatter plot above shows the widespread impact of the outage across multiple ISPs, indicating that the issue extended beyond Reliance Jio to others such as Airtel, Vodafone, and BSNL. The spikes in ping round-trip time (RTT) indicate network delays, which suggests that the outage caused a ripple effect of connectivity issues and delays across various networks.
This Sankey diagram shows the impact across multiple ISPs on the paths to an endpoint. It highlights how the outage disrupted network flow between regions.
Root cause
Reuters reported that a fire at Reliance’s data center caused the nationwide outage. A Reliance Jio spokesperson confirmed the outage and claimed the issue had been fully resolved.
Impact and key learnings
With Jio leading India's telecom space with nearly 489 million subscribers, the impact of this outage was massive. Millions of users could not shop on AJIO, pay bills, or even access essential services. Frustration quickly spread across social media—imagine the outrage on X, the wave of memes, and the flood of negative comments from users demanding answers and solutions.
This situation serves as a stark reminder for businesses that large-scale disruptions can happen anytime, even due to unexpected events like a fire. Preparing for such unforeseen incidents requires more than just reactive solutions. To mitigate the impact of similar disruptions in the future, companies need to focus on two key areas:
- Gain full visibility across networks: The outage's impact on other ISPs demonstrates the importance of having visibility into the entire Internet Stack, which includes all network components, not just internal systems. Understanding the performance and health of external dependencies—such as CDN, DNS, and ISP networks—is crucial. This broader visibility helps companies quickly identify where issues are occurring and whether they originate within their own network or from external partners.
- Proactive monitoring is key: This incident highlights the need for proactive monitoring across the Internet Stack. Early detection of issues like packet loss, latency spikes, and network congestion can allow companies to address potential problems before they escalate into full-blown outages.
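To make the idea of early detection concrete, here is a minimal Python sketch that flags latency spikes and computes packet loss from a series of ping results. It is an illustration under stated assumptions, not how any particular monitoring product works: the function names, the five-sample window, and the 3x threshold are all hypothetical choices, and a lost probe is represented as `None`.

```python
from statistics import median


def detect_rtt_spikes(samples, window=5, factor=3.0):
    """Return indices of RTT samples that exceed `factor` times the median
    of the preceding `window` samples. Lost probes (None) are ignored."""
    spikes = []
    for i, rtt in enumerate(samples):
        if rtt is None:
            continue  # a lost probe has no RTT to compare
        # Baseline: valid samples among the `window` entries before this one.
        baseline = [s for s in samples[max(0, i - window):i] if s is not None]
        if len(baseline) < window:
            continue  # not enough history to judge this sample
        if rtt > factor * median(baseline):
            spikes.append(i)
    return spikes


def packet_loss_ratio(samples):
    """Return the fraction of probes that were lost (recorded as None)."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s is None) / len(samples)


if __name__ == "__main__":
    # Illustrative RTTs in milliseconds; None marks a lost probe.
    rtts = [20, 21, 19, 22, 20, 21, 250, None, 20, 22]
    print(detect_rtt_spikes(rtts))   # [6]
    print(packet_loss_ratio(rtts))   # 0.1
```

The median is used for the baseline so that a single earlier spike does not inflate the threshold. In practice the window and threshold would need tuning per network path, and alerts would typically require several consecutive anomalies before firing.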
Gain full visibility into your Internet Stack with Catchpoint
We’ve built our Internet Performance Monitoring (IPM) platform from the ground up to deliver deep and wide visibility into the Internet Stack, enabling you to find and fix disruptions before your business is affected. Our cloud-native platform ensures Internet Resilience across your organization with the following industry-leading features and capabilities:
- Unparalleled worldwide and regional visibility through our Global Observability Network with over 2,700 nodes from more than 360 providers in 101 countries – with more being added all the time.
- Proactive incident management so IT teams can identify root cause, triage, and resolve issues quickly across public and private networks and application layers.
- AI-powered tools, including:
  - Internet Sonar so you can answer the question, “Is it me or something else?” quickly.
  - Internet Stack Map for instant awareness of critical service or application issues.
Learn more about preventing outages from our guide, or test drive Catchpoint for yourself in our guided product tour.