DNS Experience Tests: A Key Cog in the Online Ecosystem

Published August 3, 2016

Mehdi Daoudi

A company’s collection of online systems is like a delicate ecosystem – all components must integrate with and complement each other, and one single malfunction in any of them can bring the entire system to a screeching halt.

That’s why, when monitoring and analyzing the health of your online systems, you need a broad arsenal of different tools for your different needs. In addition to a wide-angle lens that provides a snapshot of the overall health of your system, you must also have precise, scalpel-like tools that can isolate and analyze all of those different components (DNS, CDNs, internal and external servers, third-party tags, etc.).

Catchpoint was designed to be exactly that kind of precision tool. When evaluating the health of your systems, a simple availability test is useful, but it only reports a basic binary state: you’re either up or you’re down. Figuring out why you are down, or why your customers are waiting longer than they should for a page to load, is a wholly different matter, and requires precise diagnostic and analytical capabilities in order to provide you with actionable data.

Of the many different parts that make up an online system, DNS is perhaps the most important. It’s the very first interaction that a customer has with an online brand, and therefore a rapid DNS lookup and resolution process is vital to maintaining an exceptional customer experience. Yet to properly assess your DNS health, you need that aforementioned scalpel, not a broadsword. This is why Catchpoint has maintained close relationships with DNS solution providers such as NS1, keeping open channels of communication in order to create the most precise tools possible.

This type of relationship has paid off for NS1, which uses Catchpoint tests to verify that it is providing its customers with the best possible DNS resolution times. It’s equally important, however, for clients of those providers to monitor their own DNS performance to detect any issues that the vendor might be missing.

One way to drill down and gain additional insight into a DNS performance issue is through the different types of DNS monitors that Catchpoint offers. To play out a scenario, let’s say that you catch a DNS resolution problem in a basic browser test:

While this data shows that your users are suffering from a bad experience due to DNS latency, you have no way of knowing where in the DNS resolution process the latency occurred. To gain this information, you need a DNS monitoring solution that shows performance and error data for all the different steps and servers in the DNS chain. Additionally, you can keep an eye on specific types of records (Answers, Authoritative Name Servers, or Additional Records) from the DNS query, which allows you to detect issues such as wrong TTLs, DNS cache poisoning, misconfigurations, etc.

This is imperative when it comes to detecting a third-party DNS vendor’s errors, because most organizations rely on external DNS registrars and vendors, but have little visibility into their performance and availability.
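To make the record-level checks concrete, here is a minimal Python sketch of auditing the records in a DNS response for out-of-range TTLs and unexpected values. It is an illustration only, not how Catchpoint implements its monitors: the `Record` type, the `audit_records` function, the TTL bounds, and the sample addresses are all hypothetical, and the records are assumed to have been parsed already by whatever DNS client you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One parsed DNS record (hypothetical shape, for illustration)."""
    section: str  # "answer", "authority", or "additional"
    name: str
    rtype: str
    ttl: int
    value: str


def audit_records(records, expected_values, min_ttl, max_ttl):
    """Return human-readable findings for records that look suspicious.

    expected_values maps (name, rtype) to the set of values you consider
    legitimate; records with no entry in the map are only TTL-checked.
    """
    findings = []
    for r in records:
        label = f"{r.section} {r.name} {r.rtype}"
        if not (min_ttl <= r.ttl <= max_ttl):
            findings.append(f"{label}: TTL {r.ttl} outside [{min_ttl}, {max_ttl}]")
        allowed = expected_values.get((r.name, r.rtype))
        if allowed is not None and r.value not in allowed:
            findings.append(f"{label}: unexpected value {r.value}")
    return findings


# Made-up sample data using documentation-only address ranges.
sample = [
    Record("answer", "www.example.com", "A", 300, "192.0.2.10"),
    Record("answer", "www.example.com", "A", 5, "192.0.2.10"),
    Record("additional", "ns1.example.net", "A", 3600, "203.0.113.99"),
]
expected = {
    ("www.example.com", "A"): {"192.0.2.10"},
    ("ns1.example.net", "A"): {"198.51.100.1"},
}

for finding in audit_records(sample, expected, min_ttl=60, max_ttl=86400):
    print(finding)
```

An unexpected value in the Additional section is not proof of cache poisoning on its own, but it is the kind of signal worth alerting on and investigating.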

Getting back to our hypothetical problem, once we run our DNS Experience tests, we get a result that looks like this:

In this test, which hits a multitude of different name servers in succession, we see intermittent spikes in response time (blue line chart) and drops in availability (green line chart), which tells us that the problem is isolated to specific name servers as opposed to the whole lot of them. Therefore, we need to run DNS Direct tests to isolate each of those name servers:

Now the exact source of the problem becomes clear. There’s one specific name server which has failed multiple times in the test timeframe, which means that we now have actionable data to work with. The DNS provider, if it hasn’t already located the source of the problem through the same process, can be made aware of the issue so that they can take that server offline while they fix the problem.
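The isolation step boils down to grouping test runs by name server and comparing failure rates. The sketch below shows that idea in Python under some loud assumptions: the `(nameserver, succeeded)` tuples, the function names, the host names, and the 20% threshold are all hypothetical stand-ins for whatever your monitoring tool exports.

```python
from collections import defaultdict


def failure_rates(results):
    """Map each name server to its share of failed runs.

    results is an iterable of (nameserver, succeeded) tuples.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for nameserver, succeeded in results:
        totals[nameserver] += 1
        if not succeeded:
            failures[nameserver] += 1
    return {ns: failures[ns] / totals[ns] for ns in totals}


def suspect_nameservers(results, threshold=0.2):
    """Return the name servers whose failure rate meets the threshold."""
    rates = failure_rates(results)
    return sorted(ns for ns, rate in rates.items() if rate >= threshold)


# Made-up direct-test results: five runs against each of four servers,
# with ns3 failing twice.
runs = []
for ns in ("ns1.example.net", "ns2.example.net", "ns3.example.net", "ns4.example.net"):
    for i in range(5):
        failed = ns == "ns3.example.net" and i in (1, 3)
        runs.append((ns, not failed))

print(suspect_nameservers(runs))
```

With per-server rates in hand, the intermittent failures seen in the aggregate test collapse to a single host, which is the actionable detail to pass on to the provider.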

In addition to the advanced insight that specific monitors provide, one of Catchpoint’s strongest attributes is our global node coverage. As a global DNS provider, NS1 knows that the ability to test their servers from as many different locations as possible is imperative to understanding the full scope of end users’ DNS experience. NS1 uses Catchpoint nodes around the world to collect data and then, using Catchpoint’s Push/Pull APIs, feeds that data into any number of different tools in order to make it actionable.

Just like the ecosystem of different components that make up modern online systems, there is an ecosystem of tools of similar size and scope to keep an eye on all of those components. Catchpoint is a cog in that ecosystem; this is why it was designed to play nicely with other alerting, communication, and monitoring tools that IT Ops professionals regularly use. There are Catchpoint integrations in place with Slack, VictorOps, PagerDuty, Zapier, etc., and the APIs work with many other different tools, including (as NS1 themselves wrote about) OpenTSDB, an open source time series database.
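As a rough illustration of feeding test data into a time series database, the Python sketch below reshapes per-test results into the JSON data-point format that OpenTSDB's HTTP `/api/put` endpoint accepts (`metric`, `timestamp`, `value`, `tags`). The input field names, the `dns.response_ms` metric name, and the `to_opentsdb_points` helper are assumptions made for this example; they are not Catchpoint's actual API payload, and the step of sending the points to a server is left out.

```python
import json


def to_opentsdb_points(results, metric="dns.response_ms"):
    """Convert test results into OpenTSDB-style data points.

    Each result is a dict with hypothetical keys: timestamp (Unix
    seconds), nameserver, node, and response_ms.
    """
    points = []
    for r in results:
        points.append({
            "metric": metric,
            "timestamp": r["timestamp"],
            "value": r["response_ms"],
            "tags": {"nameserver": r["nameserver"], "node": r["node"]},
        })
    return points


# Made-up results from two monitoring locations.
results = [
    {"timestamp": 1470182400, "nameserver": "ns1.example.net",
     "node": "new-york", "response_ms": 42.5},
    {"timestamp": 1470182400, "nameserver": "ns1.example.net",
     "node": "london", "response_ms": 61.0},
]

# This JSON array is what you would POST to your own OpenTSDB instance.
print(json.dumps(to_opentsdb_points(results), indent=2))
```

Tagging each point with the name server and the node location keeps the per-server and per-region views available when the data is charted later.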

The importance of getting precise, actionable data cannot be overstated when looking at the overall importance of digital performance analytics. Getting the most out of all the tools available to you, and making sure that they complement and work well with each other, can make all the difference in the health of your online systems.