Understanding DNS cache
The first time I learned how DNS gets resolved, I was quite surprised by how long and complicated the process was. Think about how many websites you visit in a given day, then consider how many of those you go to multiple times. Now imagine that every time you did, the ISP’s DNS server at the other end had to repeat the entire recursion process from scratch and query all the name servers in the recursion chain.
To put that into context, think about your cell phone. When you want to make a call to a friend with whom you speak regularly, you simply go to recent calls and tap on their name. But what if, instead of having that information readily available, you had to call 411 to get their phone number, then type it in manually? Seems pretty tedious, right?
The fact is that there are a lot of steps, and therefore a lot of time, required to turn a domain name into an IP address. Fortunately, the DNS designers thought about how to speed up DNS and implemented caching. DNS caching allows any DNS server or client to store DNS records locally and re-use them in the future, eliminating the need for new DNS queries until those records expire.
The Domain Name System implements a time-to-live (TTL) on every DNS record. The TTL specifies the number of seconds the record can be cached by a DNS client or server. When the record is stored in cache, whatever TTL value came with it gets stored as well. The server then keeps updating the TTL of the cached record, counting down every second. When it hits zero, the record is deleted or purged from the cache. At that point, if a query for that record is received, the DNS server has to start the resolution process again.
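The countdown described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular resolver is implemented: the class name TtlCache is hypothetical, the address 192.0.2.10 is a documentation placeholder, and instead of decrementing a counter every second the sketch stores an expiry timestamp, which gives the same result.

```python
import time


class TtlCache:
    """Toy DNS cache: a record is usable until its TTL has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the example can fake time
        self._entries = {}       # (name, rtype) -> (value, expires_at)

    def put(self, name, rtype, value, ttl_seconds):
        # Store the record together with the moment its TTL reaches zero.
        self._entries[(name, rtype)] = (value, self._clock() + ttl_seconds)

    def get(self, name, rtype):
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None          # never cached: caller must resolve
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[(name, rtype)]   # TTL hit zero: purge
            return None          # caller must start resolution again
        return value
```

A get() that returns None is the signal that the full resolution process has to run again.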
To understand caching, let’s look at the same example as previous articles, resolving www.google.com. When you type www.google.com in the browser, the browser asks the operating system for the IP address. The OS has what is known as a “stub resolver” or “DNS client,” a simple resolver that handles all the DNS lookups for the OS. The resolver sends DNS queries (with the recursive flag on) to a specified recursive resolver (name server) and stores the records in its cache based on their TTL.
When the “Stub Resolver” gets the request from the application, it first looks in its cache. If it has the record, it gives the information to the application. If it does not, it sends a DNS query (with the recursive flag on) to the recursive resolver (the DNS server of your ISP).
When the “Recursive Resolver” gets the query, it first looks in its cache to see what information it has for www.google.com. If it has the A records, it sends the records to the “Stub Resolver.” If it does not have the A records but has the NS records for the authoritative name servers, it will then query those name servers (bypassing the root and .com gTLD servers).
If it does not have the authoritative name servers, it will query the .com gTLD servers (which are most likely in cache, since their TTL is very high and they are used for any .com domain). The “Recursive Resolver” will query the root servers for the gTLDs only if they are not in the cache, which is quite rare (usually after a full purge).
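The lookup order the recursive resolver follows can be summarized as a small decision function. This is a simplified sketch under stated assumptions: the function name next_step is hypothetical, the cache is modeled as a plain set of (name, record type) pairs that are still within their TTL, and real resolvers track far more state than this.

```python
def next_step(cache, qname="www.google.com"):
    """Return what a recursive resolver would do next, given its cache.

    `cache` is a set of (name, rtype) pairs that have not expired.
    """
    if (qname, "A") in cache:
        return "answer from cache"
    labels = qname.rstrip(".").split(".")
    # Walk up from the closest enclosing zone: google.com, then com.
    for i in range(1, len(labels)):
        zone = ".".join(labels[i:])
        if (zone, "NS") in cache:
            return f"query name servers of {zone}"
    # Nothing useful cached: start from the top of the hierarchy.
    return "query root servers"
```

With only the .com name servers cached, next_step({("com", "NS")}) returns "query name servers of com", which matches the common case described above.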
To prevent the propagation of expired DNS records, DNS servers answer a query with the adjusted (remaining) TTL and not the original TTL value of the record. For example, let’s assume that the TTL for an A record of www.google.com is four hours and it is stored in cache by the “Recursive Resolver” at 8 a.m. When a new user on the same resolver queries for the same domain at 9 a.m., the resolver will send an A record with a TTL of three hours.
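The arithmetic behind the 8 a.m. and 9 a.m. example is simple enough to write down. The helper name remaining_ttl is hypothetical, and times are expressed as seconds since midnight purely for readability.

```python
HOUR = 3600


def remaining_ttl(original_ttl, stored_at, now):
    """Seconds the record may still be cached; 0 means it has expired."""
    return max(0, original_ttl - (now - stored_at))


# Record with a four-hour TTL cached at 8 a.m., queried again at 9 a.m.
ttl_at_9am = remaining_ttl(4 * HOUR, stored_at=8 * HOUR, now=9 * HOUR)
print(ttl_at_9am // HOUR)  # 3
```

Whoever receives the record at 9 a.m. caches it for three hours only, so every cache along the chain expires the record at the same moment, noon.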
So far we have covered how DNS is cached by the OS and by DNS servers, but there is one last layer of cache: the application. Any application can choose to cache DNS data, but it cannot follow the DNS specification when doing so. Applications rely on an OS function called “getaddrinfo()” to resolve a domain (the function has the same name on all major operating systems). This function returns the list of IP addresses for the domain, but it does not return DNS records, so there is no TTL information that the application can use.
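To see what an application actually gets back, here is a short sketch using Python’s socket.getaddrinfo, which wraps the OS function of the same name. The helper name resolve_addresses is hypothetical. Each result is a tuple of address family, socket type, protocol, canonical name, and socket address, and none of those fields carries a TTL.

```python
import socket


def resolve_addresses(host, port=80):
    """Return the unique IP address strings getaddrinfo() reports for host."""
    results = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    addresses = []
    # Each result is (family, type, proto, canonname, sockaddr): no TTL field.
    for _family, _type, _proto, _canonname, sockaddr in results:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Calling resolve_addresses("www.google.com") on a connected machine returns a list of address strings and nothing else, which is why an application has to invent its own cache lifetime.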
As a result, different applications cache the data for their own fixed periods of time. IE10+ will store up to 256 domains in its cache for a fixed time of 30 minutes. While 256 domains might seem like a lot, it is not: many pages on the internet reference more than 50 domains thanks to third-party tags and retargeting. Chrome, on the other hand, will cache the DNS information for one minute and stores up to 1,000 records. You can view and clear the DNS cache of Chrome by visiting chrome://net-internals/#dns.
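A browser-style cache can be approximated as a bounded store with one fixed lifetime for every entry. The sketch below is an assumption-laden illustration: the class name FixedLifetimeCache is hypothetical, and evicting the oldest entry when the cache is full is our choice for the example, since the eviction policy real browsers use is not stated here.

```python
import time
from collections import OrderedDict


class FixedLifetimeCache:
    """Bounded cache where every entry lives for the same fixed time."""

    def __init__(self, max_entries, lifetime_seconds, clock=time.monotonic):
        self._max_entries = max_entries
        self._lifetime = lifetime_seconds
        self._clock = clock
        self._entries = OrderedDict()   # host -> (addresses, expires_at)

    def put(self, host, addresses):
        self._entries.pop(host, None)   # re-inserting refreshes the entry
        self._entries[host] = (addresses, self._clock() + self._lifetime)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)   # assumed policy: drop oldest

    def get(self, host):
        entry = self._entries.get(host)
        if entry is None:
            return None
        addresses, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[host]     # fixed lifetime is over
            return None
        return addresses
```

With max_entries=256 and lifetime_seconds=1800, visiting six pages that each reference 50 distinct domains inserts 300 hosts, so the first 44 are already gone before their 30 minutes are up.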
NS Cache Trap
One major trap that people fall into with DNS cache involves the authoritative name server records. As we mentioned before, the authoritative name servers are specified in the query response as NS records. NS records have a TTL, but they do not provide the IP addresses of the name servers. The IP information is carried in the additional section of the response, as A or AAAA records.
Therefore, a Recursive Resolver relies on both the NS records and the A or AAAA records to reach the name server. Ideally, the TTL on both types of records should be the same. Every once in a while, though, someone misconfigures their DNS zones, and the responses from the domain’s own name servers carry A or AAAA records for those name servers with a lower or higher TTL than the one specified at the TLD. These new records override the old records, causing a discrepancy.
When both records are in cache, the “Recursive Resolver” will query one of the IPs of the name servers. If the “Recursive Resolver” has only the NS records in the cache, with no A or AAAA records, it will have to resolve the name server’s own domain (for example, ns1.google.com) to get its IP address. This is not good, as it adds time to resolving the domain in question. And if it has the A or AAAA records of the name server but not the NS records, it will be forced to do a DNS lookup for www.google.com.
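The three cases can be made concrete with a small sketch. Everything here is illustrative: the function name name_server_state is hypothetical, the cache is modeled as a mapping from (name, record type) to an expiry time in seconds, and the TTL values of two days for the NS record and five minutes for the address record are made-up numbers chosen to show a mismatch.

```python
def name_server_state(cache, now, zone="google.com", ns_host="ns1.google.com"):
    """Describe what the resolver must do, given which records are fresh.

    `cache` maps (name, rtype) -> expiry time in seconds.
    """
    def fresh(name, rtype):
        return cache.get((name, rtype), 0) > now

    has_ns = fresh(zone, "NS")
    has_addr = fresh(ns_host, "A") or fresh(ns_host, "AAAA")
    if has_ns and has_addr:
        return "query the name server directly"
    if has_ns:
        return "resolve the name server's address first"
    return "repeat the lookup for the domain"


# Both records cached at time 0 with mismatched TTLs (assumed values).
cache = {("google.com", "NS"): 172800, ("ns1.google.com", "A"): 300}
print(name_server_state(cache, now=100))
print(name_server_state(cache, now=600))
```

At 100 seconds both records are fresh and the resolver goes straight to the name server. At 600 seconds the address record has expired while the NS record has not, so the resolver pays for an extra lookup even though it still knows which name server it wants.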
Setting TTL: A Balancing Act
So what’s better, a longer or shorter TTL? When appropriate, use a longer TTL, as it leads to longer caching on resolvers and OSes, which means better performance for end users. It also reduces traffic to your name servers, as they will be queried less often. However, it also reduces your ability to make DNS changes, leaving you more vulnerable to DNS poisoning attacks and unable to set up offsite error pages when your datacenter is not accessible.
On the other hand, a shorter TTL limits caching and will add time to downloading the page and/or resources, while raising the stress on your name servers. Yet it lets you make quicker changes to your DNS configuration.
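A rough back-of-the-envelope calculation makes the trade-off visible. The sketch assumes a single resolver that receives a steady stream of queries for one record and therefore refreshes it once per TTL; the function name ttl_tradeoff and the two example TTLs are ours, not recommendations.

```python
SECONDS_PER_DAY = 86_400


def ttl_tradeoff(ttl_seconds):
    """Upper bounds for one busy resolver and one record (simplified model)."""
    return {
        # The resolver re-queries the name servers at most once per TTL.
        "max_refreshes_per_day": SECONDS_PER_DAY / ttl_seconds,
        # A change can take up to one full TTL to reach that resolver's users.
        "worst_case_change_delay_seconds": ttl_seconds,
    }


for ttl in (300, 4 * 3600):
    print(ttl, ttl_tradeoff(ttl))
```

Under these assumptions a five-minute TTL means up to 288 refreshes per day from each busy resolver but changes visible within five minutes, while a four-hour TTL cuts that to 6 refreshes at the cost of a delay of up to four hours.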
————————
DNS resolution is a multi-step process that involves a lot of servers across the internet. The caching mechanism built into the protocol speeds up the process by storing information for periods of time and re-using it for future DNS queries. While DNS servers and clients do follow the DNS specs on TTL, applications like browsers do not follow the spec, so their cache is stored for an arbitrary amount of time.