AOL’s MX Record Disappeared Temporarily Overnight
by J.D. Falk
Director, Internet Standards and Governance
Well, here’s an odd one for you.
Annalivia Ford reports on her blog that AOL’s MX record vanished for three hours this morning. (Anna used to work in AOL’s postmaster team, so she notices things like that. So do some of the denizens of the postfix-users mailing list.)
MX records are a surprisingly useful vestige of a simpler time. Until the mid-1980s, it could generally be assumed that if you wanted to deliver mail to user@kremvax.example.com, you’d open a connection directly to the server named kremvax.example.com and transfer the message. But this only scales so far: what happens if kremvax.example.com goes offline for a while? Or if example.com wants consistent usernames and mailboxes across all of its servers? Or if example.com wants someone else to host its mail?
Then came MX records, a DNS record type that points a domain at the servers that accept its mail. These provided a lot of flexibility: a domain name could have different MX records with different priorities, so that mail servers would attempt to connect to the higher priority (lower number) server first, and lower priority (higher number) servers later as backup. Or, a large site like AOL could have a whole bunch of records at the same priority, permitting parallel load-balancing.
And that’s exactly what they did: mailin-01.mx.aol.com through mailin-04.mx.aol.com all have the same priority, and for further load balancing each of those four hostnames points to five geographically disparate IP addresses.
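To make the priority rules concrete, here is a minimal sketch of how a sending server might order its connection attempts. The hostnames and the delivery_order function are made up for illustration, and real mail software has more going on (retries, timeouts, and so on).

```python
import random
from collections import defaultdict


def delivery_order(mx_records, rng=random):
    """Return hostnames in the order a sender might try them.

    mx_records is a list of (preference, hostname) tuples. Lower
    preference numbers are tried first; hosts sharing a preference
    are shuffled so that load spreads across them.
    """
    by_preference = defaultdict(list)
    for preference, host in mx_records:
        by_preference[preference].append(host)

    ordered = []
    for preference in sorted(by_preference):
        group = by_preference[preference]
        rng.shuffle(group)  # equal priority: pick among them at random
        ordered.extend(group)
    return ordered


# Hypothetical records: two equal-priority servers plus one backup.
records = [
    (20, "backup.example.com"),
    (10, "mx-a.example.com"),
    (10, "mx-b.example.com"),
]
print(delivery_order(records))
```

The two priority-10 hosts can come out in either order, but the priority-20 backup always lands last.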
MX records could also point to servers in entirely different domains. We do this with senderscore.org, whose MX record points to mx.returnpath.net — meaning one server can handle mail routing for multiple domains. These days, this scenario is so common it’s difficult to imagine there was ever a time when things didn’t work that way.
So what happens when the MX record disappears?
Well, to deal with network hiccups and similar issues, any DNS server which had looked up aol.com previously would cache the answer for about twelve hours. (The cache lifetime, or TTL, is something AOL specified in their DNS record; the default is only one hour.)
If the MX record isn’t in the local cache, a fresh lookup comes back empty, and some (but not all) mail server software will then fall back to the A record. That behavior is left over from the early days of the network. However, these days A records tend to point to web servers, not mail servers, and that’s certainly true in AOL’s case. None of the three IP addresses in their A record accept SMTP connections.
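The fallback decision can be sketched in a few lines. This is only an illustration: delivery_targets and its fallback_to_a switch are hypothetical names, and the addresses come from the reserved documentation range rather than anyone's real DNS.

```python
def delivery_targets(mx_hosts, a_addresses, fallback_to_a=True):
    """Pick where to attempt delivery for a domain.

    mx_hosts: hostnames from the MX lookup (empty if there is none).
    a_addresses: addresses from the domain's A record.
    fallback_to_a: whether this (hypothetical) mail server falls back
    to the A record when no MX record exists.
    """
    if mx_hosts:
        return list(mx_hosts)
    if fallback_to_a:
        # No MX record: try the domain's own address instead.
        return list(a_addresses)
    return []


# With an MX record, the A record is never consulted.
print(delivery_targets(["mx.example.com"], ["192.0.2.10"]))
# Without one, a falling-back server ends up knocking on the web server.
print(delivery_targets([], ["192.0.2.10"]))
# A server that does not fall back has nowhere to deliver.
print(delivery_targets([], ["192.0.2.10"], fallback_to_a=False))
```

If the A record points at web servers that don't speak SMTP, the second case just produces connection failures.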
And if a site looked up aol.com’s MX record during the three hours or so when it didn’t exist, their nameserver will cache the lack of an MX record, and keep it for twelve hours! If that’s happened to you, the most effective way to fix it is probably to restart your local DNS servers. You may also have to flush other caches; if so, it may be easier to just reboot.
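Here is a toy model of why that cached "no MX record" answer sticks around, and why flushing helps. DnsCache is a hypothetical class written for this post, not how any particular nameserver works, and the twelve-hour figure is simply the one mentioned above.

```python
import time


class DnsCache:
    """A toy resolver cache that also remembers empty answers."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (name, rtype) -> (expires_at, records)

    def put(self, name, rtype, records, ttl):
        # An empty list is stored like any other answer (negative caching).
        self._entries[(name, rtype)] = (self._clock() + ttl, list(records))

    def get(self, name, rtype):
        """Return cached records, [] for a cached 'no record', None on a miss."""
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        expires_at, records = entry
        if self._clock() >= expires_at:
            del self._entries[(name, rtype)]
            return None
        return records

    def flush(self):
        self._entries.clear()


# Use a fake clock so the example doesn't need to wait twelve hours.
now = [0.0]
cache = DnsCache(clock=lambda: now[0])
TWELVE_HOURS = 12 * 60 * 60

cache.put("example.com", "MX", [], ttl=TWELVE_HOURS)  # lookup during the outage
now[0] += 4 * 60 * 60                                 # four hours later
print(cache.get("example.com", "MX"))  # [] : still the cached empty answer
cache.flush()
print(cache.get("example.com", "MX"))  # None : the next lookup goes to the network
```

Restarting a nameserver has the same effect as flush() here: the stale empty answer is gone, so the next lookup sees the restored record.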
How do you know if this happened to you? Check your mail server logs. If the server couldn’t connect to aol.com, it might be because of this. But that’s not a particularly big deal, because most mail server software will queue the message until a connection can be made. The only thing to worry about is whether your software will do a new MX lookup before retrying (which is what it should do), or keep pounding away at the A record (bad idea).
If the logs show that it did connect, and got a response — even if that response is a rejection (5xx) or a deferral (4xx) — then it’s not because of the MX record. The MX record must exist in order to connect to their SMTP server, and get a reply from it.
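That rule of thumb is easy to express in code. This is a rough sketch: the function names are made up, and it assumes you have already pulled the SMTP reply code (or the absence of one) out of your own logs, whatever format they use.

```python
def classify_reply(code):
    """Label an SMTP reply code; None means no connection was made."""
    if code is None:
        return "no connection"
    if 200 <= code < 400:
        return "accepted"
    if 400 <= code < 500:
        return "deferral"
    if 500 <= code < 600:
        return "rejection"
    raise ValueError(f"not an SMTP reply code: {code}")


def missing_mx_is_possible(code):
    """Any reply at all means a server was reached, so DNS did its job."""
    return code is None


for code in (250, 421, 550, None):
    print(code, classify_reply(code), missing_mx_is_possible(code))
```

Only the attempts that never got a reply are worth investigating as possible MX trouble.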
We haven’t heard, yet, what caused AOL’s MX record to disappear. Chances are, they’ll keep it kinda quiet. It was probably a simple typo — and while I’m sure they have end-to-end monitoring of their mail systems, it may not have been caught until the DNS cache expired.
Have you checked your MX record recently? Would your monitoring notice if it disappeared? Hmm!
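If you want a starting point for that kind of monitoring, here is a minimal sketch. The lookup_mx argument is a hypothetical hook: in practice you would plug in a call to whatever resolver library you use, and send the result to whatever alerting system you already have.

```python
def check_mx(domain, lookup_mx):
    """Return (status, message) describing the domain's MX record.

    lookup_mx is a hypothetical callable taking a domain name and
    returning a list of (preference, hostname) tuples, empty if the
    domain has no MX record.
    """
    try:
        records = lookup_mx(domain)
    except Exception as exc:
        return ("ERROR", f"MX lookup for {domain} failed: {exc}")
    if not records:
        return ("ALERT", f"{domain} has no MX record")
    return ("OK", f"{domain} has {len(records)} MX record(s)")


# Stand-in lookups for demonstration; a real check would query DNS.
def healthy_lookup(domain):
    return [(10, "mx.example.com")]


def vanished_lookup(domain):
    return []


print(check_mx("example.com", healthy_lookup))
print(check_mx("example.com", vanished_lookup))
```

Run something like this from outside your own network on a schedule, ideally against your authoritative nameservers rather than a cache, so a vanished record shows up right away.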