Validating Addresses in an Unbounded Namespace

Posted by J.D. Falk, Director of Product Strategy

Earlier this week, we wrote about the expansion of top-level domain names, and the decreasing importance of domain names to users looking for web content.

Though they don’t always realize it, accuracy in domain names is important to end users when it comes to their email addresses — and it’s equally important to anyone who collects email addresses, for any purpose.

Mistyped email addresses can have far-reaching consequences. One of the canonical examples is the story of Nadine, where someone input the wrong address when signing up at a sweepstakes site in 2001 — and the owner of the domain name still receives more than 70 spam messages addressed to her every day.

Closer to home, there’s somebody out there named James Falk (no relation to me) who occasionally gives sites my yahoo.com email address, which I’ve had since 1996 or ’97. One of those legal document email sites (a competitor to our partner RPost) even sent me what appeared to be the lease to his new house! There was no confirmation (or “double opt in”) step, no “this is not me” link, no way to unsubscribe. Often, these sites will happily send all sorts of personal information — except, unfortunately, his actual email address so I can inform him of his mistake.

In both of these cases, a user typed in the wrong address at a valid domain. There’s no way to gather statistics, but I’m sure it’s far more common that typos point to entirely invalid domains: yahoo.cmo, or returnpath.nett. These can be caught in software, but it’s still not as easy as it looks.

Consider a regular expression such as:

^.*@.*\.(com|edu|gov|int|mil|net|org)$

That would match email addresses at the seven earliest generic top-level domains, or gTLDs. It wouldn’t match two-letter country code TLDs (ccTLDs), but there are hundreds of those, so let’s include them more simply:

^.*@.*\.((com|edu|gov|int|mil|net|org)|..)$

For those who can’t read regular expressions, this means: from the start of the line, match any number of any characters, then an @ symbol, then any number of any characters, then a . symbol, then either: one of com, edu, gov, int, mil, net, or org, or two characters — after which the line ends.
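To see that behavior in practice, here is a minimal Python sketch using the standard re module. The names SIMPLE_PATTERN and looks_plausible, and the sample addresses, are illustrative assumptions. The dot before the TLD is written as \. so that it matches a literal dot rather than any character.

```python
import re

# The pattern described in the text. The dot before the TLD is escaped so it
# matches a literal "." and not an arbitrary character.
SIMPLE_PATTERN = re.compile(r"^.*@.*\.((com|edu|gov|int|mil|net|org)|..)$")


def looks_plausible(address: str) -> bool:
    """Return True if the address ends in a listed gTLD or any two characters."""
    return SIMPLE_PATTERN.match(address) is not None


# Hypothetical sample addresses, chosen only to exercise the pattern.
samples = [
    "someone@yahoo.com",        # listed gTLD
    "someone@example.co.uk",    # two-character ccTLD
    "someone@yahoo.cmo",        # typo from the text
    "someone@returnpath.nett",  # typo from the text
]

for address in samples:
    print(address, looks_plausible(address))
```

The first two samples are accepted and the two typos are rejected. Without the escaped dot, the pattern would accept nearly any string containing an @ sign, because the unescaped dot and the two trailing wildcards would match the end of almost anything.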

But now there are more gTLDs. If they were all three characters long, it’d be easy — but they’re not, so we’re left with:

^.*@.*\.((aero|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|pro|tel|travel)|..)$

And with ICANN poised to add more soon, that list will keep getting longer — requiring constant maintenance just for this deceptively simple and woefully incomplete email address checking algorithm.
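The maintenance burden can be sketched in Python by building the pattern from a list. This is only an illustration: KNOWN_GTLDS and build_pattern are made-up names, the list is the one from the expression above, and "shop" stands in for any TLD that is not on the list yet.

```python
import re

# The gTLDs named in the expression above. Every newly delegated TLD has to be
# added here by hand, which is the maintenance problem being illustrated.
KNOWN_GTLDS = [
    "aero", "asia", "biz", "cat", "com", "coop", "edu", "gov", "info", "int",
    "jobs", "mil", "mobi", "museum", "name", "net", "org", "pro", "tel",
    "travel",
]


def build_pattern(gtlds):
    """Compile the address pattern for the given gTLDs plus any two-character TLD."""
    alternation = "|".join(re.escape(tld) for tld in gtlds)
    return re.compile(r"^.*@.*\.((" + alternation + r")|..)$")


pattern = build_pattern(KNOWN_GTLDS)
print(bool(pattern.match("curator@example.museum")))  # TLD is on the list
print(bool(pattern.match("owner@example.shop")))      # TLD is not on the list

# Only after someone updates the list does the second address pass.
updated = build_pattern(KNOWN_GTLDS + ["shop"])
print(bool(updated.match("owner@example.shop")))
```

Until the list is edited, every address at an unlisted TLD is rejected, no matter how valid it is.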

Why incomplete? For one thing, it only tells you that the domain might exist, not that it does. It also lets through all sorts of characters which aren’t valid, and has nothing to prevent SQL injection or similar attacks. Seriously, it’s just an example, don’t use it.
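A short Python sketch shows how much this kind of pattern lets through. The sample strings and the subscribers table are hypothetical. Binding the value through a query placeholder is one common mitigation for SQL injection, offered here as an assumption about how such a form might be hardened, not as something the pattern itself provides.

```python
import re
import sqlite3

# The simple pattern from earlier in the text, with the TLD dot escaped.
SIMPLE_PATTERN = re.compile(r"^.*@.*\.((com|edu|gov|int|mil|net|org)|..)$")

# Neither string is a usable address, yet both satisfy the pattern.
with_spaces = "not an address @ .com"
hostile = "x'); DROP TABLE subscribers; --@example.com"
print(bool(SIMPLE_PATTERN.match(with_spaces)))
print(bool(SIMPLE_PATTERN.match(hostile)))

# Hypothetical storage step using an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscribers (address TEXT)")

# The "?" placeholder binds the value as data, so the quote characters in the
# hostile string are stored verbatim and never interpreted as SQL.
conn.execute("INSERT INTO subscribers (address) VALUES (?)", (hostile,))
stored = conn.execute("SELECT address FROM subscribers").fetchall()
print(stored)
```

The table survives and the hostile string is stored as plain text, but it is still not a deliverable address. Format checking and safe storage are separate problems.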

Last year Steve Atkins wrote about the legal components of an email address (which are far more limited than my simple regular expressions here), and gave a list of things to check to make sure an address is valid. His list is a lot longer and more accurate than my example above, and should be proof against the now-unbounded TLD namespace.

I still see web sites from time to time which do their address verification poorly, usually disallowing things that should be allowed and allowing things that could never be valid. Using standard PHP, Perl, or JavaScript address checking libraries helps, but you have to keep them up to date, just as you would with code you wrote yourself.

And, remember: just because a domain is valid, and the email address is correctly formed, doesn’t mean that there isn’t a spam trap on the other end — or that the recipient, if there is one, wants to receive your email. Luckily, there’s an easy way to ask — and to make sure the address is valid at the same time.
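Assuming the easy way to ask is a confirmation (or “double opt in”) message like the one mentioned earlier, the idea can be sketched in Python with the standard hmac module. SECRET_KEY, make_token, and confirm are hypothetical names, and sending the message itself is left out.

```python
import hashlib
import hmac

# Hypothetical server-side secret. A real deployment would load this from
# configuration instead of hard-coding it.
SECRET_KEY = b"replace-with-a-real-secret"


def make_token(address: str) -> str:
    """Derive a confirmation token tied to one specific address."""
    digest = hmac.new(SECRET_KEY, address.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def confirm(address: str, token: str) -> bool:
    """Return True only if the token was issued for this address."""
    expected = make_token(address)
    # Compare as bytes, in constant time, to avoid leaking timing information.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))


# The token would be placed in a link inside the confirmation message. The
# address is added to the list only after the recipient follows that link.
token = make_token("someone@example.com")
print(confirm("someone@example.com", token))
print(confirm("someone.else@example.com", token))
```

If the address is mistyped, the message either bounces or reaches someone who never clicks the link, so the bad address is never added to the list.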

It’s just a shame that invalid addresses can’t be caught as easily in software anymore.

