Filtering Mail the Old-Fashioned Way
Before Getting Things Done and Inbox Zero, before graphical email programs were common — before web browsers were common, if you can imagine that — the average internet user actually had more control over their email environment than most do now. That level of control is only now beginning to reappear, with smart automated filters and so forth — but we would do well to learn from history.
The most common mail filtering program on unix-like systems was (and still is) procmail, which has been around since at least 1990. A procmail rules file looks like line noise (something else today’s users are largely unaware of), and there’s a steep learning curve, but it was (and still is) very powerful. Rather than a full tutorial, though, I’ll just list some of the features that are still missing from most email software today.
First, and perhaps most importantly, procmail operates at delivery time — either before the SMTP conversation is complete, or between the mail receiving process and storing the message in the user’s inbox. Contrast this with desktop mail software that can’t apply filters until it downloads the message, or some webmail clients where filters don’t run until the user logs in.
The basic shape of any procmail filter recipe is the same. First it sets the context: headers, body, both, or some conditional logic whereby a rule may execute based on the results of a previous rule. Then it looks for a string or regular expression match in the headers or body of the message, or feeds the message through an external process to derive the results. It may do this multiple times in a single rule. Finally, the message is either saved to a specified file or directory, or fed into an external program.
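To make that shape concrete, here is a minimal sketch in Python rather than in procmail's own recipe syntax, which this article deliberately skips. The `Recipe` class, the `deliver` function, and the folder names are all hypothetical, and the sketch assumes a simple single-part text message.

```python
import re
from dataclasses import dataclass
from email import message_from_string
from email.message import Message


@dataclass
class Recipe:
    scope: str      # "headers", "body", or "both"
    patterns: list  # every regex must match for the recipe to fire
    action: str     # hypothetical destination folder


def text_for(msg: Message, scope: str) -> str:
    """Return the part of the message a recipe is allowed to look at."""
    headers = "\n".join(f"{name}: {value}" for name, value in msg.items())
    body = msg.get_payload()  # a plain string for single-part messages
    if scope == "headers":
        return headers
    if scope == "body":
        return body
    return headers + "\n\n" + body


def deliver(msg: Message, recipes: list, default: str = "inbox") -> str:
    """Return the destination chosen by the first recipe whose patterns all match."""
    for recipe in recipes:
        text = text_for(msg, recipe.scope)
        # Case-insensitive matching mirrors procmail's default behaviour.
        if all(re.search(p, text, re.I | re.M) for p in recipe.patterns):
            return recipe.action
    return default
```

A real recipe can also hand the message to an external program instead of a folder; the sketch only models the "pick a destination" half.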
What first got me into using procmail in the early nineties was that I’d subscribed to a lot of discussion lists, and they were all overlapping in my inbox. To clean that up, I created procmail rules which looked for headers unique to each list, and placed the mail from each list into a separate folder. Along the way, of course, I learned all about headers and message structure. Today I’ve got 25 rules for discussion lists on my main personal email account, and 37 on the account I use for IETF participation and similar work.
Filtering discussion lists is much eased by the List-Id header, defined in RFC 2919. When that’s not available we can use List-Unsubscribe or other headers from RFC 2369. Compare this to “modern” mail clients where you can only filter on the commonly visible headers: From, To, Subject, and maybe something custom — but don’t expect any help from the software or the developers in figuring out what those should be.
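As a rough illustration of the idea (in Python, not procmail syntax), the function below derives a folder name from List-Id and falls back to the mailto address in List-Unsubscribe. The function name and the "lists/" folder prefix are assumptions made for the example.

```python
import re
from email import message_from_string


def list_folder(msg):
    """Return a hypothetical folder name for list mail, or None for other mail."""
    # RFC 2919 puts the identifier in angle brackets, after an optional phrase.
    match = re.search(r"<([^<>]+)>", msg.get("List-Id", ""))
    if match:
        return "lists/" + match.group(1).strip().lower()
    # Fallback: the mailto address from RFC 2369's List-Unsubscribe header.
    match = re.search(r"<mailto:([^<>?]+)", msg.get("List-Unsubscribe", ""), re.I)
    if match:
        return "lists/" + match.group(1).strip().lower()
    return None
```

The fallback is cruder than List-Id, since the unsubscribe address is only a stand-in for the list's identity, but it is usually stable enough to sort on.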
At a sysadmin job some years ago, I configured procmail to send the Subject line of system alerts to my pager — but only when the severity of the alert was high. This rule called an external program that operated a dial-out modem to send the text message.
Peter Blair has a procmail recipe which sorts complaint feedback loop messages by source — and pipes some through perl first to convert them to ARF. (It reminds me of the recipes I used to use as a usenet newsgroup moderator — but that’s a whole other article.)
Another great use for procmail is to create more selective “out of office” messages: one for co-workers, one for friends, one (or none) for strangers, with a cache to ensure that it’s only sent once to each correspondent — and never to discussion lists, or in response to bulk or junk messages. Even after all these years, Microsoft Exchange still gets that wrong.
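The decision logic behind such an auto-responder can be sketched as follows. This is Python rather than a procmail recipe, and the co-worker domain, the friends list, and the function name are all hypothetical. The cache here is an in-memory set; a real setup would persist it between deliveries.

```python
from email import message_from_string
from email.utils import parseaddr

COWORKER_DOMAIN = "example.com"  # hypothetical: your employer's domain
FRIENDS = {"pat@example.net"}    # hypothetical: addresses of friends


def pick_autoreply(msg, already_replied):
    """Return "coworker", "friend", or None (send nothing) for this message."""
    # Never answer discussion lists, bulk mail, other auto-responders, or spam.
    if msg.get("List-Id") or msg.get("List-Unsubscribe"):
        return None
    if msg.get("Precedence", "").strip().lower() in {"bulk", "junk", "list"}:
        return None
    if msg.get("Auto-Submitted", "no").strip().lower() != "no":
        return None
    if msg.get("X-Spam-Flag", "").strip().upper() == "YES":
        return None

    sender = parseaddr(msg.get("From", ""))[1].lower()
    if not sender or sender in already_replied:
        return None  # unknown sender, or this correspondent was already told

    if sender.endswith("@" + COWORKER_DOMAIN):
        kind = "coworker"
    elif sender in FRIENDS:
        kind = "friend"
    else:
        return None  # strangers get no reply in this sketch

    already_replied.add(sender)  # remember, so each person hears only once
    return kind
```

The order matters: the list and bulk checks come first, so that a friend posting to a discussion list never triggers a reply to the whole list.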
There are also some easy ways to configure procmail to get rid of messages you don’t want. The easiest is to send them to /dev/null, a virtual device on unix-like systems where data goes in and it never comes out. Depending on how the system is configured, procmail may be able to direct the mail transport software to reject the message — either in-stream, or by generating a non-delivery notification (bounce message) after the fact. You can also easily forward messages to another address — perhaps to abuse@ the sender’s ISP, a popular tactic in the early days of spam-fighting.
Choosing which messages will be subjected to these negative measures is pretty easy, too, but somewhat of a dull instrument. The easiest method is the same as I described for discussion lists: headers can identify the source of a message. Sometimes you’ll want to get fancier, and search through the body of the message for common spam phrases like “you are receiving this message because you signed up at one of our partners’ sites.” That also has the risk of false positives, obviously, so be careful.
With a lot of work, procmail can become a fairly effective personal spam-filtering engine. But as spammers have gotten more sophisticated, it has become harder to accomplish everything inside of procmail. Commonly, systems will run SpamAssassin first and then let procmail interpret SpamAssassin’s results: if a message has “X-Spam-Flag: YES”, quarantine it in a spam folder or send it to /dev/null. Even before SpamAssassin sees the message, it can be verified with OpenDKIM, which can also apply some filtering rules. And even before that, the mail server will compare the connecting IP address against a few DNS blocklists, such as the RNBL.
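The last step of that chain, acting on SpamAssassin's verdict, is simple enough to sketch in a few lines. This is Python standing in for a procmail recipe, and the function and folder names are hypothetical.

```python
from email import message_from_string


def route_spam(msg, spam_folder="spam"):
    """Quarantine mail that an upstream filter has flagged; pass the rest on."""
    flag = msg.get("X-Spam-Flag", "").strip().upper()
    return spam_folder if flag == "YES" else "inbox"
```

Passing "/dev/null" as `spam_folder` would model the discard option, at the cost of losing any false positives for good.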
Still, as I described in the first few examples, procmail’s real power and purpose is not in catching spam, but rather in sorting mail for positive purposes. Rather than one inbox, you can have multiple — and if you don’t use a text-based mail client like Alpine or Mutt, you can access all of those folders in your favorite graphical email client (Thunderbird, Apple Mail, even Outlook) or on the better mobile devices via IMAP.
What continues to surprise me, though, is that while procmail and similar programs have been around for decades, today’s most popular mail software is only just now catching up. I think it’s because of the old quandary of user-focused software design: if users don’t realize that something is possible, they won’t ask for it. If they don’t ask for it, it doesn’t get developed…and eventually, everyone forgets. So now we see people blocking off time in their calendar to achieve “inbox zero” by manually performing filtering functions that procmail could do automatically. What a waste!