Spam Links

March 27, 2005

IBM shift costs of spam to forged senders

IBM have joined the growing body of users and promoters of challenge/response. IBM says:

"FairUCE eliminates any need for a 'probable spam' folder, as well as the necessity of keeping up with the latest version of antispam software"

but it can only achieve this by shifting the costs of dealing with hard-to-classify spam onto forged senders with real mailboxes.

There are some good concepts in their offering, FairUCE, but the authors state that their challenge/response part

"alone catches 80% of UCE and very rarely challenges legitimate mail"

and that

"FairUCE sends a challenge only when the mail appears to be spoofed"

which is a tacit admission that the system sends challenges to large numbers of forged senders. This can only contribute to the torrent of backscatter inflicted on receiving sites.

Choosing not to enter the filtering arms race is an interesting idea, but using challenge/response to turn forged senders into your filtering engine is not fair use of their resources.

Posted by spamlinks at 12:00 PM

March 02, 2005

Real Bayesian filtering?

Take a look at the spam filtering research page on Spam Links. The impression we're getting is that two sets of people are doing the research: one is the sysadmins, the people on the front line; the other is the researchers, who are trying out the latest information-theoretic concepts to push filters towards that 99.999% rate with few false positives.

Sysadmins tend to work with heuristics: tried-and-tested rules that discard mail when they trigger. Take a look at Spam Filtering for Mail Exchangers, for example, which is an excellent summary of ways to detect and terminate spam sessions coming in to a mail server. These are practical and effective, and based on the mechanics of receiving email. Rule-based scoring, while it can be very effective, is vulnerable to spammers who adjust their mail to slip under the known default scores, and it can fall behind new spammer tactics.
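To make the weakness concrete, here is a minimal sketch of additive rule-based scoring. The rule names, weights and threshold are all invented for illustration; they are not taken from any real filter's defaults.

```python
# Hypothetical rules and weights, for illustration only.
RULES = {
    "listed_on_blocklist": 3.0,
    "no_reverse_dns": 1.5,
    "helo_mismatch": 1.0,
    "sent_at_4am": 0.5,
}
THRESHOLD = 4.0  # assumed cut-off; real filters let the admin tune this


def score(triggered):
    """Sum the weights of every rule that fired for this message."""
    return sum(RULES[name] for name in triggered)


def is_spam(triggered):
    """Flag the message once its total score reaches the threshold."""
    return score(triggered) >= THRESHOLD
```

A spammer who knows these defaults only has to avoid the blocklist rule: a message that triggers everything else scores 3.0 and gets through.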

The researchers are much more focussed on email as content. Text Classification Spam Filtering is mostly about classifying the text of the email as spam or ham - and it can do very well.

Why not do true Bayesian filtering instead? Choose a set of the well-selected heuristics that the sysadmins rely on (listed in SPEWS, sent at 4am, not SPF-authorised, already seen by DCC) and apply Bayesian statistics to those features. You get to use all of the prior knowledge that is available, taking advantage of the hard work of groups like SPEWS and Spamhaus, but you get to temper that with other features, making false positives much less likely.
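As a rough sketch of what that could look like, the following treats each heuristic as a binary feature and combines them with a naive Bayes classifier. The feature names, the add-one smoothing and the independence assumption are our own choices for illustration, not a description of any existing filter.

```python
import math

# Hypothetical feature names standing in for the heuristics mentioned above.
FEATURES = ["in_spews", "sent_at_4am", "spf_fail", "dcc_seen"]


def train(messages):
    """messages: list of (set_of_fired_features, is_spam) pairs."""
    counts = {True: {f: 0 for f in FEATURES}, False: {f: 0 for f in FEATURES}}
    totals = {True: 0, False: 0}
    for fired, label in messages:
        totals[label] += 1
        for f in fired:
            counts[label][f] += 1
    return counts, totals


def spam_probability(fired, counts, totals):
    """Naive Bayes posterior P(spam | features), with add-one smoothing."""
    log_odds = math.log((totals[True] + 1) / (totals[False] + 1))
    for f in FEATURES:
        p_spam = (counts[True][f] + 1) / (totals[True] + 2)
        p_ham = (counts[False][f] + 1) / (totals[False] + 2)
        if f in fired:
            log_odds += math.log(p_spam / p_ham)
        else:
            log_odds += math.log((1 - p_spam) / (1 - p_ham))
    return 1 / (1 + math.exp(-log_odds))
```

With training data in which a blocklist occasionally lists a legitimate sender, a blocklist hit on its own need not push the probability past 0.5; it takes corroboration from another feature. That is the tempering effect described above.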

If this is being done somewhere we'd love to have it pointed out, since it seems so obvious. If it isn't: why not?

[Well, it is. SpamAssassin works in just this way, using a back-propagation neural network to adjust the scores of the heuristics it uses. Quite why this isn't spoken about more frequently at spam conferences is anyone's guess.]
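For readers wondering what "adjusting the scores" might mean in practice, here is a toy sketch: a single logistic unit whose weights are the rule scores, trained by gradient descent on labelled mail. This is our own illustration and not SpamAssassin's code; the function name, learning rate and epoch count are assumptions.

```python
import math


def train_scores(samples, rules, epochs=200, rate=0.5):
    """samples: list of (set_of_fired_rules, is_spam). Returns (scores, bias)."""
    scores = {r: 0.0 for r in rules}
    bias = 0.0
    for _ in range(epochs):
        for fired, is_spam in samples:
            # Current prediction: sigmoid of the summed scores of fired rules.
            z = bias + sum(scores[r] for r in fired)
            predicted = 1 / (1 + math.exp(-z))
            # Nudge the bias and each fired rule's score towards the true label.
            error = (1.0 if is_spam else 0.0) - predicted
            bias += rate * error
            for r in fired:
                scores[r] += rate * error
    return scores, bias
```

A rule that fires only on spam ends up with a large positive score, while a rule that fires equally on spam and ham stays close to zero, so the scores reflect the training mail rather than anyone's guess at sensible defaults.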

Posted by spamlinks at 12:00 PM
This work is licensed under a Creative Commons License.

SPAM is a trademark of Hormel Foods.

Page last updated: 15-Nov-2004