Tuesday, February 6, 2007

Stuff I Never Built, Part 235

There's a Make blog tip advocating yet another tricky way of outsmarting blog spammers.

The technique is crafty and smart, but I think the idea is misguided. We don't need another cat-and-mouse game. Since Bayesian spam filters arrived for email, a lot of people (myself included) just don't see spam any more.

Obviously, the blog world needs a Bayesian spam filter. I want to build one in Ruby. It might happen -- but I'm posting the idea here just in case somebody with more time than I have, or quicker Ruby-fu, wants to beat me to the punch. It would be a really good thing -- you'd be doing a favor for all humanity.

Just to underline this a bit: captchas are defeatable. Open source hackers have done it; academic researchers have done it. Everybody who's done it is currently tight-lipped about how, but sooner or later the genie escapes the bottle. I've even heard rumors about a company buying captcha-defeat software on some black market here in the US for only a thousand dollars. It's only a matter of time before spammers bypass captchas routinely.

By contrast, Bayesian filters are a much sounder technique and much harder to bypass, and deliberate efforts to bypass them actually make them stronger, since every evasion attempt that gets flagged becomes more training data.
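To make that concrete, here's a rough sketch of what the core of such a filter might look like in Ruby. This is not a finished design and not any existing library: the class name `CommentFilter`, its methods, and the tokenizing rule are all made up for illustration. It's a plain naive Bayes classifier with add-one smoothing, trained on comments a human has already marked as spam or not.

```ruby
# Illustrative sketch only: CommentFilter and its methods are hypothetical
# names, not an existing gem. Assumes Ruby 2.4+ (for Array#sum).
class CommentFilter
  def initialize
    # Word counts per label, e.g. @counts[:spam]["pills"] => 2
    @counts = { spam: Hash.new(0), ham: Hash.new(0) }
    # Number of training comments seen per label
    @docs = { spam: 0, ham: 0 }
  end

  # Lowercase the text and split it into simple word tokens.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9']+/)
  end

  # Record a comment that a human has labeled :spam or :ham.
  def train(label, text)
    @docs[label] += 1
    tokenize(text).each { |word| @counts[label][word] += 1 }
  end

  # Log-probability score of the words under a label, using add-one
  # smoothing so an unseen word never yields a zero probability.
  def score(label, words)
    vocab = (@counts[:spam].keys | @counts[:ham].keys).size
    total_words = @counts[label].values.sum
    prior = Math.log((@docs[label] + 1.0) / (@docs.values.sum + 2.0))
    words.reduce(prior) do |sum, word|
      sum + Math.log((@counts[label][word] + 1.0) / (total_words + vocab + 1.0))
    end
  end

  # Pick whichever label scores higher for this comment.
  def classify(text)
    words = tokenize(text)
    score(:spam, words) > score(:ham, words) ? :spam : :ham
  end
end

filter = CommentFilter.new
filter.train(:spam, "cheap pills buy now")
filter.train(:spam, "buy cheap watches now")
filter.train(:ham, "great post about ruby")
filter.train(:ham, "thanks for the ruby tip")
puts filter.classify("buy cheap pills")
puts filter.classify("nice ruby post")
```

The "gets stronger" part is just the `train` call: when a spammer switches to some new misspelling and a human flags the comment, feeding it back in as `:spam` turns that misspelling into evidence against the next comment that uses it. A real filter would need persistent storage and a better tokenizer (URLs, markup, and so on), which this sketch leaves out.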

It really is a very good idea.

