absolutely amazing
June 27, 2003 8:01 AM

Baysian spam filter for Outlook. Installation was a snap, and it works so well, it's surreal. I'd heard a lot of good things about Baysian spam filters, but this was beyond belief. The damn thing actually detected legitimate mails that I had accidentally thrown away!
more gushing inside
posted by delmoi (43 comments total)
Wow. I mean. Wow. This thing is amazing.

The first thing you need to do is 'train' the system. For me this meant going through my Inbox and my deleted messages folder and placing all the spam in a folder I called "filtered spam".

This took a while. There were about 1500 messages in my inbox (thankfully I'd only gotten sloppy about deleting spam in the past few months). My deleted messages were about half spam, half useless messages from 'legit' message lists which I wouldn't want filtered. I wanted the thing to be trained well, so I tried to be careful.

Anyway, after I ran the filter I looked at the spam scores of the messages in my inbox. The vast majority were 0% or 1%. The only real outlier was a mail message with one Word document attached. And of course there were several spam messages that I'd missed.

Looking in the 'spam' folder was even more of a shock. Even though I'd created the folder that night and put every piece of spam in there manually, I still screwed up and put a few legit letters in there. The filter had marked them appropriately.

Out of thousands of messages only about 10 - 20 had scores between 2% and 85%. They were mostly spam advertisements, but from eBay, Amazon and, oddly, advertisements for apartments in Ames, IA, which had been sent to my university email address.

Anyway. I recommend this filter to anyone and everyone. I've not set it up to automatically delete things, just give a score which will really help in my manual filtering. I'm worried about false positives, but this friggin' thing has a lower false positive rate than I do!
posted by delmoi at 8:14 AM on June 27, 2003
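
For anyone curious what the train-then-score workflow described above looks like in code, here is a minimal naive Bayes sketch in Python. It is an illustration under simplifying assumptions, not the code of the filter linked in the post: the `TinyBayes` class, its tokenizer and the add-one smoothing are all made up for this example, and real filters combine token evidence more carefully.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase word-like tokens (a crude assumption)."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class TinyBayes:
    """Hypothetical toy filter: train on labelled mail, then score new mail."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Count each token at most once per message.
        self.messages[label] += 1
        self.counts[label].update(set(tokenize(text)))

    def spam_score(self, text):
        """Return an estimated probability (0.0 to 1.0) that text is spam."""
        # Start from the prior odds, then add each token's log-likelihood ratio.
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for tok in set(tokenize(text)):
            # Add-one smoothing so unseen tokens never give a zero probability.
            p_spam = (self.counts["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp before exponentiating to avoid overflow on long messages.
        log_odds = max(-700.0, min(700.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


# Usage: the "filtered spam" folder becomes the spam training set,
# everything else becomes the ham training set.
filt = TinyBayes()
filt.train("cheap viagra buy now", "spam")
filt.train("buy cheap pills now", "spam")
filt.train("meeting notes attached", "ham")
filt.train("lunch meeting tomorrow", "ham")
print(f"{filt.spam_score('buy cheap now'):.0%}")
```

Multiplying the score by 100 gives the kind of percentage shown next to each message; leaving deletion to a human, as described above, sidesteps the question of where to put the cutoff.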

I've been beta testing the Office 2003 Beta 2 release, and Outlook 2003 has a very good junk mail detection feature. I have it set up as follows:

Allow all mail sent by people in my address book. Allow senders on my allow list. It automatically sends all mail that it thinks is spam to a folder called Junk Mail. I don't have it delete the emails of course. I do a cursory check of all emails in this folder once or twice a day, and mark the ones that aren't junk as "Not Junk" and that's about it. It works really well. I've not seen a single junk mail in my Inbox since I set this up.
posted by riffola at 8:40 AM on June 27, 2003

I've been using SpamSieve with Microsoft Entourage on the Mac (it also works with other Mac e-mail programs) and have been very satisfied with it. One of the cool things it does is, if your ISP runs SpamAssassin, it collapses the SpamAssassin header into a single token, so that over time it actually learns how reliable SpamAssassin is. I have set up my mail server to do a similar thing for blacklists (it adds a header with a made-up string that should never appear in actual mail) so again, SpamSieve learns how strong an indicator of spam a blacklisted sender is.
posted by kindall at 8:40 AM on June 27, 2003
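
The header-collapsing trick above can be sketched roughly like this in Python. This is a guess at the general idea, not SpamSieve's actual code: the token strings and the `X-Blacklisted` header name are hypothetical placeholders (the comment only says the server adds "a made-up string"), while `X-Spam-Flag: YES` is a header SpamAssassin commonly adds to mail it considers spam.

```python
from email import message_from_string

# Hypothetical names for this sketch only.
UPSTREAM_SPAM_TOKEN = "*upstream-flagged-spam*"
BLACKLIST_TOKEN = "*sender-on-blacklist*"


def verdict_tokens(raw_message):
    """Turn upstream verdict headers into single synthetic tokens.

    The tokens can be appended to a message's ordinary word tokens, so a
    Bayesian filter learns how much each upstream verdict is worth.
    """
    msg = message_from_string(raw_message)
    tokens = []
    if (msg.get("X-Spam-Flag") or "").strip().upper() == "YES":
        tokens.append(UPSTREAM_SPAM_TOKEN)
    # Assumed header: the mail server adds this when the sender is blacklisted.
    if msg.get("X-Blacklisted") is not None:
        tokens.append(BLACKLIST_TOKEN)
    return tokens


raw = "X-Spam-Flag: YES\nSubject: hello\n\nBuy now\n"
print(verdict_tokens(raw))
```

Because the token strings should never occur in real mail, their spam probability ends up reflecting only how often the upstream verdict was right.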

Oh, and I should mention -- I recently remembered I had a mac.com e-mail address I hadn't checked for a while. When I retrieved mail from it, it had 71 messages in it. SpamSieve accurately fished out the one legitimate message and flagged the other 70 as spam.
posted by kindall at 8:42 AM on June 27, 2003

For you Mac users, OS X's built-in mail application also uses "Bayesian" techniques for spam filtering.
posted by mosch at 8:46 AM on June 27, 2003

If you're worried about spam filtering, and you're using Outlook 2k, XP, or even 2003, you may in fact be in an Intranet, and receiving your email via MS Exchange.

If that's your situation, I would suggest that you lightly tap your server admin on the shoulder and tell him about GFI MailEssentials. (I do not work for GFI)

It supports blacklists, white lists, content checking, and header checking. It'll dump all spam into a folder for you, or forward it to a special address. The best part of the whole package is that it's functional beyond the 60 day trial period.

As the admin for my little work network, I've found it an immense help. Not only does it catch everything I need it to, it supports dynamic whitelists (it populates the list with anyone that a user on the network sends an email to).

[/sales pitch]
posted by thanotopsis at 9:00 AM on June 27, 2003
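
The dynamic whitelist idea mentioned above is simple enough to sketch in a few lines of Python. The `DynamicWhitelist` class and its method names are invented for illustration and say nothing about how GFI MailEssentials implements the feature.

```python
class DynamicWhitelist:
    """Hypothetical sketch: trust anyone a local user has written to."""

    def __init__(self):
        self._trusted = set()

    def record_outgoing(self, recipients):
        # Called for every outbound message; addresses are compared
        # case-insensitively, which is a simplifying assumption.
        for address in recipients:
            self._trusted.add(address.strip().lower())

    def allows(self, sender):
        return sender.strip().lower() in self._trusted


wl = DynamicWhitelist()
wl.record_outgoing(["Client@Example.com"])
print(wl.allows("client@example.com"))    # a reply from someone we mailed
print(wl.allows("stranger@example.net"))  # never written to
```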

Hmm.....Spam building up again in the Darwinian struggle of spammers vs. filters. Baysian only for Mac OSX? I may have to make that dreaded upgrade.
posted by troutfishing at 9:03 AM on June 27, 2003

Surely, it's incumbent on MS to develop a version of Outlook Express that works with this sort of filtering system? I mean, Bill recently said he wanted to see an end to spam - why not just start off with a version of OE that will allow spambayes ... whatever that actually is ... And then get OE to allow message rules/blocking that will operate against html spam.

I know, it's a free prog, but even so...
posted by dash_slot- at 9:04 AM on June 27, 2003

Once again, Apple leads the pack. Bayesian filtering was introduced into Mail.app with OS 10.2 (Jaguar) last year.
posted by Cerebus at 9:09 AM on June 27, 2003

PopFile is a Baysian spam filter that works for those of us who don't have OSX or Thunderbird or Outlook (e.g. Outlook Express). Training takes place in a browser window (which isn't ideal), but it works well enough. In the meantime, can somebody tell me who stole MetaFilter and replaced it with Slashdot?
posted by seanyboy at 9:24 AM on June 27, 2003

I've been using Cloudmark's SpamNet with Outlook XP for some time now. It has a 30-day trial, then it's $4/month. It is basically a P2P spam filtering system. When I see a spam and hit block, it sends the unique characteristics of that mail to its servers and anyone else who gets that message has it blocked. If I unblock an email, then the person who blocked it has a lowered reliability score. All in all, it's basically P2P spam filtering. Works very well. You can either have it move them to a spam folder or delete them. Works for me.
posted by Zebulun at 9:26 AM on June 27, 2003
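
A toy version of the block/unblock scheme described above might look like the following Python sketch. Everything here is an assumption made for illustration: the `SharedBlocklist` class, the SHA-256 fingerprint of the normalized body, the trust weights and the halving penalty are invented, and SpamNet's real fingerprinting and scoring are not described in the thread.

```python
import hashlib


class SharedBlocklist:
    """Hypothetical central server for collaborative spam reports."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.trust = {}    # user -> reliability weight
        self.reports = {}  # fingerprint -> set of users who blocked it

    @staticmethod
    def fingerprint(body):
        # Collapse case and whitespace so trivial variations hash the same.
        normalized = " ".join(body.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def block(self, user, body):
        self.trust.setdefault(user, 1.0)
        self.reports.setdefault(self.fingerprint(body), set()).add(user)

    def unblock(self, body):
        # Someone says this was not spam: everyone who reported it loses trust.
        for reporter in self.reports.pop(self.fingerprint(body), set()):
            self.trust[reporter] *= 0.5

    def is_blocked(self, body):
        reporters = self.reports.get(self.fingerprint(body), set())
        return sum(self.trust[r] for r in reporters) >= self.threshold


net = SharedBlocklist()
net.block("alice", "Buy   NOW")
print(net.is_blocked("buy now"))  # other users now see it as blocked
```

The point of the trust weight is that a reporter who keeps getting overruled eventually cannot block a message for everyone on their own.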

I've been using PopFile since about Feb. Within a few days it was well over 80% accurate, and now usually hangs out in the 97% accuracy range -- and its mistakes tend to be false negatives (i.e. letting spam through) rather than false positives (i.e. losing good email).
posted by five fresh fish at 9:30 AM on June 27, 2003
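
To make the accuracy versus false positive/false negative distinction above concrete, here is a small Python sketch. The `filter_rates` helper is hypothetical and the message counts are invented to land on 97% accuracy; they are not PopFile's actual statistics.

```python
def filter_rates(true_pos, false_pos, true_neg, false_neg):
    """Summarize a filter's results ("positive" means "flagged as spam").

    Assumes at least one spam and one legitimate message were seen,
    otherwise the rate denominators would be zero.
    """
    total = true_pos + false_pos + true_neg + false_neg
    return {
        "accuracy": (true_pos + true_neg) / total,
        # Share of good mail wrongly flagged (the costly mistake).
        "false_positive_rate": false_pos / (false_pos + true_neg),
        # Share of spam that slipped through to the inbox.
        "false_negative_rate": false_neg / (false_neg + true_pos),
    }


# Invented example: 1000 messages, 629 of them spam.
rates = filter_rates(true_pos=600, false_pos=1, true_neg=370, false_neg=29)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

Two filters with the same accuracy can behave very differently: one that gets there by leaking spam is far less painful than one that gets there by eating real mail.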

Slight aside: Am I the only one who finds Outlook's interface to be unusable? Because of this, even though I have the program, I still use Outlook Express.

I've recently started using SpamAssassin and it seems to help, but it would certainly be improved by being able to train from directly within the email program.
posted by rushmc at 9:30 AM on June 27, 2003

For those of you using Eudora, Spamnix is a very nice plug-in utilizing both heuristic modeling (i.e. SpamAssassin) and Baysian algorithms (this feature is in beta).

I've been using it for several months and have been very pleased with its performance and detection results. Although, with Eudora 6.0 around the corner (which will inherently handle spam), you may want to hold out and do a comparison at a later point.
posted by aaronchristy at 10:01 AM on June 27, 2003

rushmc: Am I the only one who finds Outlook's interface to be unusable?

Nope! I use Nelson Email Organizer -- it basically sits on top of Outlook and provides a much better interface. I'm still searching for the right anti-spam software to run with it, though, as I don't really like the recommended one.
posted by jess at 10:07 AM on June 27, 2003

As cool as these filtering techniques are, they do nothing to reduce the amount of spam in the world. I prefer to go on the attack.

I check my mail at the server with MailWasher. It checks blacklists and marks suspected spam so I can delete and bounce it before retrieving my mail. More importantly, I copy the spam messages for reporting at Spamcop.
posted by pmurray63 at 10:24 AM on June 27, 2003

Being in charge of the mail server for my lab, I'm using SpamAssassin. It was pretty good before the Bayes processing, and now it's pretty damn good. I still get the occasional false negative that slips through, but I almost never get false positives. (I'd say at most I've had a handful in the past month.)

It's not for the faint of heart, though; training it with mistakes can be a real bitch (I just discovered that the way I was doing it doesn't work, so I'm having to adjust the system).

Eudora's latest betas have added some junkmail filtering, and I get the impression it's supposed to be trainable, but man, it's terrible at the moment. And it sure isn't training yet based on my experiences.
posted by piper28 at 10:42 AM on June 27, 2003

riffola: it sounds like your system would have a lot of false positives, which IMO are much worse than false negatives.

The funny thing is, Outlook does have a built-in spam 'filter' but it doesn't work at all. For me there are more false positives than true positives because all announcements from work have a subject that starts with [massmail].
posted by delmoi at 10:58 AM on June 27, 2003

I hate to say it, but I've never been that impressed with the junk filter on OS X's Mail. In particular, it never seems to get any smarter with more training. Fortunately, nearly everything Mail misses gets caught by Spamholio.
posted by Armitage Shanks at 11:13 AM on June 27, 2003

I use Mozilla Mail, since it's had Bayesian Spam filtering for a while. Works a treat.
posted by salmacis at 11:15 AM on June 27, 2003

By the way, it's "Bayesian," not "Baysian," which I take time to point out specifically because Bayes was such a fascinating guy -- an 18th century minister who lived almost all his life in a very small town in the UK. The equation he conceived which forms the basis for all of this new technology was scribbled in his notebooks and forgotten until after his death -- he believed the world had no practical use for it.

I know that self-linking is lame on MeFi, but I did happen to write an article a while ago that gives a lot of background on Bayes. To skip right to the relevant material, search for the word "Infidel." [grin]
posted by digaman at 11:32 AM on June 27, 2003

Wait - you had 1500 messages in your inbox?

digaman: Is he the guy that gave us Bayesian mimicry?
posted by gottabefunky at 11:38 AM on June 27, 2003

Yep, he sure was. That company I wrote about in that article, Autonomy, by the way, went on to great infamy in re: its connections to Richard Perle, one of the grand spooks running US foreign policy these days.
posted by digaman at 11:44 AM on June 27, 2003

As pmurray63 mentioned, Mailwasher is an excellent spamcatcher, and what's more it bats them back to the sender in a very satisfying manner. I was getting maybe 20 spam mails a day and within a few weeks Mailwasher had that down to 1 or 2. It does slow your mail downloading though, well on my machine at least.
posted by squealy at 11:47 AM on June 27, 2003

I never, ever give my real email out to any corporation or website, only to actual people. I don't get spam (except at one old address which I unwittingly gave to an online flower retailer and at the address I use for newsgroups posting).
posted by signal at 11:58 AM on June 27, 2003

and what's more it bats them back to the sender in a very satisfying manner.

Actually, all that does is annoy the hell out of mail administrators that often have nothing to do with the spam mail that was being sent. The return stuff on most spam is forged. You may feel satisfied it's bouncing them back, but that bounce back is completely useless.
posted by piper28 at 12:06 PM on June 27, 2003

Obligatory Slashdolt-like comment:

Apple doesn't lead the pack. bogofilter got there first. ;)

Been using it on my personal mail for about a year now, and I love it.
posted by jammer at 12:25 PM on June 27, 2003

Someone please tell me there's a user-side filter for Lotus Notes. I've looked, but I haven't found squat.

Thanks for mentioning SpamSieve, kindall. I've been looking for an Entourage filter. I found one the other day that was rather pricey and apparently not Bayesian.
posted by mblandi at 1:38 PM on June 27, 2003

Anyone know of a plugin spam filter for Outlook Express on Mac OS 9? The built-in "junk mail filter" is useless. A simple "delete any message whose send-date was more than a week ago" rule kills more spam than the supposed spam filter.

posted by Mars Saxman at 2:07 PM on June 27, 2003
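
The "send-date more than a week ago" rule above is easy to express in code. This Python sketch only shows the date check; the `is_stale` function is a made-up name, and treating a missing or unparseable Date header as "not stale" is an assumption on the cautious side.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def is_stale(raw_message, now, max_age=timedelta(days=7)):
    """Return True if the Date header is more than max_age before now.

    `now` must be a timezone-aware datetime.
    """
    date_header = message_from_string(raw_message).get("Date")
    if date_header is None:
        return False
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Garbage dates raise different errors on different Python versions.
        return False
    if sent.tzinfo is None:
        # A "-0000" zone yields a naive datetime; assume it means UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return now - sent > max_age


now = datetime(2003, 6, 27, 14, 7, tzinfo=timezone.utc)
old = "Date: Fri, 13 Jun 2003 08:00:00 -0500\nSubject: offer\n\nhi\n"
print(is_stale(old, now))
```

The rule works as a spam heuristic only because a lot of spam carries a badly wrong Date header; legitimate mail delayed in transit for a week would be caught too.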

Apple doesn't lead the pack. bogofilter got there first.

Not true, in spite of what Eric Raymond might say. First Bayesian spam filter was ifile, for a mail system that no-one really uses any more...
posted by riviera at 2:35 PM on June 27, 2003

You may feel satisfied it's bouncing them back, but that bounce back is completely useless.

You may be right, I'm not a sysadmin. But if that's the case, why do so many people report a significant reduction in the amount of spam they receive after using Mailwasher for a while? My ignorant assumption is that the address eventually gets removed from lists when delivery fails. Isn't there a market amongst spamscum for email address lists that contain known working addresses?
posted by normy at 2:35 PM on June 27, 2003

SpamSieve is now integrated with Mailsmith, the e-mail client built by the same folks who brought you BBEdit. Intelligent spam handling, powerful mail filtering, AppleScript and text-only e-mail - gotta love it!*

*(Unless you need IMAP, in which case, you don't gotta love it)
posted by likorish at 4:21 PM on June 27, 2003

PMail (Pegasus) is very good these days, too. Calypso is easily as good as Eudora, feels just like it, and is free (was commercial).
posted by five fresh fish at 5:18 PM on June 27, 2003

normy - I can't claim to be representative, but I've used mailwasher religiously for over 6 months now, bouncing all spam mails, and it hasn't made the slightest bit of difference. Feels good though...
posted by chrispy at 7:11 PM on June 27, 2003

I've been using a program called K9, which functions as a POP proxy rather than an add-on to any particular mail client. So while it's a little tougher to set up, it should work with any POP mail client that will run on Windows. Ditto all the gushing above, Bayesian filtering rocks.
posted by RylandDotNet at 8:10 PM on June 27, 2003

Betas of Eudora 6 have Bayesian filtering, which does seem to be trainable and working well after a month or so... the default settings did not work well for me, though. Supposedly future betas and the release version will allow plug-ins (SpamSieve will be my choice, in that case). At any rate, though the Eudora filters aren't perfect yet, they are pretty damn good. They caught more than 5000 spams for me in the last 30 days. (Since I have a business I can't keep my address a secret -- hence the massive quantities of spam.)
posted by litlnemo at 8:36 PM on June 27, 2003

SpamSieve already works with Eudora. It's not a plug-in but a standalone app (AppleScript is used to pass the message to SpamSieve for filtering).
posted by kindall at 9:02 PM on June 27, 2003

Betas of Eudora 6 have Bayesian filtering, which does seem to be trainable and working well after a month or so

You've had luck training the Eudora 6 stuff? I've been trying to get it to learn for a while now and I'm firmly convinced it's useless. Training doesn't seem to have any effect. I'm seriously tempted to disable it because SpamAssassin does so much better at identifying spam. Eudora gives me more false positives than it does real positives (and I get a fair amount of spam, so that's saying something).

But if that's the case, why do so many people report a significant reduction in the amount of spam they receive after using Mailwasher for a while? My ignorant assumption is that the address eventually gets removed from lists when delivery fails.

From the mailwasher web site:

MailWasher uses an algorithm to determine the best route to send the bounced message back (from, reply to, return path) and actually sends the bounce back via your ISP's postmaster, so it looks exactly like it has come from your ISP and not from you at your address. If the spammer has used a fake address, then your bounce message will itself be bounced back to the postmaster and you won't receive the bounced bounce email.

OK, there's a couple of things here. First off, it's unlikely the spammer is getting the bounce message in the first place, since the address is likely forged. That makes it seem unlikely to me that anyone's really receiving a "bounced" message. Next, the next item in the FAQ, which I don't quote above, says the bounce message is similar to a real bounce message. My guess is it's fairly identifiable. Besides, even if it's not, bounce processing can be a pain even for software that's written to do it, and I guarantee you spammers don't care enough to have tools that do it. (Heck, nowadays they use dictionary attacks on usernames at sites hoping to hit them.) Finally, I'd be careful using that bounce feature. Based on its description, it's forging email. Most user agreements I've seen take a very harsh line on forging email, and it's grounds for terminating an account. I know if people on my server were doing that, I'd be having a talk with them. Postmaster gets enough crap because of spam; the last thing they need is you contributing to the cause. (In fact, I'd argue that this usage is very borderline on spam itself: you're generating unsolicited email for the postmaster.)

(I also find it kinda amusing that an anti-spam package has a provision to let you spam your "friends" with a message about the software, via the tell a friend option).
posted by piper28 at 10:59 PM on June 27, 2003

kindall, spamsieve doesn't fully work with eudora yet. It can only run after all of Eudora's other filters have run; in my case, that means that it won't filter mail to my business address, which makes it sort of useless. But it does work great with the mail it can filter. Supposedly the issue will be fixed in Eudora 6, but isn't yet in the beta I have.

piper28, it didn't train at first, but that was due to user error in my case (and what I think is a flawed interface) -- I thought that just moving things in or out of the junk folder would train it. It doesn't; you have to use the Junk/Not Junk options in the Message menu. Once I started doing that, I began to see a difference. Particular mails that were getting false-positives before (order confirmation mails from my shopping cart) stopped getting them. So it's working for me, at the moment.
posted by litlnemo at 12:07 AM on June 28, 2003

i use a combination of spamassassin (without bayesian), block lists (ordb, spamhaus, blitzed, monkeys) and procmail (mainly sorting, but some killing). in about 6 months of use i'm aware of one mail that was deleted incorrectly, and one spam gets through about every other day.

then everything is managed by an imap server which i read using squirrelmail (i block imap at the firewall but let https through for remote mail reading) or, for technical mailing lists, presented as news groups via gnus (in emacs).

geeky email bliss (on linux, of course)
posted by andrew cooke at 5:59 AM on June 28, 2003
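
For readers wondering how DNS block lists like the ones named above are consulted: the connecting IP address has its octets reversed and is looked up as a hostname under the list's zone, and the address counts as listed if that name resolves. The Python sketch below only builds the query name and does no network lookup; `dnsbl_query_name` is a made-up helper and `dnsbl.example.org` is a placeholder zone, not a real list.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the hostname to look up for an IPv4 address in a DNS block list."""
    # IPv4Address raises a ValueError subclass on malformed input.
    addr = ipaddress.IPv4Address(ip)
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# 192.0.2.99 is a documentation-only address; the zone is a placeholder.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
# prints 99.2.0.192.dnsbl.example.org
```

A mail server would then try to resolve that name: an answer means "listed", a not-found response means "not listed".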

spamsieve doesn't fully work with eudora yet. It can only run after all of Eudora's other filters have run; in my case

It's not too difficult to modify the AppleScripts that come with SpamSieve to have some additional logic in them, so that you do some of your filtering with the script rather than with Eudora's rules. Entourage has a "feature" which requires that if a rule executes a script, that rule is the last one to execute, so I've had to do a bit of that myself.

Microsoft's excuse for not allowing additional rules after a script is that your script might delete the message or move it to a different folder, and Entourage won't know what message to filter anymore. That's lame; nobody would be upset or surprised if E'rage threw an error trying to execute a rule on a message that had just been deleted by a script. What's worse is that the move-to-different-folder action only "loses" the message on an IMAP account, but they won't enable it for POP accounts "for consistency."
posted by kindall at 9:17 AM on June 28, 2003

I thought that just moving things in or out of the junk folder would train it. It doesn't; you have to use the Junk/Not Junk options in the Message menu.

I've been doing that all along, and still honestly don't feel it's making any difference. Maybe it's in the nature of the mail I'm getting, but the stuff Eudora tends to flag is stuff SpamAssassin, even without Bayes filtering, wouldn't even come close to thinking is spam.
posted by piper28 at 11:55 AM on June 28, 2003

I'll throw in a vote for SpamNet as well. Just got off a week's vacation to find 1500 messages in my inbox this morning. SpamNet cranked through them all in about an hour. So far, I've only found one false positive. Very efficient and a time saver.

Our company was running GFI MailEssentials but the latest version had a lot of problems (could have been our configuration). Legitimate stuff was blocked/delayed while nasty "donkey porn" was still making it through. Right now we're running without a filter until our IS staff comes up with a new solution, making SpamNet an even greater resource!
posted by MediaMan at 10:38 AM on June 30, 2003
