
Advanced Spam Filtering

Tags: HOWTO, E-mail

If you really hate spam emails, you can manually "train" the Bayesian classifier in your anti-spam filter. You collect examples of good and bad emails by hand; the classifier analyzes them, and the patterns it identifies are added to the spam filtering process.

To start, save your false negatives (spam that slipped through the filter) into their own mail folder. In this tutorial, I have already created a new folder in my mailbox called "spamfolder" and have manually moved junk emails into it.

Next, you need a collection of legitimate emails, called "ham." The most useful examples are false positives: legitimate emails that were mistakenly marked as spam and moved into the junk folder. Again, create a separate folder in your mailbox to store these emails, such as "hamfolder." If you do not have any false positives, a collection of ordinary legitimate emails will suffice.

Now that you have organized your spam and ham emails into their respective folders, you can log into your home server and train the spam filter. For this example, we will be logging into julia.math.ucla.edu. You can use PuTTY to start an SSH session if you are on a Windows PC.
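
Once you are logged in, it can help to confirm that both folders exist and roughly how many messages each one holds before training. The sketch below assumes the folders are mbox files under ~/Mail, as in the commands used in this tutorial; the function names are made up for illustration. It counts the "From " separator lines, so the totals are approximate if a message body contains an unescaped line starting with "From ".

```shell
#!/bin/sh
# Count messages in an mbox file by counting "From " separator lines.
count_mbox_messages() {
    grep -c '^From ' "$1"
}

# Print the message count for one folder, or note that it is missing.
report_folder() {
    if [ -f "$1" ]; then
        echo "$1: $(count_mbox_messages "$1") message(s)"
    else
        echo "$1: not found"
    fi
}

# Folder paths are assumptions; change them if yours differ.
report_folder "$HOME/Mail/spamfolder"
report_folder "$HOME/Mail/hamfolder"
```

If a folder is reported as not found, check the folder name and path before running the training commands.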

Once logged into your home server, run the commands below, which point sa-learn at the spam and ham folders you created. Remember to modify the folder name or path if you used something different:

     sa-learn --spam --mbox ~/Mail/spamfolder

     sa-learn --ham --mbox ~/Mail/hamfolder

Here is a screenshot of the process. sa-learn reports its results in the form "Learned tokens from N message(s) (M message(s) examined)"; in this run, tokens were learned from 11 of the 21 messages in the "spamfolder" and from 3 messages in the "hamfolder."
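
To check how much the filter has learned so far, sa-learn can dump its database counters with the --dump magic option. The sketch below pulls out the nspam and nham totals; it assumes the value sits in the third column of that output, and the bayes_counts function name is made up for illustration. Keep in mind that, with default settings, SpamAssassin only starts applying Bayes scores after it has learned from at least 200 spam and 200 ham messages, so a small first batch may not change filtering right away.

```shell
#!/bin/sh
# Read `sa-learn --dump magic` output on stdin and print the number of
# spam and ham messages learned so far.
# Assumption: the counter value is the third whitespace-separated column.
bayes_counts() {
    awk '/non-token data: nspam/ { spam = $3 }
         /non-token data: nham/  { ham = $3 }
         END { printf "spam=%d ham=%d\n", spam, ham }'
}

if command -v sa-learn >/dev/null 2>&1; then
    sa-learn --dump magic | bayes_counts
else
    echo "sa-learn not found on this host"
fi
```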