SpamBayes and a plan for Spam

Entry published nov 23 2006

I’ve been working on setting up the virtual server that runs this site and Kiwi, and I’ve been tinkering with the mail system (of course). I have Postfix running as the MTA, which hands incoming mail to Procmail, which deposits my mail into a series of Maildir directories. From there, I can use the wonderful Dovecot (maybe the best IMAP server out there right now) to view my mail. The only thing missing is spam protection. When I was on Dreamhost, SpamAssassin did the spam filtering for me, but I’ve spent this past week without any sort of filter and have been hit with a barrage of junk e-mail.

I’ve done the basics, like configuring Postfix to reject senders without a valid domain. I installed and ran SpamAssassin for about a day, but SpamAssassin is huge! I mean, really, it gobbles up RAM. Normally this wouldn’t be a problem, but I’m running on a VPS with 96MB of RAM and a bunch of other services, so I can’t afford to have SpamAssassin taking up 40MB! By default it’s even worse: SpamAssassin starts 5 processes on Debian, each of which takes 20MB of RAM, which adds up to 100MB, more than the whole machine has. Yikes! I did the natural thing and scaled it back as much as possible, to 2 processes, but that is still 40MB of RAM.

So what’s the solution? I’m happy to say that I’ve found something that uses fewer resources and has filtered out every piece of junk mail: SpamBayes. What is SpamBayes? It’s server-side Bayesian filtering software that you train on the mail you receive. I’m going to present a few simple directions for setting up SpamBayes here, but I’m making a few assumptions:

  • You have an MTA (mail transfer agent) such as Postfix or Exim set up
  • You have Procmail configured as your MDA (mail delivery agent)
  • You’re using Maildir (these steps can be easily modified for mbox or mh)
  • You’re using IMAP
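
For a rough feel of what “Bayesian filtering” means, here is a minimal Python sketch. It is not SpamBayes’s actual algorithm (SpamBayes has its own, more refined tokenizer and scoring); it is plain naive Bayes over whitespace-separated words, and the function names and the tiny training set are made up for illustration.

```python
from collections import Counter
import math


def train(messages):
    """Count how many messages each token appears in."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-token evidence into one spam probability (naive Bayes)."""
    # Start from the prior odds implied by the sizes of the two training sets.
    log_odds = math.log(n_spam / n_ham)
    for token in set(text.lower().split()):
        # Add-one smoothing keeps unseen tokens from producing zero probabilities.
        p_token_spam = (spam_counts[token] + 1) / (n_spam + 2)
        p_token_ham = (ham_counts[token] + 1) / (n_ham + 2)
        log_odds += math.log(p_token_spam / p_token_ham)
    # Convert log-odds back into a probability between 0 and 1.
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical training data: three spam and three good ("ham") messages.
spam = ["cheap pills online", "win money now", "cheap money offer"]
ham = ["meeting notes attached", "lunch on friday", "kiwi build notes"]
spam_counts, ham_counts = train(spam), train(ham)
```

Calling spam_probability("cheap money", spam_counts, ham_counts, len(spam), len(ham)) gives a value well above 0.5, while "build notes" scores well below it. The more mail the filter has seen, the better those per-word counts become, which is why the training step later on matters.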

So on with the show:

  1. Get SpamBayes from the website (it’s a series of Python scripts), or if you are on Debian, run apt-get install spambayes

  2. Create a SpamBayes database by running sb_filter.py -n in your home directory

  3. Create a simple configuration file ~/.spambayesrc which tells SpamBayes where to find your database. Here is the config file I used:

    [Storage]
    persistent_use_database = True
    persistent_storage_file = ~/.hammiedb
    
  4. Edit your .procmailrc file so that it invokes SpamBayes when e-mail is delivered. I also added recipes to file away e-mail that SpamBayes tags as spam and e-mail that it is unsure about. Edit the path to sb_filter.py as appropriate, and note that in my setup this puts spam in a Spam folder and unsure mail into, you guessed it, Unsure:

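    # Pipe each message through SpamBayes, which adds the X-SpamBayes-Classification header
    :0 fw:hamlock
    | /usr/bin/sb_filter.py
    
    # File messages tagged as spam in the Spam folder
    :0
    * ^X-SpamBayes-Classification: spam
    .Spam/
    
    # File messages SpamBayes is unsure about in the Unsure folder
    :0
    * ^X-SpamBayes-Classification: unsure
    .Unsure/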
    
  5. Add a line to your crontab file so that every night SpamBayes learns from the e-mail in your Spam folder and in your Inbox. Run crontab -e and add this line:

    10  0   *   *   * /usr/bin/sb_mboxtrain.py -g /home/mronge/Maildir/cur -s /home/mronge/Maildir/.Spam/cur
    

What does that line do? It says that every night, at 10 minutes past midnight, cron should run sb_mboxtrain.py (despite the name, it works on mbox, Maildir, and mh), with good mail taken from the Inbox at /home/mronge/Maildir/cur and spam taken from /home/mronge/Maildir/.Spam/cur. Of course, you’ll have to adjust the paths for your own system. This way SpamBayes gets smarter every day by scanning your e-mail. You can even add other folders so that SpamBayes can train off of your archives or mailing list folders. In my case, I filter out mailing list e-mail in my .procmailrc, so I don’t train off of it.
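
Before trusting the nightly job, it can help to check that both training folders actually contain mail, since the filter needs examples of both kinds. The following is a minimal Python sketch, assuming the same Maildir layout as the crontab line; count_messages and training_counts are hypothetical helper names, not part of SpamBayes.

```python
from pathlib import Path


def count_messages(cur_dir):
    """Count the message files in a Maildir 'cur' directory."""
    return sum(1 for entry in Path(cur_dir).iterdir() if entry.is_file())


def training_counts(ham_cur, spam_cur):
    """Report how many good and spam messages the trainer would see."""
    return {"ham": count_messages(ham_cur), "spam": count_messages(spam_cur)}
```

For my paths this would be called as training_counts("/home/mronge/Maildir/cur", "/home/mronge/Maildir/.Spam/cur"). If either number is zero, the training run has nothing to learn from on that side yet.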

After that, you’re done. E-mail that is left in your Inbox will be used to train what a “good” message looks like, and mail that you move to your Spam folder will be used to train the system on what spam looks like. Also, make sure you check your Unsure IMAP folder, and move any mail that SpamBayes is unsure of to its proper location. Hopefully, like me, you’ll have success in filtering out spam without the performance penalty of running SpamAssassin. As a final note: I realize these directions assume quite a bit of Unix and sysadmin knowledge. If you have any trouble with the above, feel free to leave a comment and I’ll do my best to help out. Another final note: let me know if you spot any grammatical mistakes (or if I’ve got some of the technical details screwed up).
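
If a message ends up somewhere unexpected, the routing done by the Procmail recipes in step 4 can be reproduced offline. This is a minimal Python sketch, not anything SpamBayes ships: destination is a hypothetical helper, and it assumes that only the leading word of the X-SpamBayes-Classification header matters (anything after a semicolon is ignored, much like the prefix match in the recipes).

```python
from email import message_from_string

# Folder names taken from the Procmail recipes above.
FOLDERS = {"spam": ".Spam/", "unsure": ".Unsure/"}


def destination(raw_message):
    """Return the folder a tagged message would be filed in, or None for the Inbox."""
    msg = message_from_string(raw_message)
    header = msg.get("X-SpamBayes-Classification", "")
    # Keep only the leading label; assume any trailing detail follows a ';'.
    label = header.split(";")[0].strip().lower()
    # Anything that is not spam or unsure falls through to normal delivery.
    return FOLDERS.get(label)
```

Feeding it the raw text of a message (headers, a blank line, then the body) returns ".Spam/", ".Unsure/", or None when the message would fall through to the Inbox.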

0 comments category: unix
. o .

It's moving time.

Entry published nov 17 2006

I’ve been working on migrating my sites over to my shiny, new virtual private server. Like always, it’s taking longer than anticipated, mostly because of my lack of sysadmin experience on Debian. I’m learning quickly, and I have a few things set up, like the main Kiwi website and a wiki run off of MoinMoin.

I’m in the process of setting up SMTP and IMAP, and of migrating TheRonge over as well. There are a number of things I need to do, like set up Mailman, configure Exim and Dovecot, and move this WordPress install. I’m going to be working on all of these things this weekend, and I want to have it done soon so things can return to equilibrium around here.

I’ve got a Kiwi development plan sketched out, but I’m not quite ready to post a link to it yet. I need to get some things ironed out before the server is ready to take any sort of serious traffic. :)

0 comments category: general
. o .

Language Oriented Programming

Entry published nov 06 2006

Check out this talk by Martin Fowler on domain-specific languages. It’s directed at Java developers, but the same ideas can very easily be applied to Objective-C. Link via Ralph Johnson.

0 comments category: general programming
. o .

Kiwi progress...

Entry published nov 02 2006

Progress has been slow on Kiwi lately, and I apologize for that. I’m a student, as many of you know, and I seem to have a ton of projects and homework assignments floating around. Recently there’s been a bit of progress: builds no longer fail (yaay!). The client is still not functional in its current state. I’ve started the process of removing glue code and moving to a nearly 100% bindings-built app. I’ve learned two things from this experience:

  1. The repository should always be left in a build-able state
  2. There shouldn’t be a loss of functionality with a Subversion commit

If a change would break either rule, it should be developed on a branch or kept in a working copy until both conditions are satisfied. This might be obvious to most developers, but it wasn’t until recently that I learned this particular lesson.

With that out of the way…

A new server

Just today I signed up for a virtual private server through Quantact. I’ve also registered a domain name, kiwiclient.org, which I will be moving the Kiwi resources over to. I’m going to be setting up a bug tracking system, a wiki, HTTP, Mailman, etc., all with the sole purpose of supporting this project. So please be patient: some resources might go down for a day or so while I move to the new server, but it will be for the better.

What about my repository access?

I’ve granted a number of people access, but that will be changing with the move to the new server. I hate to revoke repository access, but I think it will work better if people submit patches to the list and I can approve them as deemed necessary. If people become very active in the project, I can then grant repository access as is needed.

What can I work on?

A number of people have asked me what they can work on. I’ll be putting together a to-do list shortly, and I’ll be adding documentation on the architecture of Kiwi. Once that is up, it will be easier for others to contribute to the project. I want to put together a series of milestones so that we can incrementally add more functionality to Kiwi and hopefully get off the ground sooner.

So hang in there, I’m working on turning this into a real open source project. This weekend I want to have the new server up and running and a project schedule with milestones.

0 comments category: kiwi