Monthly archives for December, 2008

On the time domain, with regard to spam

Dec 22, 2008 · Written by Craig · 1 Comment
[Chart: one year of mail on my server]

So. There has, off and on, been a debate over whether the day and time at which an email arrives at your system from the outside world can help determine whether the message is spam. The argument has generally ended inconclusively, and rules for that kind of identification have generally not been folded into projects such as SpamAssassin. Until now I have been tentatively on the side of those who favor putting such rules in. Now I’m fully convinced that this must be a useful test. The chart above shows the volume of email traffic in and out of my web server that made it past the filters, plotted over the last 12 months. Notice the regular weekly spikes in traffic; weekdays see a lot more email than weekends. This is sure to surprise nobody. Note that spam, viruses, etc. have been filtered out of this stream; this is only “real” mail and false negatives. Ignore April; I had logging problems that month.

Now.

[Chart: malmail detections in email over the previous 12 months]

Notice the complete lack of weekly spikes in that data. ["Rejected," by the way, means I rejected the incoming SMTP connection before it even started speaking protocol at me, i.e. based just on the IP address. Almost all of those will be spam sources listed in XBL or SORBS's SOCKS registry.]

So, we have massive swings in real email volume during the week, but no swings in malmail (spam, viruses, etc.). Therefore, the ratio of malmail to real mail clearly is affected by time of day and day of week. If mail arrives outside of “normal email” hours, it surely is much more likely to be malmail; a rule which learns which days and hours are good vs. bad, and scores mail accordingly, surely would be useful for identifying and filtering out malmail.
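To make the "rule which learns which days/hours are good vs bad" idea a bit more concrete, here is a rough sketch in Python. It is not SpamAssassin code and not something I've run against real mail; the class name, the 168 hour-of-week buckets, and the add-one smoothing are all my own assumptions for illustration. It counts real mail and malmail per hour-of-week slot and returns a log ratio, so a positive score means the slot is relatively more typical of malmail.

```python
import math
from collections import Counter
from datetime import datetime

HOURS_PER_WEEK = 7 * 24  # one bucket per hour of the week (an assumption)


class HourOfWeekScorer:
    """Hypothetical learner: how spam-typical is a given hour of the week?"""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing  # add-one style smoothing for empty buckets
        self.ham = Counter()        # real mail seen per bucket
        self.spam = Counter()       # malmail seen per bucket

    @staticmethod
    def bucket(received):
        # Monday 00:00 is bucket 0, Sunday 23:00 is bucket 167.
        return received.weekday() * 24 + received.hour

    def train(self, received, is_spam):
        counts = self.spam if is_spam else self.ham
        counts[self.bucket(received)] += 1

    def score(self, received):
        # Log of P(bucket | spam) / P(bucket | ham), with smoothing.
        b = self.bucket(received)
        s = self.smoothing
        p_spam = (self.spam[b] + s) / (sum(self.spam.values()) + s * HOURS_PER_WEEK)
        p_ham = (self.ham[b] + s) / (sum(self.ham.values()) + s * HOURS_PER_WEEK)
        return math.log(p_spam / p_ham)
```

The timestamps are assumed to already be in the recipient's local time. In a scoring system the raw log ratio would presumably be scaled and capped before being added to a message's total, since this is an indication rather than a verdict.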

Secondly, I’ve noticed anecdotally a pattern in the “missed” spam which makes it to my inbox: it arrives in batches. When I’m going through and selecting/deleting based on subject lines in my inbox preview pane, I am almost always shift-selecting to delete multiple adjacent messages, not ctrl- or cmd-selecting to pick out non-adjacent messages. I’ve noticed that generally I’ll be more or less selecting all messages for a timeslot like 10pm-7am; basically, for my personal email, everything that arrives in that window is spam. Not always, but very often. Occasionally I’ll get something from a European who doesn’t sleep at the right times of day. Sometimes something really crazy will happen and a friend will email me from Korea, where time is really fucked up. They might even have like one of those on-the-half-hour timezones over there. Anyway, clearly it’s not a hard and fast rule, but the SpamAssassin umbrella/plugin system with scoring is designed *precisely* for dealing with indications as opposed to hard-and-fast rules.
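The overnight-window observation is simple enough to sketch as well. This is a minimal Python illustration, not a SpamAssassin plugin: the function names and the 0.5 point value are made up, and the 10pm-7am window is just the one from my own inbox. The only subtle bit is that the window wraps past midnight. Timestamps are assumed to be in the recipient's local time.

```python
from datetime import datetime, time


def in_quiet_window(received, start=time(22, 0), end=time(7, 0)):
    """True if the message arrived inside the quiet window [start, end)."""
    t = received.time()
    if start <= end:
        # Ordinary window that does not cross midnight.
        return start <= t < end
    # Window wraps past midnight, e.g. 22:00 through 07:00.
    return t >= start or t < end


def quiet_window_score(received, points=0.5):
    """Hypothetical contribution to a message's score: a nudge, not a verdict."""
    return points if in_quiet_window(received) else 0.0
```

Keeping the contribution small is the point: the friend in Korea still gets through unless other rules also fire.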

So, after a multi-year hiatus (partly due to a non-compete agreement which prevented me from doing some work in the field), I think it’s time to get a little back into anti-spam hacking. No doubt there are some thoughts and previous work on this out there on the lazyweb from which I can benefit. The claim that “it can’t be used to identify spam” is just not true, though. There might be some rules needed to protect against exceptions, but it’s obvious looking at those graphs above that it can and very likely should be used to help identify spam.

Posted in Spam - Tagged malmail, Spam, spamassassin
Craigalog: Craig's musings