A visual history of spam (and virus) email


I have kept every single piece of spam and virus email since mid-1997. Occasionally, it comes in handy, for example, to add a naïve Bayesian spam filter to my custom-written email filter. And occasionally I use it to build a chart of spam and virus email.

The following chart plots every single piece of spam and virus email that arrived at my work email address since April 1997. Blue dots are spam and red dots are email viruses. The horizontal axis is time, and the vertical axis is size of mail (on a logarithmic scale). Darker dots represent more messages. (Messages larger than 1MB have been treated as if they were 1MB.)
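For readers who want to try a similar plot, here is a minimal Python sketch of the mapping described above: time on the horizontal axis, size on a logarithmic vertical axis, sizes clamped at 1MB, and a per-pixel count that could drive the dot darkness. This is not the program that produced the chart; the image dimensions, the `MIN_SIZE` floor, the reading of 1MB as 2**20 bytes, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter
from datetime import datetime

MAX_SIZE = 1 << 20  # clamp at 1MB, as the chart does (assuming 1MB = 2**20 bytes)
MIN_SIZE = 256      # hypothetical bottom of the vertical axis


def to_pixel(when, size, start, end, width=800, height=400):
    """Map a (timestamp, size in bytes) pair to an (x, y) pixel."""
    size = min(max(size, MIN_SIZE), MAX_SIZE)
    # Fraction of the way along the time axis (timedelta / timedelta is a float).
    frac_t = (when - start) / (end - start)
    # Fraction of the way up the logarithmic size axis.
    frac_s = (math.log(size) - math.log(MIN_SIZE)) / (
        math.log(MAX_SIZE) - math.log(MIN_SIZE)
    )
    x = min(int(frac_t * width), width - 1)
    y = min(int((1.0 - frac_s) * height), height - 1)  # row 0 is the top
    return x, y


def density(messages, start, end, width=800, height=400):
    """Count how many messages land on each pixel; more hits means a darker dot."""
    counts = Counter()
    for when, size in messages:
        counts[to_pixel(when, size, start, end, width, height)] += 1
    return counts
```

Feeding `density` one list of (timestamp, size) pairs for spam and another for viruses would give two sets of counts, which could then be rendered in blue and red.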

Note that this chart is not scientific. Only mail which makes it past the corporate spam and virus filters shows up on the chart.

Why does so much spam and virus mail get through the filters? Because corporate mail filters cannot take the risk of accidentally classifying valid business email as spam. Consequently, the filters have to make sure to remove something only if they have extremely high confidence that the message is unwanted.
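The trade-off can be made concrete with a toy Python sketch. It assumes a filter that assigns each message a spam score between 0 and 1 and removes anything at or above a cutoff; the scores, the sample data, and the function name are invented for illustration and do not describe any real corporate filter.

```python
def filter_outcome(scored_mail, threshold):
    """Return (false_positives, missed_spam) for a given cutoff.

    scored_mail is a list of (spam_score, is_actually_spam) pairs.
    """
    false_positives = sum(
        1 for score, is_spam in scored_mail if score >= threshold and not is_spam
    )
    missed_spam = sum(
        1 for score, is_spam in scored_mail if score < threshold and is_spam
    )
    return false_positives, missed_spam


# Made-up scores, purely for illustration.
sample = [
    (0.99, True), (0.95, True), (0.80, True), (0.60, True),
    (0.70, False), (0.30, False), (0.10, False),
]
```

With this sample, a cutoff of 0.5 removes one valid message and misses no spam, while a cutoff of 0.9 removes no valid mail but lets two pieces of spam through. A filter tuned the cautious way ends up in the second situation.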

Okay, enough dawdling. Let's see the chart.

Overall statistics and extrema:

  • First message in chart: April 22, 1997.
  • Last message in chart: September 10, 2004.
  • Smallest message: 372 bytes, received March 11, 1998.
    From: 15841.
    To: 15841.
    Subject: About your account...
    Content-Type: text/plain; charset=ISO-8859-1
    Content-Transfer-Encoding: 7bit
    
    P
    
  • Largest message: 1,406,967 bytes, received January 8, 2004. HTML mail with a lot of text including 41 large images. A slightly smaller version was received the previous day. (I guess they figured that their first version wasn't big enough, so they sent out an updated version the next day.)

  • Single worst spam day by volume: January 8, 2004. That one monster message sealed the deal.

  • Single worst spam day by number of messages: August 22, 2002. 67 pieces of spam. The vertical blue line.

  • Single worst virus day: August 24, 2003. This is the winner both by volume (1.7MB) and by number (49). The red splotch.

  • Totals: 227.6MB of spam in roughly 19,000 messages. 61.8MB of viruses in roughly 3500 messages.

Things you can see on the chart:

  • Spam went ballistic starting in 2002. You could see it growing in 2001, but 2002 was when it really took off.

  • Vertical blue lines are "bad spam days". Vertical red lines are "bad virus days".

  • Horizontal red lines let you watch the lifetime of a particular email virus. (This works only for viruses with a fixed-size payload. Viruses with variable-size payload are smeared vertically.)

  • The big red splotch in August 2003 around the 100K mark is the Sobig virus.

  • The horizontal line in 2004 that wanders around the 2K mark is the Netsky virus.

  • For most of this time, the company policy on spam filtering was not to filter it out at all, because all the filters they tried had too high a false-positive rate. (I.e., they were rejecting too many valid messages as spam.) You can see that in late 2003, the blue dot density diminished considerably. That's when mail administrators found a filter whose false-positive rate was low enough to be acceptable.

As a comparison, here's the same chart based on email received at one of my inactive personal email addresses.

This particular email address has been inactive since 1995; all the mail it gets is therefore from harvesting done prior to 1995. (That's why you don't see any red dots: None of my friends have this address in their address book since it is inactive.) The graph doesn't go back as far because I didn't start saving spam from this address until late 2000.

Overall statistics and extrema:

  • First message in chart: September 2, 2000.
  • Last message in chart: September 10, 2004.
  • Smallest message: 256 bytes, received July 24, 2004.
    Received: from dhcp065-025-005-032.neo.rr.com ([65.25.5.32]) by ...
             Sat, 24 Jul 2004 12:30:35 -0700
    X-Message-Info: 10
    
  • Largest message: 3,661,900 bytes, received April 11, 2003. Mail with four large bitmap attachments, each of which is a Windows screenshot of Word with a document open, each bitmap showing a different page of the document. Perhaps one of the most inefficient ways of distributing a four-page document.

  • Single worst spam day by volume: April 11, 2003. Again, the monster message drowns out the competition.

  • Single worst spam day by number of messages: October 3, 2003. 74 pieces of spam.

  • Totals: 237MB of spam in roughly 35,000 messages.
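As a rough cross-check on the totals in the two lists, dividing each volume by its message count gives an average message size. The Python sketch below does that arithmetic; it assumes 1MB means 2**20 bytes, and since the message counts are only "roughly" known, the results are approximations.

```python
MB = 1024 * 1024  # assumption: 1MB = 2**20 bytes

# (megabytes, approximate message count), taken from the totals quoted above
totals = {
    "work spam": (227.6, 19_000),
    "work viruses": (61.8, 3_500),
    "personal spam": (237.0, 35_000),
}


def average_kb(megabytes, messages):
    """Average message size in KB (1KB = 1024 bytes)."""
    return megabytes * MB / messages / 1024


for label, (mb, n) in totals.items():
    print(f"{label}: about {average_kb(mb, n):.1f}KB per message")
```

Under those assumptions the averages come out to roughly 12KB for work spam, 18KB for work viruses, and 7KB for spam at the personal address.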

I cannot explain the mysterious "quiet period" at the beginning of 2004. Perhaps my ISP instituted a filter for a while? Perhaps I didn't log on often enough to pick up my spam and it expired on the server? I don't know.

One theory is that the lull was due to uncertainty created by the CAN-SPAM Act, which took effect on January 1, 2004. I don't buy this theory since there was no significant corresponding lull at my other email account, and follow-up reports indicate that CAN-SPAM was widely disregarded. Even in its heyday, compliance was only 3%.

Curiously, the trend in spam size for this particular account is that it has been going down since 2002. In the previous chart, you could see a clear upward trend since 1997. My theory is that since this second dataset is more focused on current trends, it missed out on the growth trend in the late 1990s and instead is seeing the shift in spam from text to <IMG> tags.

Comments (156)
  1. Hey… I’ve done quite the same thing when I graduated in 1999…

    I ported a cluster analysis tool from Fortran to Visual C++ and I tried different sets of data to test it.

    One set was related to my emails.

    I don’t remember what dimensions I checked or what clusters I found, but your post reminded me of all that! Thanks!

  2. James Curran says:

    > Totals: 237MB of spam in roughly 35,000 messages.

    Sheesh…. I did the same thing for ONE MONTH. My totals: 1GB in roughly 116,000 messages.

    (But after training using that, SpamBayes works fine!)

  3. Robert Hahn says:

    Actually, the quiet period is present in the work-email graph – it’s harder to see because your 2004 column is as wide as your 2003 column, stretching out the data points horizontally a bit more. But it is there.

  4. chris says:

    what did you make that chart with?

  5. SmartAs says:

    Great. Now I know the great majority of spam is between 1.5K and 128K. I’ll just eliminate all that to get rid of 90% of the spam. 8v)

  6. BillT says:

    I noticed a huge dropoff at one point, also. I associated it with new Comcast procedures to limit outgoing spam, mostly from zombies, I assume.

  7. Edward says:

    Does anyone have a similar analysis of the subjects of Spam? A few years ago it was all Get-Rich-Quick schemes; now it’s become Medication, Loans, and counterfeit software.

    I’d like to see a page similar to the Google Zeitgeist with rising and falling SPAM topics. I thought I’d try and write one myself but I figured someone would already have done it.

  8. William says:

    Human brains are the best filters.

    http://www.cloudmark.com/products/spamnet/

    And the only ones likely to work.

  9. Raymond Chen says:

    The trend towards <IMG>-based spam impairs the ability to categorize them by subject matter.

    The Comcast shutoff was in March 2004 which doesn’t seem to match up with any gaps in the charts.

  10. Eric Hodel says:

    Scanned images of a Word document printout placed in a word document.

    Yes, one was sent to my entire company by upper management.

  11. Alex says:

    I suspect that the reason that you see spam going away is due to increased reliability and use of corporate-level spam filters.

  12. Todd Spatafore says:

    It would be interesting to see SPAM flux by hour of day, day of week, and by month of year. Also, it looks like there is a relative dry period at the beginning of 2003 that has almost the same width as the one at the beginning of 2004.

    This was very interesting. Thank you for sharing.

  13. Jim says:

    > Perhaps my ISP instituted a filter

    >> for a while?

    Perhaps it’s still there. It’s just that spam "evolved" since then, making the filter less effective.

  14. Are you viewing the spam? Or allowing the images to load?

    I’m sure you are smart enough to avoid viewing it or at least having it load the images so that the e-mail address can be verified. I just want to know because I hope it affects these numbers.

  15. Raymond Chen says:

    I don’t view the images or follow any links.

    The program that generates the graph is written in perl. There’s nothing like generating a bitmap by writing raw binary data.

  16. A visual history of spam (and virus) email — he is the true king of the geeks….

  17. Miles Archer says:

    Fascinating stuff. I wonder what it would look like if you graphed # vs time instead of size vs time. Again a semi log scale is probably appropriate.

  18. Anonymous says:

    Exchange-faq.dk – Din portal til Microsoft Exchange Server information

  19. Merle says:

    It would be interesting to see spam and virii on separate graphs: it’s tricky to tell a blue dot from a {blue dot next to a red dot}.

    One of my "filters" was to discard any email >50K. Friends know better than to send me even HTML email, much less attachments. But according to this, I’m not really losing much.

    (the main things I saw a drop in were the gigantic virii)

    Having your own top level domain would increase your spam tremendously. I get email to accounts that never existed, that nobody has ever used, because about a year ago spammers started emailing [X]@(domain), where [X] is picked out of a dictionary. (for the most part they do them sequentially, so it’s easy to catch, but…).

    I get 10,000 spams a day on average, and saw a day of 40,000 once. I just sat there clicking on my junk box, watching new messages coming in between every click…

  20. denny says:

    interesting ….

    I agree the white band may be related to the can-spam deal. it does fit.

    now if I had *MY* spam for the same time you might need to go buy a few more disks…. for a while I was running about 4 email servers and web servers and was listed as the contact for them all…. so my volume for a while was *HUGE* I had to tell folks to take the email for the domains to someone else… it was killin me and i was not obliged to take all that work for them …

  21. Chiascuro says:

    I own a top level domain for my country and get approximately one spam every few days.

    I think it’s because my domain name is foo.{something}.{countrycode}

    I suspect the spammers search for foo in their databases and remove it because so many people use it as a fake address.

  22. Norman Diamond says:

    Regarding the base note: Your company didn’t start filtering spams until late 2003, yet your highest number of spams in one day prior to that was only 67? You must have been doing something in order to get so few, what were you doing?

    By the way, around 1997 or 1998 on a mail server I saw logs of spams being sent to addresses that had stopped existing around 1994.

    Also: 9/16/2004 4:10 PM Chiascuro

    > I suspect the spammers search for foo in

    > their databases and remove it because so

    > many people use it as a fake address.

    No way. Spammers do not care how many bounces they cause, because bounces do not get delivered to spammers, bounces get delivered to other victims.

  23. David Voss says:

    *bows* I’m not worthy, I’m not worthy…

    So how did you come to decide that you would actually save every piece of spam and virus email? For most people they want to dispose of it as quickly as possible rather than archive it, especially as long as you have.

    Thanks for sharing this. It was definitely an interesting read!

  24. Tom Seddon says:

    English plural of "virus" is "viruses".

  25. jafee says:

    I also noticed a similar lull in your work email graph at the end of 2003. I wonder ….

  26. Jeff says:

    As far as the lulls at the start of 2003 and 2004….maybe even spammers are busy during the holidays.

  27. M. Mortazavi says:

    This is very interesting work and the comments that follow are also most interesting. If enough material is collected, this is really publishable work.

  28. Art Vandelay says:

    Spam filter on mailer server and AV program on computer.

    In the last 5 years (since Nov 1999) I have received 1 e-mail that contained a virus (which was promptly deleted by my AV program) and spam receipt has typically been 1 to 2 per week, sometimes going 2-3 weeks at a time with none.

    I must be doing something wrong.

  29. MagicMike says:

    I also receive admin mails for maybe 30+ domains and multiple aliases. Currently we reject those messages that are sent to non-existent addresses. This means that I have received about 10k emails since May 28 only to addresses like webmaster@domain etc. Of these, fewer than 2k were technical, machine-sent legitimate emails and about 50 were human-sent legitimate mails.

    I still get about 5-10 spams a day to my personal mailbox (only my username or my personal alias), total of 1.5k mails since 1st of Jan. I think this is really acceptable.

    Related to our whole corporate mail handling, we have a SpamAssassin based filter in use. In August we received about 1.5k mails classified as spam (5.0 points or more) and 6k mails classified as not spam (4.9 points or less). I guess we could be even more strict.

    Just my 2c :)

  30. Here’s another graphical take on spam impact – not as far back as this one goes, but looks at my spam volume in terms of message count.

  31. KC Lemson says:

    I have a folder where I kept all of the viruses I received from anyone inside the company, but only so that I knew who was stupid, not for analysis. And I haven’t gotten one in years.

  32. Kevin Yeow says:

    Interesting. Definitely informative. I wonder if anyone was able to track spoofing too. Virus, spam, spoofing, phishing – how I wish the internet could be a safer place for real communication and sharing of info.

    Nowadays, one has to have an anti-virus protector, spam filters, adware blaster, firewall . . . etc.

    Maybe one day all these energies can be generated to productive use instead.

  33. James Boyd says:

    The one Virus proof way to check and read all your email is a live Linux CD.

    If you use a live Linux CD, and go to the mail server, it is impossible to get a virus.

  34. Raymond Chen says:

    (You can still get a virus if you boot from CD. Of course, if you reboot then it’ll be removed from your system. That is, unless it installed itself into your .cshrc…)

  35. DJ Wallsauce says:

    Dude you really had nothin’ to do for 7 years did you?

  36. dj wallsause is funny says:

    hahahahah

  37. Alex says:

    My single worst spam day was when I received 960 spam messages. They were all the same content with 4 different subject lines. How does that happen?

  38. Joe says:

    Actually there are two vertical blank zones in the graph. They are one year apart, though the second one is much emptier than the first. Anything special about January?

  39. (guesswork here)

    I think it is related to the 2nd address being out of circulation since 1995.

    Probably that address is only on address CDs used by relative spammer newbies, who were indeed a bit scared of CAN-SPAM.

    The big guns probably have better ways (and filtering) to get live addresses, and your other address is probably included in those?

  41. Congrats, you just hit /. :)

  42. Dave says:

    Yeah, watch out for your server.

  43. William Watson says:

    I can understand why corporate e-mail systems must allow false positives for SPAM. What I DON’T understand is why you continued to see increasing numbers of viruses! Surely any relatively competent anti-virus software could filter those with a high degree of success, whilst leaving the SPAM filters – if any – untouched.

  44. Raymond Chen says:

    The virus filter removes the payload but leaves the body. So you get messages that say "Please review this document" with no document. I counted those as viruses. I also counted bounces created by viruses as viruses.

  45. Stephen says:

    Prepare to be /.ed

    What made you decide to keep all your spam?

  46. Raymond Chen says:

    I thought I answered that question in the opening sentence.

  47. Dominic Pody says:

    Haha, nice. I love the graph.

    Oh, and you got Slashdotted :p

  48. Re: Virii

    Makes sense; thanks for the clarification.

  49. Dylan Smith says:

    The best way of avoiding virus mail is to block Windows executables of any type at the mail server. I started doing this when Swen broke out (I was getting 2-3 Swen worm emails PER MINUTE).

    I have never, ever received a legitimate Windows executable in email.

  50. Cornflake says:

    It’s going down!

  51. Anonymous says:

    Morten Isaksens blog » A visual history of spam (and virus) email

  52. It looks like your spam load took a hit in 1998. I haven’t ever seen my own spam load on any of my accounts go down unless I’d actually done something to make it happen. Are you sure there wasn’t some kind of change that year? You say there was no filtering… did you perhaps stop using your email address as widely during 1998?

    My own experience is that spam has been exponentially increasing for at least a decade. There hasn’t been any point where I would say it really took off… the graphs I did always showed a big take-off in the past couple of years but that’s typical of exponential curves. Back in 2000 (when I quit trying to report it and started filtering myself) it was about 300 MB a month.

  53. Paul Wouters says:

    I don’t agree with the comment "spam took off in 2002". Spam has, since I started measuring it in 1997, been roughly exponential.

    See my graphs (http://www.xtdnet.nl/paul/spam/), though I should update them for the last year.

    Paul

  54. Fred says:

    I started to do this in 2001 but got bored in about 3 months. http://96trees.com/spam/spam.html

  55. A visual history of spam (and virus) email…

  56. I’m a sucker for cool visualizations, and here’s another. Raymond Chen has saved every bit of spam / virusmail he’s received for the past 7 years and has plotted it all out. I won’t steal his thunder by showing the charts here, but it’s quite amazing. Seems that spam peaked in 2002. Some great stats: Largest message: 1,406,967 bytes, received January 8, 2004. HTML mail with a lot of text including 41 large images. A slightly smaller version was received…

  57. osssuporter says:

    Yeah, I had like a gig of spam and viruses a month. My mail server took care of the viruses (it sends a report every month on how many things it blocked), but the spam kept coming, so I looked into something other than Outlook. Thunderbird and The Bat! became my choices, and I chose Thunderbird. It took about 2 months to learn to guard against all of my spam; after that period, nothing but ham. (It lets a couple of spam go through; I only saw false positives in the beginning.)

    Couple of questions for the author: I am assuming this is not your primary email addy, so when did you open up this account? And how much spam do you get in your primary account? (Assuming you protect it: not giving it out on websites, not giving it to stupid people, etc.)

  58. http://yossman.net/~dave/spam.png

    Here’s a graph of my 5-6 email addresses, starting October last year.

    The size has been divided by 10 to fit with the messages, ie, size of 600 is 6000kb.

    Interesting to see when huge virus spam started becoming common.

  59. Raymond Chen says:

    I’m not sure what you mean by "this" account since there are two accounts in this article.

  60. abb3w says:

    There appears to be a fainter "lull" at the start of 2003, as well as 2004, as others have noted. I would speculate (on no evidence whatsoever) that this is caused by the number of virus-laden zombies being taken off-line and replaced at Christmastime with new uninfected machines– usually with 30/60/90 day free trials of an anti-virus solution installed.

  61. The dropoff in January (it’s present on all graphs) could be attributed to simple business logic – they’re working their butts off in the months before (retail season), and take a vacation in January, since no one is really going to buy much in that month anyways.

  62. Miryth says:

    Nice one dave, you graphed it in the two colors I can’t tell apart due to colorblindness ;p

  63. Sven says:

    Great article, thanks.

    > Now I know the great majority of spam is between 1.5K and 128K. I’ll just eliminate all that to get rid of 90% of the spam

    Sounds great, but that won’t work, at least not if everyone does it. Once spammers find out, spam mails will grow to > 128K.

  64. Arash Partow says:

    Hello,

    I would like to see a group of the all the different types of spam e-mails out there,

    something like percentages, how many about viagra, how many about penis enlargements,

    how many about mortgages etc.

    Also it would be nice to graph these according to time and to also mark important

    consumer oriented holidays and see if there is an increase in the increase of certain

    types of spam.

    There are actually so many different types of analysis that could be done with such

    a corpus of e-mail, it’s just mind-boggling.

    Anywayz hope to see those new sets of analysis soon.

    Arash Partow

    __________________________________________________

    Be one who knows what they don’t know,

    Instead of being one who knows not what they don’t know,

    Thinking they know everything about all things.

    http://www.partow.net

  65. datacloud says:

    Raymond Chen at Microsoft has kept every piece of spam and email-based virus he’s received since April, 1997. Includes interesting charts and analysis. [via Slashdot]…

  66. Jamie says:

    Wow, how interesting.

  67. Dez Blanchfield says:

    Hmm…

    Raymond Chen, you do realise what posting such an interesting read means now don’t you?

    It means that I now have to re-appraise my opinion of MS Staff!

    Ok, so you’re not ALL evil.. there, I said it!

    ;-)

    Dez



    Dez Blanchfield

    Dez <at> WebSearch.COM.AU

  68. Threeboy says:

    I wonder how effective spam is.

  69. Zack says:

    I wonder how effective spam is.

    Apparently extremely effective. Else the spammers wouldn’t keep doing it.

  70. spamblogging says:

    Internet week has an article up on a Microsoft employee that has kept all of his spam over the years. Microsoft employee Raymond Chen has compiled unique evidence of the explosion of spam: he’s saved every spam message and virus-laden…

  71. Dear Sir,

    I am the former minister of finance of Nigeria. Several years ago…..

  72. waltonicia says:

    this isnt slashdot folks… now what we need is a graph of the quality of these posts falling as /.ers rise..

  73. Frances says:

    my favourite email – the no msg and %BR were html links

    From: "%F_N %L_N" <DamahnKinkaid@peoplepc.com>

    To: *@*.com

    Subject: Re: my favorite

    Date: Sun, 29 Aug 2004 17:19:57 +0200

    MIME-Version: 1.0

    Content-Type: text/html;

    charset="iso-8859-1"

    Return-Path: DamahnKinkaid@peoplepc.com

    X-OriginalArrivalTime: 29 Aug 2004 14:09:09.0812 (UTC) FILETIME=[C0F41740:01C48DD1]

    %RND_ITEM_PROP %BR

    %BR %BR no msg

    %RND_PHRASE

  74. So I’m not alone when it started. This matches about when I felt spam increased. And to about the degree.

    Very interesting.

  75. KMT says:

    "This particular email address has been inactive since 1995; all the mail it gets is therefore from harvesting done prior to 1995."

    Not so….that address has been resold, reused and re-harvested on a continuous basis since it was first nabbed. In spammer’s eyes, it is as fresh as any other (validated or not).

    I’d like to see that comment removed, lest readers mistrust the entire page. It speaks to the overall logic applied, and in my opinion, undermines same.

  76. Brian B says:

    I’m currently working as the net admin at a recruiter/headhunter outfit. Our central address for resumes must be on every spam list known to mankind, since 76.4 percent of the approximately 500 messages received by that address every day are spam. Our ORFEE spam filter only lets an average of 6 to 8 through, so it’s not too bad. Still annoys the recruiters, though.

    Of the others anywhere between 31 to 165 messages per day are viruses. McAfee has only let one through in the six months that I’ve been there. Of course that one went straight to my boss . . .

  77. LatinoPundit says:

    An interesting visual history of spam email done by a Microsoft employee who blogs and has kept a record of…

  78. Matt says:

    Your spam can be considered to be statistically random, and therefore, although unlikely, it is not impossible that a lull or increase could happen for no reason at all. There is a similar "band" (though less well defined, and just before the year line) on the main chart.

    If you are sufficiently math-nerdy like me, look up the Poisson distribution (http://mathworld.wolfram.com/PoissonDistribution.html), which, if I remember my probability lessons, is the model used for analysing such data. I can’t remember the method, but you can test the anomaly for statistical significance at various confidence levels (usually the 10%, 5%, and 2.5% levels).

    By the way – great chart.

  79. Raymond Chen says:

    (I never claimed that my chart was statistically significant. Indeed I explicitly disclaimed it!)

  80. Anonymous says:

    Ceklog » Historia Visual del Spam

  81. Bodhi says:

    AOL Won’t Use Microsoft Anti-Spam Standard

    Reuters – Thu Sep 16, 4:06 PM ET

    Ahhh now I understand why…

    ;’>

    Bodhi

  82. Anonymous says:

    randomwire.com » Spam

  83. The SpamBorg: The diligent keeper of Oldnewthing has created A visual history of spam (and virus) email, declaring: "I have kept every single piece of spam and virus email since mid-1997. Occasionally, it comes in handy, for example, to add naïve Bayesian spam filter to my custom-written email filter. And occasionally I use it to build a chart of spam and virus email." All we can say is, "Kids, don’t try this at home."…

  84. A Microsoft employee with some extra time and plenty of spam on his hands has decided to graph out all the spam and viruses he’s received since 1997. He’s only graphing out the spam that makes it through the corporate…

  85. genck says:

    there is one solution to spam, we catch them and send them to guantanamo. president how about a war on spam? :P

    or how about spam (the company that makes the product) sues the spammers for the bad name?

    lol keep getting spammed!

    nice work btw Chen :D

  86. Blog says:

    Check out this chart (http://weblogs.asp.net/oldnewthing/archive/2004/09/16/230388.aspx) that plots every single piece of spam and virus email that arrived at this guy’s work email address since April 1997. You can really get a sense of how spam has grown starting in about 2002. Remember when it used to be fairly rare? Neither do I.

  87. Raymond Chen says:

    KMT: I don’t see your point. Sure, the address has been re-sold and re-distributed many times since then, but it was *harvested* in 1995.

  88. BM says:

    As I recall, Jan 2004 was when win98 was supposed to go unsupported so I guess lots of those machines were either traded in or dumped

  89. eCardica.com says:

    Came across your article in Internet Week; very impressive chart. Great idea. Also looks like CAN-SPAM is quieting things down a bit.

  90. Amuro says:

    Don’t you have anything better to do?

  91. Raymond Chen says:

    Hm, which is sadder: Somebody who doesn’t have anything better to do, or somebody who has nothing better to do than post a comment accusing somebody of having nothing better to do?

  92. Shannon says:

    I’ll agree with Simpleton Jones on the dropoff in January 2004. Maybe you’re only on the list of a few high-volume spammers, and they went on vacation in January after the busy holiday season.

  93. Sharp Tools says:

    An old schoolmate and colleague of mine, Raymond Chen, just put up some nicely interesting data on his weblogs. He apparently carefully saves every piece of spam that makes it through his corporate spamfilters to his account, preserving them much…

  94. jotsheet says:

    Raymond Chen has an interesting visual for us: every spam and every virus he received between April 22, 1997 and September 10, 2004, graphed by time (x) and size (y). It captures the gut feeling I had: that spam was at its worst in early 2003 when spamming technology outpaced spam filters and the public’s knowledge about prevention. On September 11, 2001, I posted a graphic bearing a faint similarity in concept: the traffic on the Internet as panic set in….

  95. technostan says:

    Random Intel Got Game? Who says video games don’t mean squat in the real world? According to a new book written by John C. Beck and Mitchell Wade, certain gaming traits are valuable to businesses and these skills may change the world of business. From the book’s website: Do any of these Gamer traits sound familiar to you? • If you get there first, you win. • Trial and error is the best strategy, and the fastest way to learn. • Elders and their received wisdom can’t help—they don’t understand even the basics of this new world. • There’s a limited set of tools—but some combination will work. Sounds interesting. I knew those countless hours of Mario would come in handy one day. via Joystiq.com Remembering pi I’ve always been fascinated with people who can memorize a large amount of information. A recent plus article sheds some light into how people pull off these incredible memory feats. I’m sure everyone has their own technique of memorizing things such as word association or making up a silly rhyme. One thing I didn’t know was how smell can be important to memorizing. If you deliberately surround yourself with a particular smell when trying to memorise something, that smell is likely to help trigger the memory later when you need to recall it. The author goes on to say that the part of the brain associated with smell is very close to the part of the brain where long term memories are formed….

  96. Kenneth Mortensen says:

    My worst spam days were when the Blaster worm was going around. I received approx. 12,000 emails per hour for a few days. We contacted the hosting companies where the mails seemed to come from and eventually we got rid of it all.

  97. Clovis says:

    impressive data collection, very interesting work

  98. Mat Hall says:

    "Hm, which is sadder: Somebody who doesn’t have anything better to do, or somebody who has nothing better to do than post a comment accusing somebody of having nothing better to do?"

    To commemorate the fact that we’re all nerds, and that the Star Wars box set is out now, I think you should have uttered the immortal line "Who’s the more foolish – the fool, or the fool who follows him?"

    And although you’ve sort of explained why you kept them all (training your Bayesian filter) I would have thought that the arms race between spammers and spamees means that last year’s spam is no use at all in training your filter. Perhaps you were saving them for old age when you may actually NEED "H3rb4l /-i-@-g-r-@!!!11!!1!"? :)

  99. Unknown User says:

    Interesting. I can’t say that I’ve ever really had a problem with spam. In the last four years I’ve received a total of 3 spam messages, and each one surprised the hell out of me. The latest one, which I received in July of 2003, was rather amusing. Something about breast augmentation. (I’m male.) Though I know quite a few people who complain about spam problems, I would never have expected it to be so bad. The graphs provided do, however, put everything in perspective. Paranoia pays off after all.

  100. KC Lemson says:

    Regarding "You can see that in late 2003, the blue dot density diminished considerably. That’s when mail administrators found a filter whose false-positive rate was low enough to be acceptable." – that’s all thanks to the IMF (http://www.microsoft.com/exchange/downloads/2003/imf/default.asp).

  101. very interesting! good job! compliments!

  102. The blog “The Old New Thing” has presented stats over all spam and virus e-mails received since 1997. Quite impressive. He must have one heck of a large inbox :)…

  103. DiVERSiONZ says:

    Blogger has kept every single piece of spam and virus email since mid-1997. Why? You ask. He has his reasons

  104. Don says:

    Interesting. My seldom-used work account really started to pick up spam in early 2003, about 6 months after it went more or less inactive, and still gets nearly 90 per day, although it has started to drop a bit. My guess is someone got a copy of the company mail account list. Another company account (different URL) gets none, probably due to aggressive filtering. And a home account gets perhaps 15 a day now, up from almost none 6 months ago.

    The lull in early 2004 is similar to a somewhat less obvious lull in early 2003. My guess is that the folks buying SPAM services have discovered that people do not buy much in the month or two after Christmas, so why spend money advertising via spam that provides minimal return – just spend the winter months in the tropics.

  105. Reader says:

    Work: Never had any spam. Had a few viruses.

    Home: Never had any spam. Never had any viruses.

    Girlfriend gets spammed for college/financial aid bs. She has @comcast.net.

    I use a private mail server.

    We really don’t have any extra firewalls, spam blockers or virus detectors. Although I see tons of this type of software on every Windows box I’ve ever sat down in front of.

    Maybe I’m just not trusting enough. Maybe I’ve some intuitive wisdom. Maybe I’m just repulsive. Maybe I’ve got a Mac.

  106. Wm. Nakia says:

    It’s possible that the quiet spot was when we had that worm attack back in Jan/Feb… I remember there was a panic among a number of big mail servers. I think Yahoo automatically banned anything with a title of "hello". And Hotmail was discarding large amounts of legitimate mail in February (they were more than happy with the false-positives being quietly discarded).

  107. Anonymous says:

    Bird’s Eye View Blog

  108. a_scratch says:

    From those charts I can see that there’s more virus and spam mail during summertime. I’m sure that more young geeks spend all summer writing viruses! In my time we were playing outside.

    Damn I hate those youngsters! But jesus they’re wise!

    hehe

  109. Romain says:

    Very impressive and interesting data; we can clearly see the Sobig attack and the other events.

  110. Da BArt says:

    What I find interesting is the fact that… Does ANYONE ACTUALLY REACT POSITIVELY TO SPAM? IS IT A MARKETING TOOL FEASIBLE ENOUGH TO JUSTIFY SUCH OVERLOAD? I think not. Spammers obviously work under the deluded notion that a 1% successful reaction will make them money. If they tightened up their systems by hiring enlightened consultants (like me) they’d probably be millionaires by now. :-)

  111. Raymond Chen has been collecting e-mail for a long time. Microsoft must not be working him hard enough because he was able to create a pretty spectacular graph that maps trends in e-mail SPAM and viruses. Watch as he takes you through the finer features of the map. Very cool….

  112. Phil Renouf says:

    You got linked to on Boing Boing:

    http://www.boingboing.net/2004/09/22/a_visual_history_of_.html

    Someone from the Spam Weblog pointed it out to them, who knew there was a spam weblog? ;)

    Spam weblog: http://spam.weblogsinc.com/

  113. Leo says:

    It would be interesting to compare this analysis to one done on your legit emails. If you have an archive of these, they would also make your Bayesian classification of email into spam and non-spam bins better.

  114. Ignacio says:

    Hi! Sorry, but couldn’t your inactive account have been caught by techniques (like simply trying letter combinations) that don’t actually require reading the address anywhere?

    My gmail account got spammed the minute I got it and nobody had it!

  115. A visual history of spam (and virus) email This guy has saved all of his spam and viruses delivered by email. He has also plotted all of them on a chart so you can see the distribution of spam over…

  116. chris says:

    What I find interesting is the fact that… Does ANYONE ACTUALLY REACT POSITIVELY TO SPAM? IS IT A MARKETING TOOL FEASIBLE ENOUGH TO JUSTIFY SUCH OVERLOAD? I think not. Spammers obviously work under the deluded notion that a 1% successful reaction will make them money. If they tightened up their systems by hiring enlightened consultants (like me) they’d probably be millionaires by now. :-)

    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    It is amazing. I sent about 30k spams late last year targeting salespeople – we were selling motivational tapes. We made roughly one sale per 1500 emails. Granted, we went after real estate and insurance agents, but still… I stopped though – bad karma..

  117. /dev/random says:

    A guy from Microsoft has made a cool drawing: Since 1997, he kept all spam/virus received and was able to…

  118. RaTTy says:

    All I can say is with SPAM, they insult me. I got one the other day that offered to increase my ‘little mate’ and I am insulted!! WHY WOULD I ‘DECREASE’ the size of my ‘little mate’ to a mere 12 – 16 inches????

    :-) Another funny brought to you by someone who cares less, and loves (IT) more.

  119. A visual history of spam (and virus) email. Edward Tufte would be proud.

  120. Soonts says:

    Hmm… Very interesting data.

    However, haven’t you noticed that spam’s moving from e-mails to search engines, instant messengerz and GSM phones (SMS)? :-(

    P.S. The article was translated into Russian & linked by xakep.ru: http://www.xakep.ru/post/23943/default.asp (sorry, Russian only)

  121. A visual history of spam. It would be interesting to see this volume of mail texts (a corpus as good as any) examined…

  122. Sabin says:

    Very cool. I like it.

  123. El Kel says:

    Fascinating! Well done on collating this data – although it looks scary when represented on a graph like that!

    Re spam messages in general: I love it when spammers think I need help to increase the size of my "little mate" (great euphemism!) – no thanks, being female I don’t really have the equipment to take them up on the offer :-)

    Weirdest spams I’ve ever had: people trying to sell me septic tanks. Heaven knows why. . .

  124. Jason says:

    UNSUBSCRIBE

  125. default. says:

    why are so many comments copied and pasted from various articles describing your work?

    could almost say, you’re getting spammed…

  126. Norman Diamond says:

    9/23/2004 4:39 PM Soonts

    > However, haven’t you noticed that spam’s
    > moving from e-mails to search engines,
    > instant messengerz and GSM phones (SMS)? :-(

    Well, cell phones have been getting e-mail spams for more than 5 years now. A few years ago it got to the point where phone companies had to let recipients preview somewhere around the first 100 characters of an incoming message for free before deciding whether to download the entire message.

    Among all the methods of spamming by which spammers make recipients pay the costs of delivery, I can only think of two which aren’t used. One is physical postage-due postal mail. The other is collect calls. Anyone know why these two kinds of spams don’t exist?

  127. Tolik says:

    Come to think of it: damn, the spammers are really cranking it out.

  128. MahiX2 says:

    I thought I had it bad when it came to spam. I actively send/receive e-mail through 8 e-mail addresses, and it is pretty common to get somewhere in the neighborhood of 125 spam messages per day…

  129. Walter says:

    Norman Diamond asks, "Anyone know why these two kinds of spams don’t exist?"

    Because the Postal Service stopped delivering postage-due mail and recipients have an opportunity to decline collect calls. (For the moment…)

  130. Russell says:

    And it used to be that spam said something, whether it was some Nigerian fraudster or I could have a 4 metre penis or whatever ~ I’ve noticed new emails that are just random combos of words. So what the heck’s the point ?? Dumb

  131. Don says:

    Random combos of words – maybe to get search engine hits?

  132. Not-the-same-Don says:

    I agree with abb3w. People get new computers at Xmas with free trial period for Norton. They figure out Norton is making the whole computer slower than molasses and give up on virus protection entirely.

  133. Plotting spam. In a graph way, rather than a muha-ha-ha way God vs Bush, the evidence at last ;-) Absolutely hilarious account of a gay guy on the subway being preached at and deciding to sing showtunes at the…

  134. James says:

    PeoplePC has a spam blocker… I think.

  135. Andrew says:

    Fascinating … well done Raymond

  136. Anonymous says:

    aquafusion – home

  137. Anonymous says:

    circle.ch / moblog

  138. Anonymous says:

    Ryanware Blog » I guess I should keep spam after all

  139. Anonymous says:

    XTremeBlog » Graphical Spam

  140. A visual history of spam (and virus) email charts spam (and virus laden e-mail) since 1997 when the author started collecting all spam (now that takes dedication!) As his/her graph shows spam took off in 2002 – I wonder what…

  141. A visual history of spam (and virus) email and MBA funs!

    Interesting!~

  142. Anonymous says:

    j. scott kosoy » Blog Archive » Popular

  143. Anonymous says:

    Victor Boctor’s Blog – A visual history of spam from 1997 to 2004

Comments are closed.