How virus researchers work

I haven't had the chance to work on virus analysis.  Spam analysis has a lot of heuristic tricks of the trade because language is so fluid.  While 90% of spam can be caught with IP reputation and another 5% with URL reputation, there's some spam that requires the elegance that only a content filter can provide.  Writing spam rules is a bit of a black art, though there are definitely guiding principles.
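To make that layering concrete, here is a minimal sketch of how the three checks might be stacked, cheapest first.  Everything in it is an assumption for illustration: the `Message` class, the blocklist contents, and the single content rule are hypothetical, not taken from any real filter.

```python
import re
from dataclasses import dataclass

@dataclass
class Message:
    sender_ip: str
    body: str

# Hypothetical reputation data and rules, for illustration only.
BAD_IPS = {"203.0.113.7"}
BAD_URL_HOSTS = {"pills.example"}
CONTENT_RULES = [re.compile(r"\bfree\s+money\b", re.IGNORECASE)]

def classify(msg: Message) -> str:
    """Return the name of the first layer that flags the message, or 'clean'."""
    # Layer 1: IP reputation, the cheapest check.
    if msg.sender_ip in BAD_IPS:
        return "ip_reputation"
    # Layer 2: URL reputation, based on the host of each link in the body.
    hosts = re.findall(r"https?://([^/\s]+)", msg.body)
    if any(host.lower() in BAD_URL_HOSTS for host in hosts):
        return "url_reputation"
    # Layer 3: hand-written content rules for whatever is left.
    if any(rule.search(msg.body) for rule in CONTENT_RULES):
        return "content_rule"
    return "clean"

print(classify(Message("198.51.100.1", "Claim your FREE money today")))
```

The ordering is the point of the sketch: the content rules only ever see the small share of mail that the two reputation layers let through.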

Viruses, on the other hand, are a different thing altogether.  Last week I had the chance to talk about the skills required to deconstruct viruses.  With spam, the analyst need only read the context of the message and make a quick interpretation.  Viruses require a different skill set.

To begin with, the number of virus researchers is declining.  More and more tools these days make use of high-level programming languages - Visual Basic, C#, Object Pascal, and so forth.  These tools and languages are what is normally taught in university, so most new computer science grads are well versed in these skill sets.  However, for viruses, you need to get down to a much more granular level.  You need to get so granular, in fact, that you need to understand assembly language.

Who understands assembly language?  I sure don't.  Or rather, I understand it a little bit but I find it very verbose and incredibly difficult to read.  To that extent, in real terms I would say I don't understand it enough to string together a sentence.  Viruses operate at very low levels.  Because they are captured as binaries, in order to deconstruct them you must watch the low-level data interaction and the commands that they execute.  They are not written in English like spam is; they are machine instructions.  Decompiling them is a real talent.
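To give a feel for what "instruction commands" means, here is a toy sketch that turns a handful of raw bytes into assembly text.  It assumes 32-bit x86 and knows only the few opcodes listed in its tables; the function name `decode` and the sample bytes are made up for illustration, and real analysis would rely on a proper disassembler rather than anything like this.

```python
import struct

# A tiny, hand-picked subset of 32-bit x86 opcodes (illustration only).
ONE_BYTE = {0x55: "push ebp", 0x5D: "pop ebp", 0x90: "nop", 0xC3: "ret"}
TWO_BYTE = {(0x89, 0xE5): "mov ebp, esp", (0x31, 0xC0): "xor eax, eax"}

def decode(code: bytes) -> list:
    """Translate bytes into assembly text, one instruction at a time."""
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        pair = tuple(code[i:i + 2])
        if op in ONE_BYTE:
            out.append(ONE_BYTE[op])
            i += 1
        elif pair in TWO_BYTE:
            out.append(TWO_BYTE[pair])
            i += 2
        elif op == 0xB8 and i + 5 <= len(code):
            # B8 is followed by a 4-byte little-endian immediate value.
            (imm,) = struct.unpack_from("<I", code, i + 1)
            out.append(f"mov eax, {imm:#x}")
            i += 5
        else:
            # Unknown byte: emit it as raw data and move on.
            out.append(f"db {op:#04x}")
            i += 1
    return out

# Hypothetical sample: a function that just returns the value 42.
for line in decode(bytes.fromhex("5589e5b82a0000005dc3")):
    print(line)
```

Even this ten-byte sample takes five instructions to say "return 42", which is a fair picture of why assembly reads as so verbose.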

To that end, in order to get good at writing signatures for viruses, you need to get good at assembly language.  This takes a lot of time, because the whole reason the industry has moved away from assembly is that it is much easier to write code in high-level languages.  So, virus researchers need to be trained to read assembly, and then comes the next step: learning to recognize common virus patterns in the code and commands that they see.
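As a rough idea of what a signature can look like once a pattern has been recognized, here is a minimal sketch of a byte-pattern scan with wildcards.  The signature syntax (hex bytes with `??` for "any byte"), the function names, and the sample bytes are all assumptions made for illustration; this is not the format of any particular antivirus product.

```python
def parse_signature(text: str) -> list:
    """Turn 'B8 ?? C3' into [0xB8, None, 0xC3]; None means any byte."""
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]

def find_signature(data: bytes, pattern: list) -> int:
    """Return the offset of the first match in data, or -1 if there is none."""
    n = len(pattern)
    for start in range(len(data) - n + 1):
        if all(p is None or data[start + k] == p for k, p in enumerate(pattern)):
            return start
    return -1

# Hypothetical signature: a fixed byte, four bytes that may vary, a fixed byte.
signature = parse_signature("B8 ?? ?? ?? ?? C3")
sample = bytes.fromhex("9090b82a000000c390")
print(find_signature(sample, signature))
```

The wildcards are where the researcher's judgment comes in: the fixed bytes should be the part of the code that stays the same from one variant to the next, and the `??` positions the part that changes.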