Spam filtering and skill sets

When you filter spam for a client base that is worldwide, you tend to pick up a skill that you might not otherwise get a chance to acquire: learning foreign languages.

Now, I'm already fluent in six million forms of communication, but surprisingly there are a lot of common languages that evade me.  While filtering spam will never make me fluent in any language, I have discovered that given only a few words or sentences (sentences are way easier), I can often tell what language an email message is written in, despite not being able to speak that language.

Well, fast forward to the last couple of days.  I decided I was going to update the sensitive word list in one of the languages that we support: German.  This is a group of words that customers can optionally enable.  If the custom spam filter option flags one of these words in a message, the message automatically gets junked.  By a series of events, the task of updating the list fell to me.  So, I went and did some research (don't ask me for details) and came up with a list of over 400 candidates.

I had to whittle that down by getting rid of duplicates as well as the ones that simply wouldn't work.  By way of example, the English word slut has a negative connotation; if you speak English, then you know what it means.  However, in Swedish, it means "ends, end, or finish."  Even if you extend the rule to catch obfuscated instances such as s'lut (as spammers often write it), blocking on that doesn't work either.  In French, the word salut means "hi," and French speakers sometimes abbreviate that to s'lut.  So, the ability to think laterally, linguistically, is a real advantage.
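To make the problem concrete, here is a minimal sketch of an obfuscation-tolerant word match. It is not how our filter is implemented; the function names and the sample sentences are my own illustrative assumptions. It only shows why a rule written with English in mind also fires on ordinary Swedish and French text.

```python
import re


def obfuscated_pattern(word):
    """Build a regex for `word` that tolerates punctuation between letters.

    Hypothetical helper for illustration; not the production filter.
    """
    # Allow any run of non-letter, non-space characters (', ., -, *) between letters.
    separator = r"[^\w\s]*"
    body = separator.join(re.escape(ch) for ch in word)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def flags(word, text):
    """Return True if `text` contains `word` or a punctuated variant of it."""
    return obfuscated_pattern(word).search(text) is not None


# Sample sentences are made up for this example.
samples = {
    "Swedish": "Filmen tar slut klockan nio.",  # "The film ends at nine o'clock."
    "French": "S'lut, ça va ?",                 # informal "Hi, how's it going?"
    "English": "nothing sensitive in this sentence",
}

for language, text in samples.items():
    print(language, flags("slut", text))
```

The Swedish and French samples are both flagged even though neither is spam, which is why a candidate has to be checked against the other languages your customers write in before it goes on the list.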

I had learned those two pieces of trivia some time ago.  These past couple of days I learned some more.  As I was paring down the list, removing the words we couldn't use, I started to learn them.  I had them all in an Excel spreadsheet with the English translation next to the German one.  Some German terms had multiple translations, so I got rid of some of them by right-clicking, selecting Delete and picking the "Shift cells up" option.  This deletes the current cell and moves all of the cells below it up by one.  However, after doing this a few times, I started to look at the translations of the sensitive words.

"Wait a second," I said.  "This German word does not mean what the English translation says..."  Shifting the cells up had moved the English translations out of line with their German words.  It turns out I should have been doing some other copying-and-pasting and deleting the whole row, rather than deleting a single cell.  However, the point is that I actually learned a bunch of words in German that 48 hours earlier I did not know.  I was actually reading the German word, translating it in my head and then confirming that the English translation was incorrect.
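For anyone who hasn't made this mistake yet, here is a small sketch of what went wrong, modeling the sheet as a list of (German, English) rows. The words are harmless placeholders rather than anything from the real list, and the two functions only mimic the Excel operations under my own assumptions about the layout.

```python
def delete_cell_shift_up(rows, row_index, col_index):
    """Mimic Excel's Delete > "Shift cells up": only one column moves."""
    columns = [list(col) for col in zip(*rows)]
    del columns[col_index][row_index]
    columns[col_index].append("")  # the shifted column ends with an empty cell
    return [tuple(row) for row in zip(*columns)]


def delete_row(rows, row_index):
    """Mimic deleting the entire row: every column moves together."""
    return rows[:row_index] + rows[row_index + 1:]


# Placeholder data: one German term with a second translation on its own row.
sheet = [
    ("Hund", "dog"),
    ("", "hound"),
    ("Katze", "cat"),
    ("Haus", "house"),
]

shifted = delete_cell_shift_up(sheet, row_index=1, col_index=1)
removed = delete_row(sheet, row_index=1)

print(shifted)  # the English column has slid up against the German column
print(removed)  # the pairs stay aligned
```

After the cell-level delete, "Katze" sits next to "house", which is exactly the kind of mismatch I was staring at. Deleting the row keeps each German word beside its own translation.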

You see, when you're exposed to a lot of spam and have to work with foreign languages, it's actually quite amazing how quickly you can start to pick up bits and pieces of those languages.  It's a skill set that comes in handy when you want to travel abroad.

Now, I'm not sure that the vocabulary I acquired is going to be very useful, but the point remains: fighting spam does enable you to pick up the oddest skill sets.

