The other day, one of our architects was tinkering around and discovered that approximately 40% of the total inbound connections to our network were using TLS. This seemed to be a rather high number, so that spurred an investigation.
If you are unfamiliar with TLS (as I am), it is a protocol for authenticating servers and clients and using that authentication to encrypt the communications between those two parties.
From Technet: In the authentication process, a TLS/SSL client sends a message to a TLS/SSL server, and the server responds with the information that the server needs to authenticate itself. The client and server perform an additional exchange of session keys, and the authentication dialog ends. When authentication is completed, SSL-secured communication can begin between the server and the client using the symmetric encryption keys that are established during the authentication process.
For servers to authenticate to clients, TLS/SSL does not require server keys to be stored on domain controllers or in a database. Clients confirm the validity of a server’s credentials with a trusted root certification authority’s (CA’s) certificates. Therefore, unless user authentication is required by the server, users do not need to establish accounts before they create a secure connection with a server.
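For mail, this exchange usually happens when the sending side issues STARTTLS during the SMTP conversation. The following is a minimal sketch of that client-side step using Python's standard smtplib and ssl modules. The function names, default port, and timeout are my own choices for illustration; this is not the tooling we used in the investigation.

```python
import smtplib
import ssl


def make_client_context() -> ssl.SSLContext:
    # The default context verifies the server certificate against the
    # system's trusted root CAs and checks that the hostname matches.
    return ssl.create_default_context()


def probe_starttls(host: str, port: int = 25, timeout: float = 10.0) -> bool:
    """Return True if the server advertises STARTTLS and the handshake
    succeeds, False if STARTTLS is not offered. Network or certificate
    failures raise an exception."""
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.ehlo()
        if not smtp.has_extn("starttls"):
            return False
        # Upgrade the plain-text connection to TLS.
        smtp.starttls(context=make_client_context())
        # Re-identify over the now-encrypted channel.
        smtp.ehlo()
        return True
```

Calling probe_starttls with the hostname of a mail server you control shows whether it offers TLS. The extra round trips and the key exchange inside starttls are the overhead discussed in the rest of this post.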
Public key cryptography, like all encryption, is an expensive operation. It takes time to grab the key, verify it, encrypt the message/channel, and then decrypt/validate the message/channel on the receiver’s side. Typically, you would use TLS when you want a secure connection, such as for e-commerce, remote access to a machine (to prevent man-in-the-middle attacks), or validation of certain email transactions. Thus, it serves a legitimate purpose. The idea is that because TLS is an expensive operation, spammers would shy away from it: they need to send as much spam as possible, and because TLS slows them down, it is not the best option for them to use.
In my department, we filter 97% of our mail as spam, and 90% of that is done before it gets past our RBLs (i.e., 90% of our mail is rejected due to IP blocklists). So, if TLS were used only by legitimate mail servers, the 3% of our mail that is actually good would have to account for the entire 40% that connects to us via TLS. It is virtually impossible for that small a fraction of our mail to account for all of those TLS transactions. Translation: we are getting a lot of abusive mail that is connecting to us via TLS.
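The reasoning can be written out as a quick bound. This is only a sketch: it assumes the 40% TLS share and the 3% good-mail share are measured over the same population (connections and messages are not quite the same thing), and it takes the worst case where every piece of good mail arrives over TLS. The function name is made up for illustration.

```python
def min_abusive_tls_fraction(tls_share: float, legit_share: float) -> float:
    """Lower bound on the fraction of TLS traffic that must be abusive,
    assuming every legitimate message arrived over TLS."""
    if tls_share <= 0:
        raise ValueError("tls_share must be positive")
    # Whatever TLS traffic is left after subtracting all good mail
    # cannot be legitimate.
    abusive = max(tls_share - legit_share, 0.0)
    return abusive / tls_share


# Shares taken from the figures quoted in the text.
bound = min_abusive_tls_fraction(tls_share=0.40, legit_share=0.03)
print(f"At least {bound:.1%} of TLS traffic must be abusive")
```

Under those assumptions, at least 37 of every 40 TLS connections have to come from abusive senders.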
I decided to launch an investigation. We suspected that one or more botnets had shifted their behavior because we hadn’t seen this last year in 2009. For the past couple of days, I have been tracking how much of our post-RBL mail was sent via TLS and how much of it was marked as spam.
The results confirmed my suspicions (values below are normalized, not actual values):
Total amount of mail sent over TLS: 50 million
Total amount of mail sent over TLS marked as spam: 19.5 million
% of mail sent over TLS marked as spam: 39%
Total amount of mail sent (TLS + non-TLS): 139 million
Total amount of mail marked as spam: 44 million
% of mail marked as spam: 32%
From this, you can see that mail that gets past our RBLs and is sent via TLS is 7 percentage points more likely to be marked as spam than post-RBL mail overall (39% versus 32%). Compared with mail that is not sent via TLS, the gap is wider: the figures above imply 24.5 million spam messages out of 89 million non-TLS messages, or about 28%. This surprised me from a botnet spamming behavior point of view. Using TLS adds overhead to every SMTP transaction, so each one takes the spammers longer to complete. It is less efficient for them. However, from a numbers point of view this did not surprise me. If the ratio of TLS mail to good mail is 40/3, or about 13:1, then most of the post-RBL TLS mail is going to be spam.
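The comparison can be reproduced from the normalized figures listed above. The sketch below also derives the spam rate for non-TLS mail, which is not listed among those figures but follows from them by subtraction. The variable and function names are mine.

```python
# Normalized figures from the post, in millions of messages (post-RBL).
tls_total = 50.0
tls_spam = 19.5
all_total = 139.0
all_spam = 44.0


def pct(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


tls_rate = pct(tls_spam, tls_total)
overall_rate = pct(all_spam, all_total)
# Non-TLS mail is whatever remains after removing the TLS mail.
non_tls_rate = pct(all_spam - tls_spam, all_total - tls_total)

print(f"TLS spam rate:     {tls_rate:.1f}%")
print(f"Overall spam rate: {overall_rate:.1f}%")
print(f"Non-TLS spam rate: {non_tls_rate:.1f}%")
```

The differences between these rates are percentage points, not relative percentages.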
I next decided to capture the IPs that were spamming and see which botnets they belonged to. I had a suspect in mind but I needed to prove it. I trawled through a sample of our logs, finding all of the IPs that were sending spam via TLS (i.e., post-RBL), and also looked at the IPs pre-RBL that were sending via TLS. I then cross-referenced them with my script that maps botnets to IPs. The result?
Rustock is doing it.
Out of all the botnets that I can identify that send us spam, post-RBL, Rustock accounts for 34% of it. Yet, it accounts for 79% of pre-RBL spam over TLS and 84% of post-RBL spam over TLS. The next nearest botnet is Cutwail, and it only accounts for 1% of spammy TLS connections.
Rustock was my suspect, and this confirmed it. It kind of makes sense for Rustock, based upon what I know of it:
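The cross-referencing step amounts to a lookup and a tally. The sketch below is not my actual script: the function name, the IP-to-botnet mapping, and the sample addresses (taken from documentation-only IP ranges) are all made up for illustration. As in the numbers above, IPs that cannot be tied to a known botnet are left out of the shares.

```python
from collections import Counter


def botnet_shares(spam_ips, ip_to_botnet):
    """Return each botnet's percentage share of the spam-sending IPs
    that can be attributed to a known botnet."""
    counts = Counter(
        ip_to_botnet[ip] for ip in spam_ips if ip in ip_to_botnet
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical mapping and log sample, not real data.
ip_to_botnet = {
    "192.0.2.1": "rustock",
    "192.0.2.2": "rustock",
    "192.0.2.3": "rustock",
    "198.51.100.7": "cutwail",
}
tls_spam_ips = [
    "192.0.2.1",
    "192.0.2.2",
    "192.0.2.3",
    "198.51.100.7",
    "203.0.113.9",  # unknown sender, ignored in the shares
]

shares = botnet_shares(tls_spam_ips, ip_to_botnet)
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.0f}%")
```

Running the same tally separately on the pre-RBL and post-RBL TLS samples gives one share per botnet for each stage.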
- It’s the biggest botnet that spams us, by far.
- It sends from a very wide range of IPs and only sends, on average, 1 message per envelope.
- It is known to have “sleepy behavior” – it wakes up, spams, goes to sleep… wakes up, spams, goes to sleep. The pattern that we observe concurs with this, but the research I have found describes much longer spam periods (8 hours) rather than the 10 minutes or less that we see.
If a spamming engine could afford to send spam over TLS, Rustock is the one to do it. Because it sends from such a large array of IPs, and sends out so much spam, it can absorb the additional TLS overhead required to start an SMTP conversation. This behavior in Rustock is a clever move.