Follow up to “Don’t trust that data”

Eric makes some good points in a comment to my last post. Nevertheless, the forces of evil within me compel me to respond. (You should have blogged it, Eric 😉 ).

Eric's main point is that the employee doesn't need to use formulas in order to fool the expense report system -- he can simply redirect the TotalExpense named range to point to some arbitrary location that his boss will never look at. That would be correct in an automated system, but the supposition in the first example was that there was no code involved in the scenario; it was all based on people looking at the expense report and following a manual process. Hiding the column or redirecting the named range doesn't make much sense there, because the payroll clerk will see the same column, and the same value ($100), that the manager sees.

Hiding / moving a named range (or any other kind of UI spoofing attack) will typically only work when a human makes a decision that a computer then acts upon (because the computer "sees" a different value to the human). You must understand your threats (or your opportunities... mwhahahahahahaaaa) in order to successfully protect (or exploit) a system.
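One way to narrow that gap between what the human saw and what the computer reads is to have the code recompute the total from the visible line items and refuse to proceed if it disagrees with the value it got from the named range. The snippet below is only a sketch of that idea; the function name is hypothetical, and actually reading the values out of the workbook is assumed to happen elsewhere.

```python
from decimal import Decimal


def total_matches_line_items(line_items, claimed_total):
    """Return True only if claimed_total equals the sum of the line items.

    line_items: amounts taken from the cells the manager actually reviewed.
    claimed_total: the value the code read via the named range.
    Both are assumed to arrive as strings or numbers.
    """
    recomputed = sum((Decimal(str(item)) for item in line_items), Decimal("0"))
    return recomputed == Decimal(str(claimed_total))


# The manager approved two visible items adding up to $100 ...
print(total_matches_line_items(["40.00", "60.00"], "100"))   # honest range
# ... but the named range was pointed at a cell containing 1000.
print(total_matches_line_items(["40.00", "60.00"], "1000"))  # redirected range
```

This only catches a mismatch between the two views of the data; it says nothing about whether the line items themselves are honest.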

Eric also points out that hacking the cached data blob is probably the hardest attack of all to mount, but that just means developers will be least likely to deal with it! Nobody expects the Spanish Inquisition! If I know you are passing the data to some unmanaged component, for example, maybe I can trigger a buffer overflow by fiddling with the bits. Or perhaps I can just break some of your other assumptions in the code by inserting too many (or too few) rows of data, etc. I just don't want developers to fall into the trap of believing (incorrectly) that the data cache always holds what they are expecting it to hold. We've seen far too many web developers fall into that trap and get themselves (and their customers) into all manner of nasty problems.
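To make "don't assume the cache holds what you expect" a bit more concrete, here is a minimal sketch of checking the shape and size of cached rows before any other code touches them. The limits, the two-column row layout, and the function name are all assumptions made up for illustration; a real solution would use whatever its own schema dictates.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical limits for a single expense report.
MAX_ROWS = 200
MAX_DESCRIPTION = 256
MAX_AMOUNT = Decimal("10000")


def validate_cached_rows(rows):
    """Return cleaned (description, amount) pairs, or raise ValueError."""
    # Too many or too few rows breaks assumptions made further down the line.
    if not isinstance(rows, list) or not 1 <= len(rows) <= MAX_ROWS:
        raise ValueError("unexpected number of rows")
    cleaned = []
    for row in rows:
        if not isinstance(row, (list, tuple)) or len(row) != 2:
            raise ValueError("malformed row")
        description, amount = row
        # Bound string lengths before handing them to any other component.
        if not isinstance(description, str) or len(description) > MAX_DESCRIPTION:
            raise ValueError("bad description")
        try:
            value = Decimal(str(amount))
        except InvalidOperation:
            raise ValueError("amount is not a number") from None
        # Reject NaN and infinities first, then check the range.
        if not value.is_finite() or not Decimal("0") < value <= MAX_AMOUNT:
            raise ValueError("amount out of range")
        cleaned.append((description, value))
    return cleaned
```

The point is that everything downstream works only with the cleaned values, never with the raw blob.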

You cannot trust anything that was under the control of the attacker.

Using protected documents might help somewhat against a casual attacker, but you need a whole lot of infrastructure to set up IRM, and the other kinds of protection are trivially broken. Also it should be noted that IRM is not a security technology! It is not a foolproof way of thwarting all attacks by well-skilled evil doers; it is a technological measure to encourage users to adhere to existing corporate policies (such as "don't forward confidential e-mails outside the company").

Obviously these kinds of threats will not exist in the vast majority of cases -- most employees are not going to spend time hacking into your Excel-based solutions in order to cheat on their expense reports; they're just going to try to get their jobs done. But you should be aware of such possibilities so that you can weigh up the costs of adding in additional protection (in terms of increased development time, reduced productivity / usability, more help desk calls, etc.) against the risks / likelihood of employees rorting the system (if you are a large bank or a secret government agency, the risks might be pretty high).

Oh and this is nothing unique to Office -- if you built a custom WinForm application (or even a Java application!) and used it to connect to a server, I would be giving you the same advice; you would be asking for serious trouble if you blindly accepted all data coming from those clients and acted upon it without first doing some kind of verification. Just as the employee can dork with the spreadsheet in order to send you fudged data, so too could they dork with the client application (or just write their own!) and use it to connect to your server and send you bogus data.
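As a rough illustration of "doing some kind of verification" on the server, the sketch below re-checks every field of an incoming submission and deliberately ignores any claim by the client that it has already validated the data. The field names (category, amount_cents, client_validated), the allowed categories, and the limit are all invented for the example.

```python
# Hypothetical server-side policy; none of these names come from a real system.
ALLOWED_CATEGORIES = {"travel", "meals", "lodging"}
MAX_AMOUNT_CENTS = 500_000


def check_submission(payload):
    """Return a list of problems; an empty list means these checks passed."""
    if not isinstance(payload, dict):
        return ["payload is not an object"]
    problems = []
    # Note: payload.get("client_validated") is never consulted. The client
    # (or a hand-written replacement for it) can set that to anything.
    category = payload.get("category")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        problems.append("unknown category")
    amount_cents = payload.get("amount_cents")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(amount_cents, bool) or not isinstance(amount_cents, int):
        problems.append("amount_cents must be an integer")
    elif not 0 < amount_cents <= MAX_AMOUNT_CENTS:
        problems.append("amount_cents out of range")
    return problems


print(check_submission({"category": "travel", "amount_cents": 1250,
                        "client_validated": True}))
print(check_submission({"category": "yacht", "amount_cents": -5,
                        "client_validated": True}))
```

Client-side checks are still worth having for usability; they just can't be the only checks.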

Comments (1)

  1. Jerry Pisk says:

    I’ve seen far too many web developers say "We have a JavaScript that makes sure no invalid input is sent to the server, therefore we don’t need server side validation." Unfortunately I have to work with some of those. Anybody hiring?
