Don’t trust that data!

A while ago I wrote a couple of blog entries on code repurposing and some mitigations, and one of the main causes of that problem is that developers inherently trust data. The text box caption says Name, so it's always gonna contain the user's name, right? Nobody is ever going to put a SQL query or a JScript statement in that field... are they?

But I want to talk a bit today about users inherently trusting data, and how it's just as bad. I'll eventually talk about Excel and some of the cool new stuff we're doing with VSTO 2.0, but let's start a bit smaller than that.

And a bit less techy.

Imagine you're a caveman (or woman) and you've just discovered how to cultivate crops. You have two fields of corn ready for the winter, and one day your neighbouring caveman comes by and says "Ugh, me see sabre-toothed tiger nearby. Ugh. You hide in cave, ugh, and me go fight tiger. Ugh-ugh?" Being very afraid of sabre-toothed tigers, you are more than happy to let your neighbour go off and fight it while you roll a big rock over the entrance to your cave and hide away in fear. The next day you roll back the stone and emerge, only to find that all your crops have been pillaged and your neighbour is nowhere in sight. Congratulations! You just fell for the world's first matchstick man!

Fast-forward to the (very) late 20th century. You're sitting at work looking through your e-mail, when the subject line "HOT STOCK TIP!" catches your eye. You've always wanted to make a killing on the stock market (and quit your crappy job) so this could be the answer to your prayers! You read the message and jump onto your E*Trade account to buy up as much of the stock as you can. Three days later the stock is unlisted and you're left penniless on the street. Oops!

There could be an infinite number of examples here -- the basic problem is that you are acting on information furnished to you by people whom you should not trust. I gave these examples (and will give a few more) mostly to allay any fears that what I am about to describe is a new problem that we are introducing with VSTO 2. Nothing could be further from the truth; it is just that VSTO 2 will provide many amazingly cool features that, like all features, can be used for good as well as for evil.

You have been warned 🙂

Let's talk about using Excel to implement an expense report (a common scenario we use here at Microsoft) and how that can be abused by untrustworthy employees. We'll start off with an expense report that doesn't use any kind of code (VSTO, VBA, whatever) although it does use formulas. The expense reporting process is as follows:

1.       The employee fills out an expense report and e-mails it to their manager

2.       The manager approves the expense report and e-mails it to the payroll department (obviously they could also choose to reject it)

3.       The payroll department receives the report and reimburses the employee with their next pay cheque

Note that in these scenarios I will only focus on the employee trying to abuse the system to get more money than they should; in a real system you would also think about all the players that could be trying to abuse the system -- the manager, the payroll employees, the vendor who wrote the payroll system, etc. -- and all the things they might try to do -- steal money, block legitimate payments, and so on.

The problem in this case is that the manager accepts an expense report from the employee and approves or denies it simply by looking at the values in the cells. But those cells were under the control of the attacker (the employee)! Let's say the employee recently took a client out to lunch and is claiming a fairly reasonable $100 for it. This sounds good to the manager, so she approves the request and sends it on to the payroll team. Unbeknownst to her, the $100 in the expense report was not a static value entered by the employee, but rather a formula that would change to the fraudulent amount of $1,000 when viewed by the payroll employees. The payroll guys see the value of $1,000, verify that it was approved by the manager, and promptly overpay the employee to the tune of $900.

Note that using digital signatures or other "security" technologies wouldn't have helped here; the manager would just have signed the spreadsheet containing the formula. In fact it may have increased her liability because she can no longer claim that someone spoofed her e-mail account and sent the dummy report -- after all, it was signed with her private key!

About the only thing the manager could do in this case (short of performing a full audit on the spreadsheet) would be to take a new, known-trustworthy expense report template and manually re-key the employee's data into it to ensure no trickery was underway. This is of course a colossal waste of time, so nobody is going to do it. She could instead copy the entire spreadsheet and paste it back on top of itself with the "values only" option, but that replaces every formula with its current value and might break other parts of the spreadsheet (for example, the =SUM() total at the bottom would no longer update). Basically, it's a big hassle.

(Oh, and in case you think this is a problem with using Excel and its auto-magic formulas, imagine that the expense report is a plain text file written in Notepad. The manager gets an expense report from her employee, sees the single line item "Lunch with client: $100", and sends it on to payroll. Unbeknownst to her, the employee simply added fifty blank lines after the first item and added "Ticket to the Caribbean: $5,000" to the end, knowing full well that the payroll system will not be fooled by blank lines and will pay out for both line items.)
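The Notepad variant comes down to a gap between what the reviewer looks at and what the payroll system parses. Here is a minimal sketch of closing that gap. It assumes a hypothetical one-item-per-line `Description: $Amount` format, and the functions `parse_report` and `review_summary` are illustrative names. The idea is to show the reviewer exactly what the parser will pay out, including items buried below a screenful of blank lines:

```python
import re
from decimal import Decimal

# Hypothetical line format: "<description>: $<amount>", e.g. "Lunch with client: $100".
LINE_ITEM = re.compile(r"^(?P<desc>.+?):\s*\$(?P<amount>[\d,]+(?:\.\d{2})?)\s*$")


def parse_report(text: str) -> list[tuple[int, str, Decimal]]:
    """Return (line_number, description, amount) for every item the payroll
    system would pay, no matter how far down the file it appears."""
    items = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # the parser skips blank lines; a human scrolling may stop here
        match = LINE_ITEM.match(line.strip())
        if match is None:
            raise ValueError(f"line {number}: unrecognised entry {line!r}")
        amount = Decimal(match["amount"].replace(",", ""))
        items.append((number, match["desc"], amount))
    return items


def review_summary(text: str) -> str:
    """What the manager should approve: every parsed item plus the total."""
    items = parse_report(text)
    lines = [f"line {n}: {desc}: ${amount}" for n, desc, amount in items]
    total = sum((amount for _, _, amount in items), Decimal(0))
    lines.append(f"{len(items)} item(s), total ${total}")
    return "\n".join(lines)


if __name__ == "__main__":
    # One visible item, fifty blank lines, then the item the manager never scrolled to.
    report = "Lunch with client: $100" + "\n" * 51 + "Ticket to the Caribbean: $5,000"
    print(review_summary(report))
```

The design choice is simply that the approval screen is generated by the same parser that drives the payment, so the two cannot disagree.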

The reason I brought up VSTO 2 earlier on was that Eric and Eric (and the rest of the team) have been making data-binding and data-centric programming and server-side access to data so easy and powerful in Excel that I fear people will throw themselves into this cool new technology head-first and never stop to realise all the horribly bad assumptions they are making. Does the cached data your server-side component "sees" have anything to do with the spreadsheet itself, or did the user hack it with a binary editor before e-mailing it to you? Does the TotalAmount named range still refer to $C$10, or did some nefarious employee move it to point to $D$20 instead? Has the user filled the "real" worksheet with bad data, hidden it, and then replaced it with a spoofed (look-alike) worksheet with benign data intended to fool other users? Did the user open your spreadsheet without the managed code executing, thereby bypassing any client-side validation functions you used to vet data before submitting it to a server system?
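Part of the answer to the named-range and hidden-sheet questions is to compare the submitted workbook's structure against the template you shipped. This is a minimal sketch that assumes you already have some way of reading the defined names and sheet visibility out of the file. The hypothetical `WorkbookLayout` class just holds them as plain dictionaries, and `layout_problems` is an illustrative name:

```python
from dataclasses import dataclass


@dataclass
class WorkbookLayout:
    """Hypothetical snapshot of the parts of a workbook an attacker can rewire."""
    defined_names: dict[str, str]     # e.g. {"TotalAmount": "Sheet1!$C$10"}
    sheet_visibility: dict[str, str]  # e.g. {"Sheet1": "visible"}


def _normalise(ref: str) -> str:
    # Excel references and sheet names are case-insensitive; ignore stray spaces.
    return ref.replace(" ", "").upper()


def layout_problems(actual: WorkbookLayout, template: WorkbookLayout) -> list[str]:
    """List structural differences from the trusted template. An empty list
    means the names and sheets match what your code expects; it says nothing
    about whether the values in the cells are honest."""
    problems = []
    for name, expected_ref in template.defined_names.items():
        actual_ref = actual.defined_names.get(name)
        if actual_ref is None:
            problems.append(f"name {name!r} is missing")
        elif _normalise(actual_ref) != _normalise(expected_ref):
            problems.append(f"name {name!r} points to {actual_ref}, expected {expected_ref}")
    # A look-alike sheet added next to a hidden "real" one shows up in both checks below.
    for sheet in sorted(actual.sheet_visibility.keys() - template.sheet_visibility.keys()):
        problems.append(f"unexpected sheet {sheet!r}")
    for sheet, state in actual.sheet_visibility.items():
        expected_state = template.sheet_visibility.get(sheet)
        if expected_state is not None and state != expected_state:
            problems.append(f"sheet {sheet!r} is {state}, expected {expected_state}")
    return problems
```

This sketch does not catch every trick. For instance, it will not notice hidden columns inside an otherwise unchanged sheet, so it complements the approaches below rather than replacing them.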

The solution to the problem is, of course, to ensure that the only thing the employee is in control of is the data, not the way it is presented or the behaviour of the program. And thankfully the great work being done on VSTO 2 helps you out here; you just have to know how to use the tools effectively. There are two fairly obvious solutions to this problem of the employee being in control of the cells in the spreadsheet:

1.       Utilise a trusted third party (often a server) to perform the "copy and paste" operation noted above

2.       Utilise a trusted UI (not under the control of the attacker) for displaying and confirming the values in the spreadsheet

You can probably think of other ways, too. (Note that here we assume the attacker does not have any control over the code you are executing on your machine; they only have control over the spreadsheet).

The first solution is (to me) the coolest, and it uses the VSTO 2 technology quite well. Instead of the 3-step process above, we build a more complicated (but less prone to abuse) process that uses a web site to help "cleanse" data:

1.       The employee fills out an expense report and submits it to the server

2.       The server strips out the data from the expense report, stores it in a database, and sends a notification to the manager

3.       The manager clicks on a link to the server, which extracts the data from the database, shoves it into a brand new expense report template, and serves it up to the manager

4.       The manager approves the expense report and submits it back to the server

5.       Repeat steps 2-4 for the payroll people
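Below is a minimal sketch of the server side of this workflow. It assumes an in-memory dictionary stands in for the database, and that line items have already been extracted and validated from the submitted workbook. The class `ExpenseServer`, the `Stage` values, and the column layout are all hypothetical. The point it illustrates is that only data is stored, and each reviewer gets a freshly rendered report built from that data:

```python
import itertools
from dataclasses import dataclass, field
from decimal import Decimal
from enum import Enum


class Stage(Enum):
    AWAITING_MANAGER = "awaiting manager"
    AWAITING_PAYROLL = "awaiting payroll"
    PAID = "paid"
    REJECTED = "rejected"


# Each approval moves a report one step along; any other stage is final.
NEXT_STAGE = {
    Stage.AWAITING_MANAGER: Stage.AWAITING_PAYROLL,
    Stage.AWAITING_PAYROLL: Stage.PAID,
}


@dataclass
class StoredReport:
    employee: str
    items: list[tuple[str, Decimal]]  # extracted data only; the submitted file is discarded
    stage: Stage = Stage.AWAITING_MANAGER
    approvals: list[str] = field(default_factory=list)


class ExpenseServer:
    """Hypothetical trusted server that stores report data, never the submitted file."""

    def __init__(self) -> None:
        self._db: dict[int, StoredReport] = {}  # stand-in for a real database
        self._ids = itertools.count(1)

    def submit(self, employee: str, items: list[tuple[str, Decimal]]) -> int:
        report_id = next(self._ids)
        self._db[report_id] = StoredReport(employee, list(items))
        return report_id

    def render_for_review(self, report_id: int) -> dict[str, str]:
        """Fill a brand-new 'template' (here just a dict of cell text) from the database."""
        report = self._db[report_id]
        cells = {"A1": "Description", "B1": "Amount"}
        for row, (desc, amount) in enumerate(report.items, start=2):
            cells[f"A{row}"] = desc
            cells[f"B{row}"] = str(amount)
        total_row = len(report.items) + 2
        cells[f"A{total_row}"] = "Total"
        # The server computes the total; there is no formula for the employee to edit.
        cells[f"B{total_row}"] = str(sum((a for _, a in report.items), Decimal(0)))
        return cells

    def decide(self, report_id: int, reviewer: str, approve: bool) -> Stage:
        """Record a decision against the stored data the reviewer was shown."""
        report = self._db[report_id]
        if report.stage not in NEXT_STAGE:
            raise ValueError(f"report {report_id} is already {report.stage.value}")
        if approve:
            report.approvals.append(reviewer)
            report.stage = NEXT_STAGE[report.stage]
        else:
            report.stage = Stage.REJECTED
        return report.stage
```

Because approvals are recorded against a `report_id` rather than against a file, the payroll clerk is shown, and paid from, the same stored numbers the manager approved.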

In this scenario, the manager (and the payroll people) are guaranteed to see exactly the same data that the back-end processing system will see, because the Excel spreadsheet (which in the past may have held nasty formulas, hidden sheets, re-directed named ranges, etc.) is never propagated from one user to another. The employee can dork with the expense report all they like, but they will not be able to get away with the same attack. When they submit the report to the server, they will get an error if they have placed formulas where numbers are supposed to be, and no matter how hard they try to spoof the UI of the spreadsheet to make it look like it is for $100 when it is really for $1,000, the manager will see the true value of $1,000 and not approve the report (and hopefully fire the employee). You might realise that this is the way most web sites work, and you'd be perfectly correct; we're simply using the power of the Excel client to make the data entry and data viewing experiences better. It breaks down if you don't have a trusted server, or you need off-line support, or if for whatever reason your current process inherently relies on people e-mailing stuff to each other.
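The "you will get an error if you put a formula where a number belongs" step might look roughly like the following minimal sketch. It assumes the extractor can tell, for each cell, whether it holds a constant or a formula. The hypothetical `Cell` type models that, since real file formats store formulas separately from cached values. `extract_items`, `RejectedReport`, and the A/C column layout are illustrative names, not a VSTO API:

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical view of one cell as read from the submitted workbook."""
    value: str                     # the value Excel last displayed (cached)
    formula: Optional[str] = None  # set when the cell holds a formula, not a constant


class RejectedReport(ValueError):
    """Reported back to the employee; nothing is stored."""


# Hypothetical template layout: description in column A, amount in column C,
# line items in rows 2-11. The server reads these addresses and nothing else.
ITEM_ROWS = range(2, 12)


def extract_items(cells: dict[str, Cell]) -> list[tuple[str, Decimal]]:
    items = []
    for row in ITEM_ROWS:
        desc, amount_cell = cells.get(f"A{row}"), cells.get(f"C{row}")
        if desc is None and amount_cell is None:
            continue  # empty row
        for address, cell in ((f"A{row}", desc), (f"C{row}", amount_cell)):
            if cell is not None and cell.formula is not None:
                raise RejectedReport(f"{address}: formulas are not allowed ({cell.formula})")
        if desc is None or amount_cell is None:
            raise RejectedReport(f"row {row}: description and amount are both required")
        try:
            amount = Decimal(amount_cell.value.replace("$", "").replace(",", ""))
        except InvalidOperation:
            raise RejectedReport(f"C{row}: {amount_cell.value!r} is not a number") from None
        if not amount.is_finite() or amount < 0:
            raise RejectedReport(f"C{row}: {amount_cell.value!r} is not a valid amount")
        items.append((desc.value, amount))
    return items
```

A constant the employee typed is still only their claim, but that is fine: the manager is later shown that same stored number, not a cell that can change depending on who opens it.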

The second solution can help here. At its heart, this solution leverages the rich Excel user interface for the data entry portion (the employee), but completely bypasses it for the validation / approval portion (the manager / payroll clerk). The process is modified thusly:

1.       The employee fills out an expense report and e-mails it to their manager

2.       The manager opens the expense report, reviews the data inside a custom-built dialog box, approves the expense report and e-mails it to the payroll department (obviously they could also choose to reject it)

3.       The payroll department receives the report and reimburses the employee with their next pay cheque

In this scenario, when the manager indicates their desire to approve the expense (by clicking a button, etc.), the solution gathers up the data from the spreadsheet (the same way the server did in the previous example) and shows it to the manager in a "trusted" user interface such as a data grid inside a modal dialog, or (dare I say it?) inside the "Document Actions" pane. The manager then ignores what is in the Excel cells and makes their decision based on the numbers inside the trusted UI. (They will most likely look at the original expense report anyway, just to see what the expenses were for, but they need to make their decision on the value shown in the dialog, not the one shown on the spreadsheet.) Just to make doubly sure there is no deception going on, you could require the manager to manually enter the total amount into a "Verify amount" field on the spreadsheet before submitting it.
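Here is a minimal sketch of the "Verify amount" check. It assumes the line items have already been gathered from the spreadsheet (for example, along the lines of the server-side extraction above) and shown in the trusted dialog. `confirm_approval` is a hypothetical name, and `typed_total` stands for whatever the manager entered. Approval succeeds only if the typed figure matches the total computed from the extracted data. The total displayed in the cells is never consulted:

```python
from decimal import Decimal, InvalidOperation


def total_of(items: list[tuple[str, Decimal]]) -> Decimal:
    """Total computed from the extracted data, as shown in the trusted dialog."""
    return sum((amount for _, amount in items), Decimal(0))


def confirm_approval(items: list[tuple[str, Decimal]], typed_total: str) -> bool:
    """True only if the manager re-keyed the same total the trusted UI showed."""
    try:
        typed = Decimal(typed_total.replace("$", "").replace(",", "").strip())
    except InvalidOperation:
        return False  # not a number at all
    # is_finite() filters out NaN/Infinity before comparing.
    return typed.is_finite() and typed == total_of(items)
```

If the cells say $100 but the extracted data says $1,000, a manager who types "100" is refused. That mismatch is exactly the signal that something in the spreadsheet is lying.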

Update 12-04-04

Having a policy such as "No direct manager can approve expenses over $500" would also help here, because even though the manager would see an expense for $100, the system would see $1,000 and flag it as a policy violation. Now of course the manager would then complain about the stupid computer system messing up again, but hopefully someone would track down the discrepancy, ferret out the fraudulent employee, and fix the system so that the same kind of thing didn't happen again.
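A policy like this only helps if it runs against the back-end system's own view of the amount. A value the approver saw in the spreadsheet is not good enough. Here is a minimal sketch: the $500 limit comes from the example policy above, while the "director" role and its limit are purely hypothetical, as is the function name `check_approval`:

```python
from decimal import Decimal

# Approval limits per role. "direct manager" mirrors the $500 example policy;
# "director" and its limit are hypothetical, for illustration only.
APPROVAL_LIMITS = {
    "direct manager": Decimal("500"),
    "director": Decimal("5000"),
}


def check_approval(backend_total: Decimal, approver_role: str) -> list[str]:
    """Return policy violations for the amount the back-end will actually pay."""
    limit = APPROVAL_LIMITS.get(approver_role)
    if limit is None:
        return [f"role {approver_role!r} may not approve expenses"]
    if backend_total > limit:
        return [f"${backend_total} exceeds the {approver_role} limit of ${limit}; escalate"]
    return []
```

In the formula attack, the manager approves what looks like $100, but `check_approval(Decimal("1000"), "direct manager")` returns a violation. That gives someone a reason to go looking for the discrepancy.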

What this shows is that technology is not a panacea for security issues. Technology has no concept of morality and can be used for good as well as for evil. Having solid designs for your solutions and building quality threat models for them will help you way more than throwing random technology buzzwords at a solution. User education and good policies & procedures go a long way, too.

Oh, and hiring trustworthy employees (and keeping them trustworthy by treating them well) is also incredibly important.

Comments (2)

  1. Eric Carter says:

    It seems to me that the attack doesn’t even have to be so sophisticated as to involve formulas, and it certainly doesn’t need to involve code.

    If some script on the server is grabbing data out of the spreadsheet, it is going to want to look in a particular location of the spreadsheet to get the data.

    There are two ways I know of for the script to grab this data out of the spreadsheet. First, you create a named range–for example "ExpenseAmount" maps to Sheet1!$A$5. Second, you do it based on absolute address, just Sheet1!$A$5.

    Either of these can be subverted. For a named range, users can just edit the name so that it points to something else–say Sheet3!$ZZ$3000. Then they put the reasonable expense into Sheet1!$A$5 and the unreasonable one into Sheet3!$ZZ$3000, where no one–except the script on the server–will look at it.

    If the script is grabbing data out of the spreadsheet based on an absolute location, the attacker just has to hide column A and use column B in its place. With the unreasonable number in the hidden column A and the reasonable number in the visible column B, once again the same attack is enabled without any real fanciness. If you want to get fancy, you can hide the row and column headers in Excel so you wouldn’t even be able to detect this attack.

    It seems like launching an attack via cached data is the hardest of all possible attacks because you actually have to write a bunch of code to do so.

    Also, I wonder if document protection in Excel/Word could be used to provide a secure UI. It sure seems unfortunate to have to display a WinForm to show your Excel data.
