Practical Windows Sandboxing – Part 1

I've written more than once about how interesting restricted tokens are – the earliest article was on Mark Edward's Windows Security web site. Unless it's been taken down recently, the article and source code are still there. In the nearly 8 years since then, I've talked about them in "Writing Secure Code", and Michael Howard has turned the technique into a rudimentary tool for launching apps with fewer privileges.

A bit of history – restricted tokens were part of a somewhat failed sandboxing effort named SAFER that shipped with Windows 2000. Until fairly recently, we hadn't used them in many places – one use is to keep the logon screen out of mischief. Restricted tokens are the core of the sandbox we shipped with MOICE, and as part of that effort we took some lessons from the school of hard knocks, and I've learned a lot about the nuances of how you might use this technology. I'm documenting our techniques for a lot of reasons – first, security by obscurity isn't security. I'm quite sure there are hackers (um, security 'researchers' – whatever) out there who know as much as I do about the sandbox in MOICE, which leads me to the second reason – I'd like it if people building on Windows knew how to correctly use this technology to protect their own apps where it makes sense.

A serious caveat – one of the reasons SAFER was considered a failure is that it tried to sandbox general purpose apps on the same desktop as everything else. This isn't going to work. It can make a somewhat reasonable speedbump that will trip up malware not expecting to be run as something other than admin, but I'm interested in a serious security boundary, not a speedbump. What this implies is that an app really needs to be written with the constraints of the sandbox kept in mind.

Let's take a look at the actual API –

BOOL CreateRestrictedToken(
    HANDLE               ExistingTokenHandle,
    DWORD                Flags,
    DWORD                DisableSidCount,
    PSID_AND_ATTRIBUTES  SidsToDisable,
    DWORD                DeletePrivilegeCount,
    PLUID_AND_ATTRIBUTES PrivilegesToDelete,
    DWORD                RestrictedSidCount,
    PSID_AND_ATTRIBUTES  SidsToRestrict,
    PHANDLE              NewTokenHandle
);

Now let's look at each parameter, and I'll show you how we used each one:

  • ExistingTokenHandle – you just get this from OpenProcessToken or LogonUser. Because we'll be handing the result to CreateProcessAsUser, it needs to be a primary token, not an impersonation token. If you have an impersonation token, you need to use DuplicateTokenEx to get a primary token.
  • Flags – If you simply want to drop all the privileges, use DISABLE_MAX_PRIVILEGE. SANDBOX_INERT is complicated, and I might discuss it later. It isn't important for what we're doing. Note that even if you try to drop all the privileges, it won't drop "Bypass Traverse Checking" (SeChangeNotifyPrivilege). IMHO, this really ought to be documented, but it's really to save you from yourself – huge numbers of API calls break if you don't have this privilege.
  • SidsToDisable – these are the groups that we want to set to deny-only. Even though we're going to use SidsToRestrict later on, it's still good to use this. I believe this whole function is backwards – we should be specifying the groups we want to keep, NOT the groups we want to drop. The way to use this is to call GetTokenInformation with TokenGroups as the information class, and then compare the result against a list of SIDs you want to keep. Anything not in your list gets copied into the output list to be disabled. There isn't much you want to keep – you need to keep the logon ID SID, Everyone, and BUILTIN\Users. Don't keep Authenticated Users. Doing this efficiently is tricky, and is best done by digging through the guts of the SIDs in the list. I'll follow up on what the logon ID SID is, how to detect one, and what it's used for.
  • PrivilegesToDelete – the simplest thing to do is to disable everything you can using the flag, but if for some reason you need some specific privilege for your app, you have to do the same thing as with the groups – get a list of the token privileges using GetTokenInformation, copy over the privileges that aren't in the list of privs you want to keep, and then pass that in here. The point here is that we keep inventing new and sometimes risky privileges, and you don't want your nice restricted process suddenly showing up with a bunch of new privileges because someone upgraded from Win2k3 to LH Server, or XP to Vista. I think the API should have been PrivilegesToAllow, but I obviously didn't write it.
  • SidsToRestrict – these are interesting. If you have ANY SIDs in here, one of them MUST be "restricted", or nothing works. This is unfortunately not in the documentation, and I had to figure it out by experimenting. You can now add in any of the groups you want to keep, and you can actually make up completely different SIDs that don't map to any user and add those if you like. This will be important later on. In our implementation, we made this list everyone, users, restricted and the logon ID SID.

I found the SidsToRestrict one of the more confusing aspects of this API, but let's look at what happens if you don't use this parameter. You'll end up with a token that has the token user enabled, and whatever groups you decided to keep. Because the token user is still enabled, the process has enough rights to open a non-restricted process (or edit your profile and all your stuff), which then allows it to escalate privilege back to the original token, and all we've accomplished is a speed bump. The only way to get the original user out of the equation is by using the SidsToRestrict.

If you have SidsToRestrict, every access check becomes a two-pass test. The first pass is against the token user and the groups that remain enabled. The second is against SidsToRestrict. This allows a restricted process read access to a portion of your user profile, since the ACL looks like admins:F, you:F, restricted:R – the first pass allows full control because you have access to your profile, and the second pass grants read because restricted is in SidsToRestrict (it has to be there). The net permissions are the intersection of the two sets, which amounts to read. If a group shows up in both lists, whatever access the ACL grants to that group survives both passes.

If you now go and create a process using this token, you'll get a weird error. The problem is that the token's default DACL, which I was explaining yesterday, will produce a process DACL that this process token doesn't grant access to, and a process that doesn't have access to itself dies a quick and horrible death. CreateProcessAsUser will succeed, but the new process spews an undocumented error and dies so early in initialization that I was never able to debug into it. Ouch. I think we ought to mention this in the SDK.

So our next task is to define a proper default DACL for the token and call SetTokenInformation to apply it. What I do is build an ACL that allows full control to SYSTEM, Administrators, and the logon ID SID. Once this bit of extra work has been done, you can happily create a process that doesn't have access to much of anything – it can't access your stuff, and there won't be many places on the file system it can write to. Because we left BUILTIN\Users enabled, it can still execute installed programs.

The hackers out there are now thinking – but there's a bunch of attack surface still available – clipboard, global atoms, and window message attacks. There is that attack surface out there, but we can deal with that – stay tuned – more blog posts to follow.