I ran into an interesting situation with a customer’s application the other day. You would try to install the application, and it would prompt for a reboot. OK, so it couldn’t grab a file handle yet; fine, I’ll go ahead and let it reboot. After the reboot, the installation would continue, and then prompt for a reboot again. This cycle would continue indefinitely. What was going on?
Since the installation was an MSI, getting to the bottom of this one was a bit easier. Just pop open the Orca SDK tool, and take a peek. In an MSI database, you have a FeatureComponent table. This links to the Component table in the MSI. So, one by one, you can just see which components it is trying to install. And, since it was failing, you can always use the trusty EventLog, which conveniently provides the ComponentId which it was unable to write.
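If you would rather see that lookup as a query than as clicking around in Orca, here is a rough sketch. It uses an in-memory SQLite database purely as a stand-in (a real MSI is not a SQLite file). The table and column names mirror the MSI schema, but the rows, the GUIDs, and the `find_component` helper are all made up for illustration.

```python
import sqlite3

# Stand-in only: a real MSI is not a SQLite file. Table and column names
# follow the MSI schema; every row below is made up for illustration.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Component (
        Component   TEXT PRIMARY KEY,
        ComponentId TEXT,
        Directory_  TEXT
    );
    CREATE TABLE FeatureComponent (
        Feature_   TEXT,
        Component_ TEXT
    );
""")
db.executemany(
    "INSERT INTO Component VALUES (?, ?, ?)",
    [
        ("AppExe", "{11111111-1111-1111-1111-111111111111}", "INSTALLDIR"),
        ("KeyboardHandler", "{22222222-2222-2222-2222-222222222222}", "SystemFolder"),
    ],
)
db.executemany(
    "INSERT INTO FeatureComponent VALUES (?, ?)",
    [("MainFeature", "AppExe"), ("MainFeature", "KeyboardHandler")],
)


def find_component(conn, component_id):
    """Return (feature, component, directory) rows for a ComponentId GUID."""
    return conn.execute(
        """
        SELECT fc.Feature_, c.Component, c.Directory_
        FROM FeatureComponent AS fc
        JOIN Component AS c ON c.Component = fc.Component_
        WHERE c.ComponentId = ?
        """,
        (component_id,),
    ).fetchall()


# Hypothetical ComponentId copied out of the event log entry.
failing_id = "{22222222-2222-2222-2222-222222222222}"
print(find_component(db, failing_id))
```

The point is just the join: the ComponentId from the event log identifies a row in Component, and FeatureComponent tells you which feature pulled it in.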
And here is where I found the culprit. But first, some background…
In Windows 2000 and Windows XP, we wanted to protect critical Windows system files. However, if we locked them down, then a lot of existing software would break, since the installers would try to install Windows components. You know, just in case they weren’t there. So we had to be more clever about it. What we ended up doing was implementing a detection mechanism. We would detect that you modified one of the Windows system files, allow you to do this, and then come along later and just put the original copy back.
This was imperfect for a couple of reasons.
First, the copy was not immediate. I know the pain of this one first hand. I had an application that believed that not only did it need to install critical Windows system files, but it needed to install pretty much all of them. No problem, the original ones were dropped back in place and everything was fine. Then, I went to uninstall the application. It assumed that, since it dropped those critical Windows system files, it would be polite enough to go ahead and remove them for me. Just to be thorough. The problem was, it prompted for a reboot immediately afterwards. And, inconveniently, before the system file checker had a chance to lay them back down for me. So, upon reboot, Windows was unable to start because those files were never replaced. Yuck.
The other reason is that you could always get around this. We implemented this by laying down two copies of the system files. One in the normal location, and the other in the dllcache directory. Which means you could circumvent the protection if you were smart about it. If you lay down your copy in the dllcache directory first, and then lay down in the original location, then the system file checker would replace the copy in the original location with the copy in dllcache, which would be your modified copy.
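To see why the order matters, here is a toy model of that restore step. It is not how the system file checker was actually implemented; the two dictionaries stand in for the two directories, and the file name and the `restore_from_cache` helper are hypothetical.

```python
def restore_from_cache(live_dir, cache_dir, name):
    """Toy restore step: if the live file differs from the cached copy,
    overwrite the live file with whatever is in the cache."""
    if live_dir.get(name) != cache_dir.get(name):
        live_dir[name] = cache_dir[name]


# Case 1: the installer only overwrites the live copy.
live = {"keyboard.dll": "original"}
cache = {"keyboard.dll": "original"}
live["keyboard.dll"] = "modified"
restore_from_cache(live, cache, "keyboard.dll")
naive_result = live["keyboard.dll"]   # the original copy is put back

# Case 2: the installer overwrites the cached copy first, then the live copy.
live = {"keyboard.dll": "original"}
cache = {"keyboard.dll": "original"}
cache["keyboard.dll"] = "modified"
live["keyboard.dll"] = "modified"
restore_from_cache(live, cache, "keyboard.dll")
bypass_result = live["keyboard.dll"]  # nothing to restore from any more

print(naive_result, bypass_result)
```

In the second case the only reference copy the restore step has is already the modified one, so there is nothing left to put back.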
And that is exactly what this application was trying to do. It wanted to replace the keyboard handlers from Windows with its own, so it was laying them down in dllcache first, and then laying them down in the original location. This little trick used to work on Windows 2000 and XP.
It doesn’t work so great any more.
Because our previous method of approaching the problem was imperfect, we came up with a better one. We now ACL critical Windows system files to only allow the TrustedInstaller user to modify them. Not even System can modify these files, and certainly not Administrator (elevated or not). This is called Windows Resource Protection. How do we keep existing applications from breaking, since this is the reason we didn’t implement the old technique in the first place? We lie to them. We detect if you are an installation program (using the same set of heuristics that we use to automatically mark setup applications to elevate), and if you are, then we accept your request to modify a protected system resource, we return success, but we never actually do anything to that file. If you are not detected as a setup program, then we don’t lie – we return the access denied message. (Of course, you can apply the WRPMitigation shim using Compatibility Administrator if you need that behavior elsewhere, and we disable this check once you manifest your application with a Windows Vista manifest.)
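The decision described above can be sketched as a small model. This is only an illustration of the behavior as I have described it, not the real implementation: the `write_protected_file` function, its parameters, and the `Result` values are all hypothetical names.

```python
from enum import Enum


class Result(Enum):
    SUCCESS = "success"
    ACCESS_DENIED = "access denied"


def write_protected_file(files, name, data, caller,
                         detected_as_installer, has_vista_manifest=False):
    """Toy model of a write to a protected file.

    files is a dict standing in for the protected files on disk.
    """
    if caller == "TrustedInstaller":
        files[name] = data            # the only caller allowed to modify
        return Result.SUCCESS
    if detected_as_installer and not has_vista_manifest:
        return Result.SUCCESS         # the lie: report success, write nothing
    return Result.ACCESS_DENIED       # everyone else gets the honest answer


protected = {"keyboard.dll": "original"}

legacy_setup = write_protected_file(
    protected, "keyboard.dll", "modified", "Administrator",
    detected_as_installer=True)
ordinary_app = write_protected_file(
    protected, "keyboard.dll", "modified", "Administrator",
    detected_as_installer=False)
manifested_setup = write_protected_file(
    protected, "keyboard.dll", "modified", "Administrator",
    detected_as_installer=True, has_vista_manifest=True)

print(legacy_setup, ordinary_app, manifested_setup, protected)
```

Notice that the legacy setup program is told it succeeded while the file is left untouched, which is exactly what keeps old installers from failing.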
As a result, we don’t have dllcache any more. So, this installer was trying to drop a new file into dllcache, couldn’t because that folder isn’t there any more, and figured that a reboot would help. On every single reboot, the folder still wasn’t there, so it figured yet another reboot might help next time. And on it went…