Deciphering the MSI Directory table, part 3

Last time I spent most of the time talking about the IDT file structure and a little bit of time talking about relational databases. If you remember, in the middle of the blog entry I hinted that the names of the columns had quite a bit of contextual information in them. Before I go on to discuss the actual workings of the Directory table, let me explain what contextual information I was talking about.

It's all about conventions. No, I'm not talking about the big halls of uncomfortable chairs and boring speakers (although I will admit OSCON last year was a lot of fun). I'm talking about the "usual way of doing things". In this case, the names and sizes of the columns in the Directory table follow conventions that can be very useful when you try to decipher an MSI file. Here are a few of the conventions off the top of my head.

1. All column names are Pascal cased. This means that each "word" in a column name begins with a capital letter. Our column names are no exception: Directory, Directory_Parent, DefaultDir.

2. The sole primary key column of the table is named the same as the table itself. In this case, the Directory column is the primary key for the Directory table. The File, Registry, Component, and Feature tables are a few other notable examples.

3. Tables that define a "thing" in the MSI file make their first column an identifier column. Our Directory column is a perfect example. The first column in the Directory table is used to uniquely label each directory in the MSI file so that it can be referenced by other tables. It's pretty handy that the Directory column is also our primary key column, eh? The primary key ensures that no two directories end up with the same identifier. This pattern is used for the File column in the File table, the Registry column in the Registry table, the Component column in the Component table, and so on.

4. Identifier columns are 72 characters wide. Why 72 characters wide? Well, when the Windows Installer team first dreamed up identifier columns, it was decided that 36 characters should be enough to define a particular "thing" in the MSI file. So everyone made their identifiers 36 characters or less to pass validation. A couple of years later, this pesky intern (er, me) came up with the idea of appending a dot plus a modified GUID onto the end of identifiers to ensure the rows would be unique when used inside Merge Modules. The modified GUID took 35 characters and adding the dot brought the total to 36 characters. Obviously, 36 characters were no longer sufficient to describe an identifier. So we decided to just double the size of identifiers to allow Merge Modules to have unique identifiers. It was actually my task to go through all of the table definitions in the Windows Installer and make sure the identifier columns were consistently 72 characters wide.

5. Foreign keys have underscores in them next to the name of the column they refer to. In this case, the underscore in the Directory_Parent column tells us that the column refers to the Directory column in the same table. This convention is not followed everywhere (the KeyPath column in the Component table, for example, is particularly tricky) but it is pretty consistent. For a perfect example, take a look at the FeatureComponents table.
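
Conventions 1, 2, and 5 are mechanical enough to sketch in a few lines of Python. Treat this as an illustration only: the function names are made up for this post, nothing here is a Windows Installer API, and the underscore rule is a heuristic that some columns break.

```python
from typing import Optional


def is_pascal_case(column: str) -> bool:
    """Convention 1: every underscore-separated part starts with a capital letter."""
    parts = [part for part in column.split("_") if part]
    return bool(parts) and all(p[0].isupper() and p.isalnum() for p in parts)


def is_primary_key(table: str, column: str) -> bool:
    """Convention 2: the sole primary key column carries the table's own name."""
    return column == table


def guess_foreign_table(column: str) -> Optional[str]:
    """Convention 5: the text before the underscore names the referenced column.

    Returns None when the name has no underscore. This is a guess, not a
    guarantee, because not every foreign key follows the convention.
    """
    if "_" not in column:
        return None
    return column.split("_", 1)[0] or None


if __name__ == "__main__":
    # The three Directory table columns discussed in this post.
    for name in ("Directory", "Directory_Parent", "DefaultDir"):
        print(
            name,
            is_pascal_case(name),
            is_primary_key("Directory", name),
            guess_foreign_table(name),
        )
```

Feed it the FeatureComponents columns, Feature_ and Component_, and it guesses Feature and Component. Feed it KeyPath and it guesses nothing at all, which is exactly why that column is tricky.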

Anyway, that's a pretty good list of conventions that I picked up over the years of dealing with the Windows Installer. I'm sure there are a couple I've forgotten, and if so, I'll find a way to sneak them into whatever story I'm telling at the time. For now, though, this information plus last week's tale about IDT files should allow us to dig deeper into the Directory table.

[to be continued]