Deciphering the MSI Directory table, part 3


Last time I spent most of the time talking about the IDT file structure and a little bit of time talking about relational databases. If you remember, in the middle of the blog entry I hinted that the names of the columns had quite a bit of contextual information in them. Before I go on to discuss the actual workings of the Directory table, let me explain what contextual information I was talking about.

It’s all about conventions. No, I’m not talking about the big halls of uncomfortable chairs and boring speakers (although I will admit OSCON last year was a lot of fun). I’m talking about the "usual way of doing things". In this case, the names and sizes of the columns in the Directory table follow a convention that can be very useful when you try to decipher an MSI file. Here are a few of the conventions off the top of my head.

1. All column names are Pascal cased. This means that each "word" in a column name begins with a capital letter. Our column names are no exception: Directory, Directory_Parent, DefaultDir.

2. The sole primary key column of the table is named the same as the table itself. In this case, the Directory column is the primary key for the Directory table. The File, Registry, Component, and Feature tables are a few other notable examples.

3. Tables that define a "thing" in the MSI file make their first column an identifier column. Our Directory column is a perfect example. The first column in the Directory table is used to uniquely label each directory in the MSI file so that it can be referenced by other tables. It’s pretty handy that the Directory column is also our primary key column, eh? The primary key ensures that no two directories end up with the same identifier. This pattern is used for the File column in the File table, the Registry column in the Registry table, the Component column in the Component table, and so on.

4. Identifier columns are 72 characters wide. Why 72 characters wide? Well, when the Windows Installer team first dreamed up identifier columns, it was decided that 36 characters should be enough to define a particular "thing" in the MSI file. So everyone made their identifiers 36 characters or less to pass validation. A couple of years later, this pesky intern (er, me) came up with the idea of appending a modified GUID plus a dot to the end of identifiers to ensure the rows would be unique when used inside Merge Modules. The modified GUID took 35 characters and adding the dot brought the total to 36 characters. Obviously, 36 characters were no longer sufficient to describe an identifier. So we decided to just double the size of identifiers to allow Merge Modules to have unique identifiers. It was actually my task to go through all of the table definitions in the Windows Installer and make sure they were consistently 72 characters wide.

5. Foreign keys have underscores in them near the name of the column they refer to. In this case, the underscore in the Directory_Parent column tells us that the column refers to the Directory column in the same table. This convention is not followed everywhere (the KeyPath column in the Component table, for example, is particularly tricky) but it is pretty consistent. For a perfect example, take a look at the FeatureComponents table.
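
If you like to see this sort of thing as code, here is a rough Python sketch of how conventions 2 and 5 could be applied mechanically to a list of column names. The function names (primary_key_guess, foreign_key_guess) and the TABLES set are made up for illustration and are not part of any Windows Installer API. The Feature_ and Component_ column names of the FeatureComponents table are not spelled out above, so treat them as my assumption about that table's columns.

```python
# Sketch only: the function names below are made up for illustration.
TABLES = {"Directory", "File", "Registry", "Component", "Feature"}


def primary_key_guess(table, columns):
    """Convention 2: the sole primary key column is named after its table."""
    return table if table in columns else None


def foreign_key_guess(column, tables=TABLES):
    """Convention 5: an underscore sits next to the name being referenced."""
    if "_" not in column:
        return None  # no underscore, so the convention gives us no hint
    # "Directory_Parent" -> ["Directory", "Parent"]; "Feature_" -> ["Feature", ""]
    for part in column.split("_"):
        if part in tables:
            return part
    return None


if __name__ == "__main__":
    directory_columns = ["Directory", "Directory_Parent", "DefaultDir"]
    print(primary_key_guess("Directory", directory_columns))
    # Feature_ and Component_ are assumed FeatureComponents column names.
    for column in directory_columns + ["Feature_", "Component_", "KeyPath"]:
        print(column, "->", foreign_key_guess(column))
```

The guess for Directory_Parent comes back as Directory, while KeyPath comes back as None. That is what makes a column like KeyPath tricky: the name alone gives no hint about what it refers to.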

Anyway, that’s a pretty good list of conventions that I picked up over the years of dealing with the Windows Installer. I’m sure there are a couple I’ve forgotten and if so, I’ll find a way to sneak them into whatever story I’m telling at the time. For now though this information plus last week’s tale about IDT files should allow us to dig deeper into the Directory table.

[to be continued]

Comments (8)

  1. AJ says:

    Hey Rob,

    Here is a question that is a little off-topic, but while we are discussing relational databases… Are there any Many:Many relationships in the Windows Installer tables? If there are any, I haven’t noticed them and was wondering if I had missed any.

  2. robmen says:

    Off the top of my head I can’t think of any. The majority of the Windows Installer’s data is very hierarchical, that is the main reason the WiX toolset works so well. Hierarchical data implies one to one or one to many but no many to many relationships. I’ll keep a lookout for one but nothing comes to mind now. Interesting question.

  3. The FeatureComponents table is a classic example of a many to many relationship. A component can be associated with many features, and a feature can contain many components (1600, to be precise, in the case of 2k or XP).

  4. robmen says:

    Vagmi, of course. Good to know you’re still reading and cleaning up after my dumb mistakes. I got stuck trying to think of any Resource types that would have a many to many or Dialogs that would have many to many. Feature to Components and Features to Modules are the perfect example of many to many because that’s where we have to duplicate information in the WiX schema (Component/ComponentRef and Merge/MergeRef).

    Okay, I need to go climb back under a rock for a while and get some coding done.

  5. AJ says:

    The FeatureComponents table isn’t really a many to many relationship. If that table didn’t exist, then the linking between the features and components would have to be a many to many. As it stands, the FeatureComponents table has a 2-way 1:many relationship with the Feature and the Component table. In the table, 1 feature can reference many components and 1 component can reference many features. It’s not really a true many:many relationship though.

  6. Imagine a blog entry where I finish up talking about the basics of the Directory table.

  8. Thomas Hecker says:

    Why is it not possible to significantly increase this limit of 36 characters for merge module filenames? Sooner or later you can run into trouble, as I did…

    Is it contemporary to have such restrictions?