Why is the registry a hierarchical database instead of a relational one?


Commenter ton asks why the registry was defined as a hierarchical database instead of a relational database.

Heck, it’s not even a hierarchical database!

The original registry was just a dictionary; i.e., a list of name/value pairs, accessed by name. In other words, it was a flat database.

Name                          Value
.txt                          txtfile
txtfile                       Text Document
txtfile\DefaultIcon           notepad.exe,1
txtfile\shell                 open
txtfile\shell\open\command    notepad %1

If you turned your head sideways and treated the backslashes as node separators, you could sort of trick yourself into believing that this resulted in something vaguely approximating a hierarchical database, and a really lame one at that (since each node held only one piece of data).
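Here is a minimal Python sketch of that flat model, using the entries from the table above. It is an illustration only, not the actual Windows 3.1 code. The helpers lookup and children are hypothetical names.

The store supports exactly one real operation: an exact-match lookup by full name. The "hierarchy" appears only because children chooses to split names on backslashes.

```python
from typing import List, Optional

# Hypothetical model of the original flat registry: each full name maps to
# exactly one string. The entries are the ones from the table above.
FLAT_REGISTRY = {
    ".txt": "txtfile",
    "txtfile": "Text Document",
    "txtfile\\DefaultIcon": "notepad.exe,1",
    "txtfile\\shell": "open",
    "txtfile\\shell\\open\\command": "notepad %1",
}


def lookup(name: str) -> Optional[str]:
    """The only operation the flat store really supports: exact match by name."""
    return FLAT_REGISTRY.get(name)


def children(prefix: str) -> List[str]:
    """The 'turn your head sideways' view: treat backslashes as node
    separators and list the immediate child segments under prefix.
    The store itself knows nothing about this; it is a naming convention."""
    depth = prefix.count("\\") + 1
    found = set()
    for name in FLAT_REGISTRY:
        if name.startswith(prefix + "\\"):
            found.add(name.split("\\")[depth])
    return sorted(found)


if __name__ == "__main__":
    print(lookup("txtfile\\shell\\open\\command"))  # notepad %1
    print(children("txtfile"))                      # ['DefaultIcon', 'shell']
    print(children("txtfile\\shell"))               # ['open']
```

Note that txtfile\shell\open has no entry of its own. It exists only as a fragment of a longer name, which is one more sign that the hierarchy is an illusion.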

When you choose your data structures, you are necessarily guided by the intended usage pattern and the engineering constraints. One important engineering constraint was that you had to minimize memory consumption: all of the registry code fit in 16KB of memory. (Recall that Windows 3.1 had to run on machines with only 1MB of memory.)

Okay, what is the usage pattern of the registry? As originally designed, the registry was for recording information about file types. We have the file types themselves (txtfile), properties about those file types (DefaultIcon), verbs associated with those file types (open), and verb implementations (command or ddeexec). Some verb implementations are simple (command involves just a single string describing the command line); others are complex (ddeexec requires the execute string, the application, and the topic, plus an optional alternate execute string).

That usage pattern boils down to two lookups:

  • Given a file type and a property, retrieve the value of that property.

  • Given a file type and a verb, retrieve information about how to perform that verb.

At the same time, the data has to stay extensible:

  • The set of properties can be extended.
  • The set of property schemata can be extended.
  • The set of verbs can be extended.
  • The set of verb implementations can be extended.
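Both lookups map directly onto the flat model by building names according to convention. This Python sketch is hypothetical and uses the sample entries from the table above.

It makes a few simplifying assumptions:

  • As the table suggests, the value stored at txtfile\shell names the default verb.
  • Only the simple command implementation is handled. A ddeexec implementation would need several more names.
  • The print verb entry is invented for illustration. Its point is that adding a verb needs no schema change, just another name.

```python
from typing import Optional

# Hypothetical flat store with the sample entries from the table above, plus
# one invented "print" verb to show that new verbs are just new names.
REGISTRY = {
    ".txt": "txtfile",
    "txtfile": "Text Document",
    "txtfile\\DefaultIcon": "notepad.exe,1",
    "txtfile\\shell": "open",
    "txtfile\\shell\\open\\command": "notepad %1",
    "txtfile\\shell\\print\\command": "notepad /p %1",  # invented example
}


def file_type_for_extension(ext: str) -> Optional[str]:
    """Map an extension such as '.txt' to its file type ('txtfile')."""
    return REGISTRY.get(ext)


def get_property(file_type: str, prop: str) -> Optional[str]:
    """Given a file type and a property, retrieve the value of that property."""
    return REGISTRY.get(file_type + "\\" + prop)


def get_verb_command(file_type: str, verb: Optional[str] = None) -> Optional[str]:
    """Given a file type and a verb, retrieve the command line for that verb.
    Assumes the value at <type>\\shell names the default verb when none is given.
    Only the 'command' implementation is handled; 'ddeexec' is not modeled."""
    if verb is None:
        verb = REGISTRY.get(file_type + "\\shell")
        if verb is None:
            return None
    return REGISTRY.get(file_type + "\\shell\\" + verb + "\\command")


if __name__ == "__main__":
    ftype = file_type_for_extension(".txt")
    print(get_property(ftype, "DefaultIcon"))  # notepad.exe,1
    print(get_verb_command(ftype))             # notepad %1
    print(get_verb_command(ftype, "print"))    # notepad /p %1
```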

Since the properties and verb implementations can be extended, you can’t come up with a single schema that covers everything. For example, over the years, new file type properties such as ContentType, OpenWithList, and ShellNew have been added. The first is a simple string, the second is a list of strings, and the third is a complex key with multiple variants. Meanwhile, additional verb implementations have been added, such as DropTarget.

Given the heterogeneity of the data the registry needs to keep track of, imposing some sort of uniform schema is doomed to failure.

“But you can just update the schemata each time the registration is extended.”

That creates its own problems. For example, to support roaming user profiles, you need a single registry hive to work on multiple versions of the operating system. If version N+1 adds a new schema, but then the profile roams to a machine running version N, then that registry hive will be interpreted as corrupted since it contains data that matches no valid schema.

“Well, then include the schemata with the roaming profile so that when the older operating system sees the hive, it also sees the updated schemata.”

This is trickier than it sounds, because when the profile roams to the newer operating system, you presumably want the schemata to be upgraded and written back into the user profile. It also assumes that the versioning of the schemata is strictly linear. (What if you roam a user profile from a Windows XP machine to a Server 2003 machine? Neither is a descendant of the other.)

But what kills this proposal is that it makes it impossible for a program to “pre-register” properties for a future version of the operating system. Suppose a new schema is added in version N+1, like, say, the IDropTarget verb implementation. You write a program that you want to run on version N as well as on version N+1. If your installer tries to register the version N+1 information, it will fail since there is no schema for it. But that means that when the user upgrades to version N+1, they don’t get the benefit of the version N+1 feature. In order to get the version N+1 feature to work, they have to reinstall the program so the installer says, “Oh, now I can register the version N+1 information.”
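To see why a schema check causes trouble, here is a deliberately simplified Python model of the proposal being argued against. It is a hive that rejects any verb implementation its OS version has no schema for. Nothing like this exists in Windows. The version labels, the SchemaCheckedHive class, and the placeholder CLSID are all hypothetical.

Both failure modes described above fall out of it:

  • A hive written on version N+1 looks corrupt when it roams to version N.
  • An installer running on version N cannot pre-register the N+1 DropTarget entry.

```python
from typing import Dict, Optional

# Hypothetical: the verb implementations each OS version has a schema for.
# Version labels and sets are invented to model the argument, not Windows.
KNOWN_VERB_IMPLS = {
    "N": {"command", "ddeexec"},
    "N+1": {"command", "ddeexec", "DropTarget"},
}


class SchemaViolation(Exception):
    """Raised when an entry matches no schema known to this OS version."""


class SchemaCheckedHive:
    """A hive that validates every write against its OS version's schemata."""

    def __init__(self, os_version: str, entries: Optional[Dict[str, str]] = None):
        self.known = KNOWN_VERB_IMPLS[os_version]
        self.entries: Dict[str, str] = {}
        # Loading a hive (for example, a roamed profile) validates every entry.
        for name, value in (entries or {}).items():
            self.set(name, value)

    def set(self, name: str, value: str) -> None:
        parts = name.split("\\")
        # Only names of the form <type>\shell\<verb>\<impl> are schema-checked.
        if len(parts) == 4 and parts[1] == "shell" and parts[3] not in self.known:
            raise SchemaViolation(name + " matches no known schema")
        self.entries[name] = value


if __name__ == "__main__":
    # A profile written on version N+1 registers a DropTarget implementation.
    newer = SchemaCheckedHive("N+1")
    newer.set("txtfile\\shell\\open\\DropTarget", "{placeholder-clsid}")

    # Roaming: version N sees an entry it has no schema for and calls it corrupt.
    try:
        SchemaCheckedHive("N", newer.entries)
    except SchemaViolation as e:
        print("roamed hive rejected:", e)

    # Pre-registration: an installer running on version N cannot write it either.
    older = SchemaCheckedHive("N")
    try:
        older.set("txtfile\\shell\\open\\DropTarget", "{placeholder-clsid}")
    except SchemaViolation as e:
        print("installer write rejected:", e)
```

The model assumes strictly linear versions. As noted above, real version histories (Windows XP versus Server 2003) are not even that tidy.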

“Well, then allow applications to install a new schema whenever they need to.”

In other words, make it a total free-for-all. In which case, why do you need a schema at all? Just leave it as an unregulated collection of name/value pairs governed by convention rather than rigid rules, as long as the code which writes the information and the code which reads it agree on the format of the information and where to look for it.

Hey, wow, that’s what the registry already is!

And besides, if you told somebody, “Hi, yeah, in order to support looking up four pieces of information about file types, Windows 3.1 comes with a copy of SQL Server,” they would think you were insane. That’s like using a bazooka to kill a mosquito.

What are you planning on doing with this relational database anyway? Are you thinking of doing an INNER JOIN on the registry? (Besides, the registry is already being abused enough. Imagine if it were a SQL server: Everybody would store all their data in it!)

ton explains one way applications could use this advanced functionality:

An application would have a table or group of tables in a relational-style registry. A group of settings would be a row. A single setting would be a column. Is it starting to become clearer now how SQL-like statements could be used to constrain what gets deleted and added? How good is your understanding of SQL and DBMS?

You know what most application authors would say? They would say “Are you mad? You’re saying that I need to create a table with one column for each setting? And this table would have a single row (since I have only one application)? All this just so I can save my window position? Screw it, I’m going back to INI files.” What’ll happen in practice is that everybody will create a table with two columns, a string called name and a blob called value. Now we’ve come full circle: We have our flat database again.

And how would they make sure the name of their table doesn’t collide with the name of a table created by another application? Probably by encoding the company name and application name into the name of the table, according to some agreed-upon convention. Say, the Settings table used by the LitSoft program written by LitWare would be called LitWare_LitSoft_Settings. So querying a value from this table would go something like this:

SELECT value FROM PerUser.LitWare_LitSoft_Settings
    WHERE name = 'WindowPosition'

Hey, this looks an awful lot like

Registry.CurrentUser.OpenSubKey(@"LitWare\LitSoft\Settings")
        .GetValue("WindowPosition");
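That parallel is easy to check with Python's built-in sqlite3 module standing in for a hypothetical relational registry. In this sketch, an attached in-memory database named PerUser plays the role of the per-user scope, and the WindowPosition value is invented.

The two-column table answers exactly one kind of question, "what is the value for this name?" That is all the flat dictionary at the bottom does as well.

```python
import sqlite3
from typing import Optional

# Hypothetical relational registry: an attached in-memory database named
# PerUser stands in for the per-user scope from the SQL example above.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS PerUser")

# The table everybody ends up creating: two columns, a name and a blob.
conn.execute(
    "CREATE TABLE PerUser.LitWare_LitSoft_Settings "
    "(name TEXT PRIMARY KEY, value BLOB)"
)
with conn:
    conn.execute(
        "INSERT INTO PerUser.LitWare_LitSoft_Settings (name, value) VALUES (?, ?)",
        ("WindowPosition", "100,100,640,480"),  # invented sample value
    )


def query_setting(name: str) -> Optional[str]:
    """SELECT value ... WHERE name = ?: a lookup by name, nothing more."""
    row = conn.execute(
        "SELECT value FROM PerUser.LitWare_LitSoft_Settings WHERE name = ?",
        (name,),
    ).fetchone()
    return row[0] if row else None


# The flat equivalent: the table name simply turns into a path prefix.
FLAT = {"LitWare\\LitSoft\\Settings\\WindowPosition": "100,100,640,480"}


def flat_setting(name: str) -> Optional[str]:
    return FLAT.get("LitWare\\LitSoft\\Settings\\" + name)


if __name__ == "__main__":
    print(query_setting("WindowPosition"))  # 100,100,640,480
    print(flat_setting("WindowPosition"))   # 100,100,640,480
```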

One of ton’s arguments for using a relational database is that it permits enforcement of referential integrity. But I would argue that in the general case, you don’t want strict enforcement of referential integrity. Suppose you uninstall a program. The uninstaller tries to delete the program registration, but that registration is being referenced by foreign keys in other tables. These references were not created by the application itself; perhaps the shell common dialog created them as part of its internal bookkeeping. If the registry blocked the deletion, then the uninstall would fail. “Cannot uninstall application because there’s still a reference to it somewhere.” And that reference might be in Bob’s user profile, from that time Bob said, “Hey can I log onto your machine quickly? I need to look up something.” Bob is unlikely to come back to your machine any time soon, so his user profile is just going to sit there holding a reference to that application you want to uninstall for an awfully long time. “Hi, Bob, can you come by my office? I need you to log on so I can uninstall an app.”
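Here is what strict enforcement looks like in practice, sketched with SQLite (via Python's sqlite3 module) standing in for a hypothetical relational registry. The Programs and CommonDialogMRU tables and the program ID are invented for illustration.

SQLite enforces foreign keys only after PRAGMA foreign_keys = ON. With ON DELETE RESTRICT, the uninstaller's DELETE fails because of the leftover row in Bob's profile.

```python
import sqlite3

# Hypothetical relational registry with strictly enforced foreign keys.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE Programs (id TEXT PRIMARY KEY);
    -- Bookkeeping rows written by someone else (say, the common dialog).
    CREATE TABLE CommonDialogMRU (
        user TEXT,
        program_id TEXT REFERENCES Programs(id) ON DELETE RESTRICT
    );
    """
)
with conn:
    conn.execute("INSERT INTO Programs VALUES ('LitWare.LitSoft')")
    # The reference left behind in Bob's profile.
    conn.execute("INSERT INTO CommonDialogMRU VALUES ('Bob', 'LitWare.LitSoft')")


def uninstall(program_id: str) -> bool:
    """Try to delete a program's registration; False if references block it."""
    try:
        with conn:
            conn.execute("DELETE FROM Programs WHERE id = ?", (program_id,))
        return True
    except sqlite3.IntegrityError as e:
        print("Cannot uninstall application:", e)
        return False


if __name__ == "__main__":
    uninstall("LitWare.LitSoft")  # blocked by Bob's row
```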

So let’s assume it goes the other way: The registry automatically deletes orphaned foreign key rows. (And for hives that are not currently available, it just remembers that those foreign key rows should be deleted the next time they are loaded. Never mind that that list of “foreign key rows that should be deleted the next time Bob logs on” is going to get pretty long.)

Now suppose you’re uninstalling a program not because you want to get rid of it, but because you’re doing an uninstall/reinstall troubleshooting step. You uninstall the program, all the orphaned foreign key rows are automatically deleted, then you reinstall the program. Those orphaned foreign key rows are not undeleted; they remain deleted. Result: You lost some settings. This is the reason why you don’t clean up per-user data when uninstalling programs.
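The cascading variant loses data in exactly this uninstall/reinstall scenario. The assumptions are the same as before: SQLite stands in for a hypothetical relational registry, and the table names, user, and setting values are invented.

ON DELETE CASCADE removes the user's settings row when the program row goes away, and reinstalling the program does not bring it back.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical relational registry that cascades deletes to orphaned rows.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Programs (id TEXT PRIMARY KEY);
    CREATE TABLE UserSettings (
        user TEXT,
        program_id TEXT REFERENCES Programs(id) ON DELETE CASCADE,
        name TEXT,
        value TEXT
    );
    """
)


def install(program_id: str) -> None:
    with conn:
        conn.execute("INSERT INTO Programs VALUES (?)", (program_id,))


def uninstall(program_id: str) -> None:
    # ON DELETE CASCADE silently deletes every UserSettings row that refers
    # to this program, including per-user data.
    with conn:
        conn.execute("DELETE FROM Programs WHERE id = ?", (program_id,))


def settings(user: str, program_id: str) -> List[Tuple[str, str]]:
    return conn.execute(
        "SELECT name, value FROM UserSettings WHERE user = ? AND program_id = ?",
        (user, program_id),
    ).fetchall()


install("LitWare.LitSoft")
with conn:
    conn.execute(
        "INSERT INTO UserSettings VALUES (?, ?, ?, ?)",
        ("Alice", "LitWare.LitSoft", "WindowPosition", "100,100,640,480"),
    )

before = settings("Alice", "LitWare.LitSoft")
uninstall("LitWare.LitSoft")  # troubleshooting step...
install("LitWare.LitSoft")    # ...followed by a reinstall
after = settings("Alice", "LitWare.LitSoft")
print(before)  # [('WindowPosition', '100,100,640,480')]
print(after)   # []
```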

Enforcing referential integrity also means that you can’t create anticipatory references. One example of this was given earlier, where you register something on version N even though the feature doesn’t get activated until the user upgrades to version N+1. More generally, Program X may want to create a reference to Program Y at installation, even if Program Y isn’t installed yet. (For example, X is a Web browser and Y is a popular plug-in.) The Program Y features remain dormant, because the attempt by Program X to access Program Y will fail, but once the user installs Program Y, then the Program Y features are magically “turned on” in Program X.

Consider, as an even more specific example, the “kill bit” database. There, the goal isn’t to “turn on” features of Program Y but to turn them off. Imagine if referential integrity were enforced: You couldn’t kill an ActiveX control until after it was installed!
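Enforced foreign keys make a kill bit impossible to record ahead of time. In this sketch, SQLite again stands in for a hypothetical relational registry, and the table names and CLSID are placeholders. The real kill-bit mechanism is not modeled.

The relational insert is rejected because the control is not installed yet. A flat, convention-based store simply records the name and honors it whenever the control shows up.

```python
import sqlite3

# Relational version: a kill bit must reference an installed control.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Controls (clsid TEXT PRIMARY KEY);
    CREATE TABLE KillBits (clsid TEXT REFERENCES Controls(clsid));
    """
)


def relational_kill(clsid: str) -> bool:
    """Record a kill bit; fails if the control is not installed yet."""
    try:
        with conn:
            conn.execute("INSERT INTO KillBits VALUES (?)", (clsid,))
        return True
    except sqlite3.IntegrityError:
        return False


# Flat version: a kill bit is just a name, recorded by convention.
flat_kill_bits = set()


def flat_kill(clsid: str) -> None:
    flat_kill_bits.add(clsid)


def flat_is_killed(clsid: str) -> bool:
    """Checked when the control is about to be loaded, whenever that is."""
    return clsid in flat_kill_bits


BAD_CONTROL = "{placeholder-clsid}"  # hypothetical, not a real CLSID

if __name__ == "__main__":
    print(relational_kill(BAD_CONTROL))  # False: control not installed yet
    flat_kill(BAD_CONTROL)
    print(flat_is_killed(BAD_CONTROL))   # True, even before installation
```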

Comments (33)
  1. dave says:

    Is there a kernel-mode implementation of SQL server?

    ;-)

  2. Adam Rosenfield says:

    If you want a relational database, then use a relational database.  Use the right tool for the right job.  The hammer (registry) isn't your only tool.

  3. David Walker says:

    Another use of "anticipatory" data:  Compatibility shims, where a lot of program names are listed along with their compatibility hacks, even though the user might not have all of those programs installed.  Or might not have them installed YET.

    I have to wonder about people who like to bash without thinking.  I read things like "the registry is horrible; get rid of it".  As someone once said, "and replace it with what?".

  4. Brian Tkatch says:

    Wow, thanks for the explanation. I think the registry was a great idea, perhaps abused, but nonetheless a good idea.

    In what version of Windows did it start?

  5. Antonio Rodríguez says:

    @David Walker: with a bunch of colon-delimited text files, all stored in the /etc/ directory (for system settings) or in the user's home directory (for user settings), and which need to be edited using a text editor (vi being the preferred one, and emacs the second choice). Everybody knows it's easier to manage for the novice user, allows more powerful remote administration for the systems administrator and is more consistent than Windows' Registry for the programmer: it has something for anybody!

  6. Antonio Rodríguez says:

    @Brian Tkatch: the first version of Windows that included the Registry was Windows 3.1, launched in 1992. But at that time, Windows NT had been in development for almost three years, and it's more than possible that the Registry was already implemented in that OS, so the registry was added to Windows 3.1 for compatibility.

  7. 640k says:

    Asking for relational models is trollish. MS tried this with WinFS and failed. The current trend is to flatten databases out. NoSQL!

  8. Crescens2k says:

    @Antonio Rodríguez

    The registry was added for OLE in Windows 3.1, nothing to do with Windows NT. It was later extended to store application data.

  9. jader3rd says:

    Should automatic cleanup happen, the smart thing would be not to clean up an orphaned entry right away, but to wait for a couple of weeks to pass and a couple of reboots to happen. Then the next problem to tackle will be automatically defragging the registry.

  10. Klimax says:

    @Antonio Rodríguez 7 Sep 2011 8:23 AM:

    I thought people were supposed to suggest better alternatives, not send everybody (back) to hell.

    I can connect to a remote registry or load an offline one through regedit, and the API is good enough. And unlike text files, it has permissions for access control.

    BTW Microsoft was sort of there and I doubt they would want to return there…

  11. John says:

    @Klimax: With .NET the trend was heading back towards using individual configuration files for each application. I guess it's just one of those cyclical things, like all the buzz surrounding cloud computing.

    [Configuration files have different use patterns from user settings though. Users like to change settings, so you have to worry about concurrent write access. Configuration files are typically changed only by administrators (and even then only rarely) so concurrent write access is less of an issue. -Raymond]

  12. henke37 says:

    The registry still needs defragmentation, unless you think Windows keeps all of it in memory and rewrites the whole thing from scratch each time a value or key is deleted.

  13. David Walker says:

    @henke37:  Do you know exactly how the registry is cached, or not cached, in the memory/page file subsystem?  Do you know whether it is read sequentially from disk on each search for a key, and whether the entire registry is read at once?  Do you know whether the different hives of the registry are stored on disk in different physical files?

    If you don't know the details, then you shouldn't claim that it "needs defragmentation".  If you do know those details, then that's great.

  14. Alex Grigoriev says:

    Overheard on the other side of MS campus:

    "We need to develop a new format for application installation. Let's call it MSI"

    "Why don't we make it a database?"

    "Cool! Let's make it a database. Then we don't need to supply any authoring tools. Everybody can populate a database, right?"

    "And while we're at that, don't care about installer performance. Nobody would complain it takes 1 minute to install a couple files".

  15. John says:

    [Configuration files have different use patterns from user settings though. Users like to change settings, so you have to worry about concurrent write access. Configuration files are typically changed only by administrators (and even then only rarely) so concurrent write access is less of an issue. -Raymond]

    I've seen many .NET applications that have both per-application (e.g. admin settings) and per-user (e.g. MRU lists, window positions, etc.) configuration files; I'm pretty sure there are classes designed to handle this specific scenario built right into the framework.  This is more-or-less coming back full circle to .ini files, only now they are .xml files.

  16. Richard Cox says:

    @John: Correct. theConfig = ConfigurationManager.OpenExeConfiguration(ConfigurationUserLevel.PerUserRoamingAndLocal) will merge settings from AppDataLocal…, AppDataRoaming…, App.exe.config and Machine.config.

  17. A says:

    What about WMI instead of the registry when some sort of RDBMS is needed for storing app data?

  18. Klimax says:

    @David Walker 7 Sep 2011 10:29 AM:

    I do have some idea (from Windows Internals), and I would say one cleanup and defrag per year is more than sufficient unless one did a huge amount of uninstallation/deletion. (If you uninstall and then reinstall, no extra blocks accumulate, since the freed ones are reused.)

    @John:

    .NET programs are not the only ones. A counterexample is TortoiseSVN, which has to use the registry because it integrates with the shell and can therefore be running under several user accounts at the same time. (The only INI-like files come from libsvn.)

    And with INI/XML you don't get key-level access control, only file-level, which is usually insufficient.

    And I don't think many of those programs using XML are written with multiple copies running under different users at the same time in mind. I smell corruption… (or at best a sharing violation)

  19. ton says:

    Raymond, you surprised me!!! I did not think you would respond. You bring up interesting points; let's go over a few things.

    "As originally designed, the registry was for recording information about file types."

    Well, we know that the registry is used to store much more data than that now. It now stores info about the kernel, device drivers, services, SAM, and third-party applications, and as a result the importance of the integrity of the data has risen significantly from the original design that you described.

    "But I would argue that in the general case, you don't want strict enforcement of referential integrity."

    Yes and I agree with that. I was trying to hint in my original comments that referential integrity should really only be enforced in specific cases where corruption of the registry could occur.

    "Just leave it as an unregulated collection of name/value pairs governed by convention rather than rigid rules, as long as the code which writes the information and the code which reads it agree on the format of the information and where to look for it."

    My thinking about RDBMSs and data storage has evolved since I wrote my original comments. I now believe that NoSQL is a viable solution for storing schema-less data.

    en.wikipedia.org/…/NoSQL

    These systems are much more lightweight and permissive than a typical RDBMS. There are also storage systems that are eventually consistent key-value stores, like Apache Cassandra. These types of ACID-like features are being added to the Registry already anyway.

    Example:

    en.wikipedia.org/…/Windows_registry

    "The registry has features that improve system integrity, as the registry is constructed as a database and offers database-like features such as atomic updates. If two processes attempt to update the same registry value at the same time, one process's change will precede the other's and the overall consistency of the data will be maintained.[…] Windows Vista and Windows 7 provide transactional updates to the registry, extending the atomicity guarantees across multiple key and/or value changes, with traditional commit-abort semantics."

    Eventually the registry will evolve to become a complete database; it's already partially there. It's just a question of when the Windows team will stop dancing around the bush and take the final leap and implement it as a full-fledged database system.

    In the beginning, when constrained environments and data were the rule of law, you're right: none of these design options made sense. But as the complexity and size of the data grow, so must the complexity and size of the code that manages it.

  20. Caleb Vear says:

    I would love it if most applications did just keep their settings in an .ini or similar file rather than hiding it away in the registry.  If the settings are kept in the program directory it makes it so much easier to back them up or move them to another machine.  Add to this the bonus of not clogging up my registry with useless crap and it sounds even better.

  21. Joshua says:

    I prefer to use the registry only for its intended purpose: program X is installed in directory Y, handles file type Z, and has service A, shell plugin B, and uninstaller U.

  22. Mark says:

    ton: the registry *is* NoSQL.

  23. me says:

    Note the registry is also read by the boot loader which has a pretty limited size (to check which drivers need to be preloaded with the kernel), so the amount of features you can add to the file format are a bit limited. Still, Microsoft could have done better with regard to stability and efficiency.

  24. 640k says:

    WMI DBs are a horrible example of RDBMS gone wrong. With 2000 servers, the probability that one or more servers have a corrupt WMI repository is about 70-80% (if you fix the broken repos when you detect them; I do this on a daily basis). And keep in mind that Windows tries to rebuild broken WMI repos at every boot; without that it would be an even larger mess. WMI APIs are therefore not trustworthy.

  25. Cheong says:

    "Eventually the registry will evolve to become a complete database; it's already partially there. It's just a question of when the Windows team will stop dancing around the bush and take the final leap and implement it as a full-fledged database system."

    I think it'll probably never happen. The development of the registry has to observe the same rule as Notepad: you have to make it possible to use even if most of the other parts of the system go wrong.

    And the result will probably be the same as with the list of non-essential, hard-to-justify features people want to add to Notepad: instead, people get those features in WordPad or Word. In fact, this is already happening… I see people have started storing cluster application settings in databases.

  26. Faisal says:

    See Jeff Atwood's opinion about the Windows registry.

    http://www.codinghorror.com/…/was-the-windows-registry-a-good-idea.html

    I liked his closing observation:

    "How many billions of man-hours could we have saved by now if some early Windows NT 3.0 or 3.5 developers had decided to turn off public access to the registry, and transparently redirected the public registry API calls so they followed simpler, UNIX-like filesystem storage conventions instead?"

    [Hey, let's go back to the old days when everybody had to write a file parser! Oh wait, I remember those days. You had bugs like "If system.ini contains a line longer than 80 characters, then running program X causes your machine to become unbootable" because program X tried to edit system.ini and messed up and ended up corrupting it. -Raymond]

  27. Greg D says:

    "[Hey, let's go back to the old days when everybody had to write a file parser! Oh wait, I remember those days. You had bugs like "If system.ini contains a line longer than 80 characters, then running program X causes your machine to become unbootable" because program X tried to edit system.ini and messed up and ended up corrupting it. -Raymond]"

    +1 for truth.  It's convenient that .Net apps get all that parsing for free, but a lot of folks aren't writing .Net apps and a lot of folks who are writing .Net apps are still rolling their own parsers for this (presumably due to ignorance of the built-in mechanism, willful or not).  Better a simple call to read/write in the registry than Yet Another Busted Parser.

  28. ton says:

    Aha!!! It turns out someone has already done the legwork on a system with a relational database back end and a key-value store front end.

    pages.cs.wisc.edu/…/cdms.pdf

    "CDMS instead stores flat key-value pairs in relational tables; the hierarchy is imposed by a naming mechanism. Values default to text, but may be optionally interpreted as a number, a file name, a link, or other type, if metadata indicating the type is stored.

    "Unlike other configuration services, CDMS provides another dimension of organization: settings related to a single user, application, or service are grouped into objects. This allows related settings that have unrelated names to be associated. These objects represent a group of settings in a scope and are the building blocks of an inheritance mechanism, where an object may inherit settings from many other objects."

    Using the configuration objects from the key-value front end, you could support a schema-less data store AND still have a relational database back end, yielding a registry that supports referential integrity when needed and is more stable and reliable.

  29. Gabe says:

    Why do people keep complaining about the stability, reliability, or performance of the Registry? I've never heard of it being unstable, unreliable, or slow except on Win9x, and that was over a decade ago.

  30. Nick says:

    @Gabe

    Probably because they're the same people who keep their systems running smoothly with RegCleanerPro2011 and AwesomeSuperRegFixer 4.

    There are very few snakeoil software products more successful than registry "cleaners" and "fixers".

  31. Cheong says:

    @ton: As I said, for those who want to store their settings in a relational database, it isn't really that difficult to roll your own. You can even use a SQLCE database to store your settings, then write registry-API-like wrappers to get the values for you. No one prevents you from doing that.

  32. S says:

    Still using GetPrivateProfileString

  33. hagenp says:

    [As originally designed, the registry was for recording information about file types.]

    First time I noticed the registry being there was in 199x when doing admin work for Win3.1 and MS Office 4.3.

    Some MSO settings were not in an *.ini file but in something (IIRC) like reg.dat.

    Funnily enough, a registry export (with regedit.exe) could produce a file larger than 64kB, but you had to split it into chunks of 64kB before re-importing it. :-)

    (Our goal at the time was "cloning" MSO installations to different computers when maintaining 30+ computers, and we found out that some system-specific entries had to be renamed.)

    Later tools like "Delta Deploy" fixed these problems very nicely, and were way easier to handle than MS SMS.

    Nowadays registry exporting/importing/editing is much easier.

    Bonus fact: regedit.exe was one of the first tools to actively support dual-mode operation – commandline and GUI – within the same binary.
