Versioning – source of all good or evil?



I don’t have a real succinct point tonight, just a long, hopefully interesting rant/whine.


 


I work on Fusion where we’re trying to make the world a better place for people who want to use software.


 


It’s a real uphill battle though against the people who write software.


 


The first problem we worked on tackling was versioning.  We made a reasonable assumption that people had to follow: if you were going to release software in our world, you had to assign unique identifiers to each version of your software.  Our identifier scheme is a property-bag type of approach called an “identity” and the general rule is that two identities which differ only in the “version” attribute can be assumed to be “the same” component, except for version.


 


That’s pretty obvious but the opposite case has proven to be too onerous for some folks to get their heads around – that it’s illegal to have two components with the same identity but which are different.  Instead the model has been muddied in some cases with notions of “file versions” as well as “component versions” where the real version was component version + file version.  I guess it’s just a matter of time until it’s really component version + file version + file hash, followed by component version + file version + file hash + hash of some file in a relative path.  (These are the kinds of rules that Windows Update has to have for recognizing component versions and whether updates are needed.)
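The two identity rules described above can be sketched in a few lines of Python. This is only an illustration, not Fusion's actual data model: the attribute names (`name`, `architecture`), the `Registry` class and the use of a SHA-256 digest to detect "different bits" are all assumptions made for the example.

```python
import hashlib


def identity_key(identity: dict) -> tuple:
    """Canonical, hashable form of a property-bag identity."""
    return tuple(sorted(identity.items()))


def same_component(a: dict, b: dict) -> bool:
    """True when two identities agree on every attribute except "version"."""
    def without_version(ident):
        return {k: v for k, v in ident.items() if k != "version"}
    return without_version(a) == without_version(b)


class Registry:
    """Hypothetical registry that remembers which bits each identity names."""

    def __init__(self):
        self._digests = {}

    def publish(self, identity: dict, payload: bytes) -> None:
        key = identity_key(identity)
        digest = hashlib.sha256(payload).hexdigest()
        # The first publication wins; a later one must carry identical bits.
        known = self._digests.setdefault(key, digest)
        if known != digest:
            raise ValueError(f"identity {identity} already names different bits")
```

Publishing the same bits twice under one identity is harmless; publishing different bits under it is rejected, which is the "illegal" case from the paragraph above.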


 


Our goal is to provide a “DVD player” like experience for software on Windows.  You can bring it on to your system and when you take it off, for the most part (say 99%), it’s gone.  Yes, some DVD players will cache track lists which can conceptually pollute the state of the player even after removing the disc but really these 2nd order and 3rd order effects are nothing compared to the 1st order effects that stop people from feeling warm and fuzzy about installing software on Windows today.


 


But man, people aren’t willing to give an inch on the bad old loose way that things are done in order to move forward.  We’ve introduced concepts of self-descriptive components and applications so that you can do accurate inventory, impact analysis and repeatable deployment of software without having to burn the bits onto unchangeable media.  What’s the result?  It’s seen as too inflexible for the desperate command line tool programmer.


 


I realize that this sort of communications medium is self-selective towards software developers, so maybe you don’t want to hear that your job should be a little harder.  But the real problem we have is that as long as software deployment on Windows is as fragile as it is, there will always be a significant value proposition offered by devices which offer only a small number of functions and which aren’t generally programmable.


 


Maybe that’s the answer – if you want a DVD or game console like experience where the content is gone once you remove it, you should use those.  But it does seem that what’s stopping us from making real progress is just a bunch of prima donnas who don’t want to take responsibility for the fact that their software messes up customers’ machines.  And in the end it just looks like Microsoft can’t figure out how to dig itself out of the quagmire it created.


 


Back to the main topic of versioning…  If you want to have “console” like install/uninstall repeatability but still some value in centralized servicing you have to have some way to tell the bits that application “A” brought onto the system apart from the bits that application “B” brought onto the system.


 


A little secret to those who don’t know the real secret to Windows’ success.  It’s not that Windows is necessarily by itself so wonderful and useful – that’s somewhat important when competing against other integrated devices, but the real reason is that it’s a way for other people to make money.  Other people can write software for Windows and there’s a whole “virtuous” cycle where 3rd party platforms can be layered on the Windows platform and we make money, the platform folks make money and the application folks make money and hopefully if there’s any point to the application of technology at all, either end users are living happier more fulfilled lives or they’re making more money themselves.  (Yes, I’m a capitalist… 😉)


 


The problem is that all these platforms quickly become a cesspool on the target machine.  If your “platform” is just a super duper matrix multiplication library that games can use to manipulate their 3d object spaces, then it’s not a big deal – the next first person shooter probably doesn’t even want to be exposed to a new numerical algorithm improvement without having time to verify that it does what they expect.


 


But if your “platform” is a database or something that provides shared functionality across the machine, you have a bigger problem.  Maybe there can only be one version, but we don’t want to get into the traditional “DLL Hell” problem that libraries like MFC42.DLL have “enjoyed” for ages.  (In case you’re not aware of it, it goes something like this.  Application “A” comes on the system with MFC42.DLL version 1.0.  Application “B” comes on the system with version 1.1 which is obviously “better”.  Hunh… Application “A” stopped working.  OK, the last thing I did was install application “B”, so I uninstall it.  Wow, all the uninstallers in the world know that since MFC42.DLL version 1.1 is better than 1.0, you should just leave it there.  The end user is left with the only supported action being to reformat their hard drive and reinstall Windows.  Bletch.)


 


Our solution is that Application “A” says that it came with 1.0 and Application “B” says it came with 1.1.  When application “B” is uninstalled we apply the rule of “what would be the version to use as if application B had never been installed”.  We keep the bits for 1.0 and 1.1 separate and we apply “publisher policy” to ensure that clients, by default, get 1.1.  (A local administrator always has the right to lock back to 1.0 just like they had the right to not install the service pack or QFE in the first place.  It may not work, and it may escape any reasonable support boundaries but the alternatives are that either line of business applications stop working or enterprises delay deploying critical security fixes until their IT departments have had time to verify and fix any compat problems.)
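A toy model may help show why this makes uninstall well defined. The sketch below is an assumption-laden simplification, not how Fusion is implemented: the `SideBySideStore` class is hypothetical, publisher policy is reduced to "redirect to the highest version any installed application declared", and the administrator's lock is a plain argument.

```python
class SideBySideStore:
    """Tracks which version of one shared component each application declared."""

    def __init__(self):
        self.declared = {}  # application name -> version tuple it came with

    def install(self, app, version):
        self.declared[app] = version

    def uninstall(self, app):
        # Only the application's own declaration is dropped; nothing else changes.
        self.declared.pop(app, None)

    def version_to_load(self, admin_lock=None):
        versions = set(self.declared.values())
        if not versions:
            return None
        # An administrator may lock back to an older version that is still present.
        if admin_lock in versions:
            return admin_lock
        # Simplified stand-in for publisher policy: default to the newest version.
        return max(versions)
```

Because the answer is always computed from the declarations of the applications that remain, uninstalling “B” leaves the store in exactly the state it would have had if “B” had never been installed.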


 


I hope it’s clear how this is a better model.  A lot of attempts have been made to make the current world better.  None of them can really handle uninstall well because they can’t derive what the state of the system should be after uninstall.


 


Another way in which software deployment has always been messed up is that people don’t test their software in the way that they expect it to run on target machines.  There are magic scripts, etc. that you run in the build/test environment (or you maybe trust Visual Studio’s F5 handling to magically set up the environment correctly) which bear absolutely no relationship to what is done to get the software onto an actual customer’s machine.  (You bet your bippy that the dev teams have tools that do the “full uninstall” because otherwise they’d be forced to wipe/reformat their Windows installations for each test pass.)


 


So one of our other mantras is that if you can run the software, you can deploy it.  Meaning that the descriptive information needed to deploy the software has to be there just to get the code to fire up and print “Hello, World!\n” in the first place.


 


The end result of all this is that versions need to matter, even during development and debugging.  If you have to assume that versions might not change, you can’t do any aggressive validation or self-checking.  Reproducibility goes down the tubes and you’re left in the current quagmire of trying to figure out not just what’s on your machine now but what may have come on and off the machine over time.


 


It seems that we’re lonely in these opinions.  We get continual resistance from teams who don’t want to give up the looseness of how they’ve done software development in the past (even though it’s exactly this looseness that allows customer machines to get so messed up at runtime).  They don’t even want to change version numbers.


 


It’s hard to get started on providing a deep and useful solution when people don’t even want to have to change version numbers.


 


Comments (12)

  1. Frans Bouma says:

    That’s all nice and dandy, the problem is in the signed code area.

    You see, if I have a .exe (signed) I have to sign all my .dll assemblies which will be loaded by the .exe as well. No problem. However if I fix a bug in one of these .dll’s, what should I do? Update the version? It will break the .exe. Update the version and supply a policy file? (ah, nice, every bugfix will get a policy file, that’s great for admins :)) Or keep the version the same so the deployment is the easiest: update the dll and you’re set.

    Because of this, it forces developers to release updates less frequently, and if they do, they have to update more software than required. I don’t want to update software less frequently nor do I want to update the software in big chunks because somewhere in some dll a bugfix was added.

    You can tell me to change version numbers, but it is a pain. I have the feeling it is MY problem that it is a pain. I don’t think it is. I also think that calling people who have faced the signed-assembly misery with versioning stubborn is not helping anyone.

    Why can’t fusion try to load the ‘latest’ version if no dll is found which matches the signature? Then it wouldn’t be a problem at all: the .exe would load the updated dll with the new version (it’s right there in the same folder!). It’s now all or nothing, which is too strict. It can be nice in some situations to have strict versioning, but it can be a BIG PAIN in other situations.

    Personally, I found your text a big disappointment. It’s as if the problems don’t exist or if they exist, it’s my fault, because "I don’t want to change the way I develop software". I do want to change and I have, the problem is that I’m faced with a problem I can only solve in 2 ways correctly (mentioned above) and 1 way incorrectly (simply keep the version the same, ship a new dll)

    I hope you realize these problems are not just going away by telling developers they’re too stubborn or not doing it right. Think for a second what will happen if MS would release updates to .NET 1.1 every month. These assemblies are all signed, every .NET application requires a policy file for each assembly referenced for each update. That’s a burden to administrators and it’s a burden to developers as well ("You need .NET with spXYZ").

  2. Frans Bouma says:

    Another solution would be this: why can’t I specify in my REFERENCING code which versions to load automatically? Take the .exe and the .dll’s example mentioned above.

    If I version the dll’s as: 1.0.2003.x, I can increase the number on the ‘x’ spot with every bugfix. I should then (In a signed exe and dll!) be able to specify as reference for these dll’s in the .exe: 1.0.2003.*. That way every build increase is usable, and the loader picks the one with the biggest build number.

    A NEW major version of the dll’s, for example 1.0.2004.* or 2.0.2003.* is then not loaded. Solves the versioning probs with signed assemblies AND still has strong versioning over major versions, something you want. If you do not want this kind of loading, you then still can opt for the full version in the reference: 1.0.2003.1. If 1.0.2003.2 is released it will then not load until a policy file is created or the .exe is updated. Best of both worlds IMHO.
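The wildcard binding proposed in this comment could look roughly like the following. It is purely a sketch of the commenter's idea, not something Fusion or the CLR does; the function names and the list of available versions are made up for illustration.

```python
def parse_version(text):
    """Turn "1.0.2003.2" into (1, 0, 2003, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def resolve(reference, available):
    """Pick the version to load for a reference such as "1.0.2003.*" or "1.0.2003.1"."""
    parts = reference.split(".")
    if parts[-1] != "*":
        # Fully specified reference: exact match or nothing.
        return reference if reference in available else None
    prefix = tuple(int(part) for part in parts[:-1])
    matches = [v for v in available if parse_version(v)[:-1] == prefix]
    # Highest build number wins; None when nothing shares the prefix.
    return max(matches, key=parse_version, default=None)
```

With 1.0.2003.1, 1.0.2003.2 and 1.0.2004.1 available, a reference to 1.0.2003.* resolves to 1.0.2003.2, while 1.0.2004.1 is never considered.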

  3. Loose versioning kills repeatability of deployment. Either you have to again repeat saying what all the version numbers are (which nobody likes) or the "same" application when viewed/launched/deployed over time may change its configuration which disables all the predictability, performance, etc. goals.

    Remember that the cardinality of administrators is one or two orders of magnitude smaller than actual desktops and the cardinality of developers is again several orders of magnitude smaller than desktops.

    The publisher policy scheme that we shipped in Windows XP and then also in the CLR was too flexible and too explicit. Nobody knows how to use it other than to issue policies that look like "1.0.0.0-1.0.2003.57" -> "1.0.2003.57". We’re addressing this for Longhorn by making the publisher policy statements much more straightforward and easy to write.

    The reason that policy is not implicit is because we want to enable applications to pick up non-critical fixes to shared components and deploy with those fixes used privately by that application. Sure, an SP or QFE may come along later on and make that the official version to load but remember that the goals here are that (a) consumers aren’t afraid of installing software on Windows and (b) business desktops are easily replaced.

    Property (a) comes from self-description and reversibility of servicing/install. Property (b) comes from completeness, predictability and repeatability of deployment.

    Loosening the versioning/binding model disables both scenarios and while there are folks in the company who feel that the biggest problem in life is letting people spew new bits onto end user machines faster and faster, in practice one of the biggest threats (if not the single biggest) to Windows in corporate and consumer scenarios is the unpredictability and non-repeatability of deployment. We hear this over and over and over again.

    Yes there are people who can do it better than your average person and it may be a shame that you don’t get your wish but given the tradeoff of most desktops getting higher quality and satisfaction and a few (albeit highly leveraged) folks having to jump through another hoop, the answer seems pretty clear.
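The range-style publisher policy statement quoted in this reply ("1.0.0.0-1.0.2003.57" -> "1.0.2003.57") amounts to a redirect table. A minimal sketch, assuming policies are held as a list of ((low, high), target) entries; that representation and the function names are invented for the example and are not the real policy file format.

```python
def parse(version):
    """Turn "1.0.2003.57" into (1, 0, 2003, 57) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def apply_policy(requested, policies):
    """Redirect a requested version if it falls inside a policy range."""
    for (low, high), target in policies:
        if parse(low) <= parse(requested) <= parse(high):
            return target
    # No policy applies: load exactly what was asked for.
    return requested
```

For example, with the single policy above, a request for 1.0.2003.12 is redirected to 1.0.2003.57, while a request for 2.0.0.0 is left alone.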

  4. Frans Bouma says:

    Thanks for the explanation 🙂

    So if I understand it correctly, there is no ‘choice’ for me as a developer to choose for a deployment model (make it opt-in) which matches best for MY software release method, because other software vendors do a bad job and thus it has to be more strict?

    I understand MS wants to make it more robust for end-users, but not all developers are dealing with the same end-user level. 🙂 (I for example am solely dealing with developers as end-users, I expect (and can expect) some level of computer-savvyness :))

  5. Of course it’s not that simple or black and white when looking at the overall stack of players.

    The thing that seems most important is that the base platform/OS have a robust predictable pattern of behavior.

    It seems to me that the big hole here is lack of tool support. If you just had to go into your app directory and say something like:

    rebind foo.exe

    before running it, would that really be that onerous?

    Some enterprising person could write a tool that watched the directory for changes and ran rebind on the exe based on file change notifications.
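Such a watcher might be sketched as follows. Note the assumptions: `rebind` is the hypothetical tool named above, so the sketch takes a callback instead of invoking it, and it polls directory snapshots rather than using real file change notifications, simply to stay portable and self-contained.

```python
import os
import time


def snapshot(directory):
    """Map each file name in the directory to its (mtime_ns, size)."""
    result = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                info = entry.stat()
                result[entry.name] = (info.st_mtime_ns, info.st_size)
    return result


def changed(before, after):
    """Names added, removed, or modified between two snapshots."""
    return sorted(name for name in before.keys() | after.keys()
                  if before.get(name) != after.get(name))


def watch(directory, on_change, interval=1.0, rounds=None):
    """Poll the directory and call on_change(names) whenever files differ."""
    previous = snapshot(directory)
    done = 0
    while rounds is None or done < rounds:
        time.sleep(interval)
        current = snapshot(directory)
        names = changed(previous, current)
        if names:
            on_change(names)  # e.g. run the hypothetical "rebind foo.exe" here
        previous = current
        done += 1
```

A caller would pass a function that shells out to whatever rebinding tool exists; leaving `rounds` as `None` keeps the loop running until the process is stopped.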

  6. Well, nice text, but you’re missing some fundamental concepts which have been implemented on other platforms for many years.

    1. The windows dl-loader seems to lack the ability of versioning at all.

    On Unix’ish platforms, at least GNU, the ld-loader (ld.so) explicitly supports versioning in the dl-dependencies, so MVC at dl-level is no problem. Each so-file has its version (encoded hierarchically) in its name (i.e. libfoo.so.1.2.3). The ldconfig tool creates a cache file for speeding up library lookup and also creates some symlinks for the major versions (i.e. libfoo.so.1.2 -> libfoo.so.1.2.3, libfoo.so.1 -> libfoo.so.1.2, libfoo.so -> libfoo.so.1), so that for example a dependency on libfoo.so.1 can be resolved to the right minor version.

    This behaviour can be fine-tuned in many ways if necessary.

    Since the actual linking process completely happens in userspace, it can be controlled and fine-tuned in many ways at runtime. For example you may specify another LD_LIBRARY_PATH via environment to link a specific process to a completely independent set of libraries.

    This way you can also have several completely independent instances and even ABIs of the same library on your system w/o conflicts.

    You can even put a completely different dl-loader stub into your binaries, which does something completely different. This is also used to allow different base libraries on the same system (i.e. libc6 vs. dietlibc)

    2. Windows seems to be missing a package manager which supports package versions and package dependencies, and which is also used by application packages. There is already some kind of package management, but it seems to be not much more than a GUI frontend to (3rd-party) (in|de)stallation programs.

    IMHO it’s a really bad habit that 3rd-party applications install dlls which don’t really belong to them ("shared components"). This is also a big reason why so many systems were vulnerable to sql-slammer: probably most of the users/admins didn’t apply the patches, since they simply didn’t know that they had to – they didn’t even know that they had MSDE running, because it was installed by some application in the background.

    These libs belong in completely separate packages and the application packages simply have to contain a dependency link to a specific separate package in a minimal required version. If an application is to be installed and a dependency is missing, then the package manager has to notify the user/admin and look where it can get the package from.

    (maybe the application package may ship a matching version of the required packages on its install medium)

    3. MVCC implies a minimal amount of QM constraints.

    + Each higher version of a package or interface _must_ be 100% (ABI-)compatible to a lower version.

    + If a higher version will not be downwards compatible, then it _may not_ use the same interface name (and in the case of dll’s of course not the same dll name)

    + if one package is built into several incompatible variants, they have to be installed in different locations and namespaces. The linking application then must be instructed to get the right version (i.e. some LD_LIBRARY_PATH env equivalent)

    + the packages must be (also) shipped separately, so that update can happen separately.

    (i.e. to come back to the MSDE example: the package manager will regularly check for new versions, at least critical updates, and ask the user if it should update when it finds some)

    regards,



    e.weigelt, ceo

    metux IT service — http://www.metux.de/

  7. You have basically described the vision we are working toward. We are trying to lean away from mechanisms which disable aggressive pre-computation and caching (e.g. use of environment variables to influence behavior) as well as provide a secure platform for discovering components. (e.g. if an untrusted program is able to launch a more trusted program and directly affect its execution – for example by setting environment variables again – you have a security hole.)

    I’ll not address binary compatibility in detail here. First, it’s hard. The people who have been successful with it (a) took a big hit up front doing good design and/or (b) took breaking changes when it was appropriate (and hopefully before the installed base was too big).

    The level of quality of API/ABI design around Microsoft varies greatly from group to group, and because one of the major selling points for Windows is app compat, people who didn’t do a good enough job with design up front are often stuck with the compatibility burden more or less indefinitely. I can’t defend these practices but addressing the root cause isn’t trivial.

  8. JD says:

    I agree with the overall post, but this is definitely a rant.

    It may be frustrating, but if you can’t convince others of the worth of your argument, it’s no use berating them for it.

    The version-hell of the current CLR reflects upon the usability of the Fusion design. The incrementing of VS build-numbers could be a great way to introduce versioning facts of life into developers’ lives. But it’s not, it’s a pain in the butt and causes everyone (yes, everyone) to want to turn it off and deal with it at a later point.

    To integrate this versioning discipline into the build process requires examining the scenarios that developers (from learn-in-21-days hacks to software corporations with sophisticated release teams) face and coming up with a usable design to address those scenarios.

    The Fusion solution is strong technically, and I understand its factoring, but its usability needs serious improvement. The tools around versioning need to be improved. I’m confident that you can find the answer, but blaming the user because the solutions don’t fit their scenarios is putting the cart before the horse.

    [and I am a big supporter of the versioning efforts. It’s just very hard to sell]
