Taxes redux: You can’t open the file until the user tells you to open it


One of the so-called taxes of software development on Windows is being respectful of Hierarchical Storage Management. You can't open a file until the user tells you to open it. This rule has consequences for how Explorer extracts information about a file, because what you definitely don't want is for opening a folder full of archived files in Explorer to result in all the files being recalled from tape. (Actually, file recall is just an extreme case of the cost of opening the file. You run into a similar problem if the file is on a slow medium or over a slow network connection. But just to motivate the discussion, I'll continue with the tape scenario.)

What information does Explorer need in order to display a file in a folder? Well, it needs the file name. Fortunately, that can be determined without opening the file, since the file name is how you identify the file in the first place! When you want to open a file, you pass its name. There, you have the name. If you are showing the contents of a folder, you used a function like FindFirstFile to get a list of all the files, and the list comes in the form of names. Okay, so the name is easy. (There are still subtleties here, but they are not relevant right now.)

Okay, what's next. The icon. Well, in order to get the icon, we need to know the file type, so let's put the icon on hold for now.

The file creation and modification times can be obtained without opening the file; they come back as part of the FindFirstFile data, or if you started with a file name, you can recover them with GetFileAttributesEx. Either way, you can get them without opening the file. Same goes for the file size.
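
To make the point concrete, here is a minimal Python sketch of a folder listing that never opens a file. It leans on Python's os.scandir, which on Windows is built on FindFirstFile and FindNextFile; the helper name list_folder and the sample file are made up for this illustration.

```python
import os
import tempfile

def list_folder(path):
    """Return (name, size, mtime) for each file in path without reading any file data."""
    rows = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                # Metadata only: the file's contents are never read.
                info = entry.stat(follow_symlinks=False)
                rows.append((entry.name, info.st_size, info.st_mtime))
    return sorted(rows)

# Usage: build a throwaway folder and list it.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "report.txt"), "w") as f:
        f.write("hello")
    listing = list_folder(folder)
    print(listing)
```

Name, size, and timestamps all arrive with the directory enumeration, which is why those columns are cheap.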

The other properties, like Title, Author, Summary... They all require that the file be opened, so Explorer disables the property for files that have been archived to tape. You don't want to recall a file from tape just to show its Author in Explorer.
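
A rough sketch of that decision, assuming the attribute bits came back from the directory enumeration: FILE_ATTRIBUTE_OFFLINE is the real Win32 attribute bit, but author_column and the read_author callback are hypothetical names invented for this example, not anything Explorer actually exposes.

```python
FILE_ATTRIBUTE_OFFLINE = 0x1000  # Win32 attribute bit: file data is not immediately available

def is_offline(attributes):
    """True if the attribute bits say the file's data lives in offline storage."""
    return bool(attributes & FILE_ATTRIBUTE_OFFLINE)

def author_column(attributes, path, read_author):
    """Return the text for an Author column, leaving it blank for offline files.

    read_author is a hypothetical callback that opens the file to extract the property.
    """
    if is_offline(attributes):
        return ""  # opening the file would force a recall, so skip the property
    return read_author(path)
```

The check costs nothing because the attributes were already in hand; only files that pass it ever get opened.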

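Okay, that leaves just the file type (and the icon, which depends on the file type). Consider the possibilities for how the file type could be determined.

  * Read the first few bytes of the file looking for magic numbers.
    Problem: Reading bytes from the file forces a recall.
  * Store the type in an alternate data stream.
    Problem: Reading alternate data streams also forces a recall.
  * Store the type in the file’s metadata.
    Problem: The metadata won’t survive being sent as an email attachment or transferred to a file system that doesn’t have a place to store that metadata, such as a floppy drive, a CD, or a unix volume.
  * Read the first few bytes to determine the type and cache it in the file system.
    Problem: Um, that’s just begging the question. We’re trying to figure out where in the file system to save it!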

What do these problems tell us? The first problem, that reading bytes from a file forces a recall, means that file type information cannot be based on the file contents. The second problem, that reading alternate data streams also forces a recall, means that you can't put it in an alternate data stream either. All that's left is storing the type in the metadata.
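
For illustration, here is what content-based detection looks like as a minimal Python sketch. The two signatures are the well-known PNG and PDF ones; the table and the sniff_type helper are assumptions made for this example. Notice that there is no way to write it without opening the file, which is exactly the step that would trigger a recall.

```python
import os
import tempfile

# Two well-known signatures; a real table would be far larger.
MAGIC_NUMBERS = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
}

def sniff_type(path):
    """Guess a file's type from its leading bytes, or return None."""
    with open(path, "rb") as f:  # opening and reading is what would trigger the recall
        head = f.read(8)
    for magic, kind in MAGIC_NUMBERS.items():
        if head.startswith(magic):
            return kind
    return None

# Usage: the name says .txt, the contents say PDF.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.txt")
    with open(path, "wb") as f:
        f.write(b"%PDF-1.4\n")
    sniffed = sniff_type(path)
    print(sniffed)
```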

But the third problem tells us that there isn't much metadata to choose from. Whatever mechanism you use needs to be able to survive being sent as an email attachment or being uploaded to an FTP site. Email attachments in particular are extremely limited. Most email programs, when asked to save an attachment, preserve only one piece of metadata: The filename. (They often don't even preserve the last modified time. And good luck getting them to preserve other ad-hoc metadata.)

All of these problems conspire to rule out all the places you can squirrel away type information, leaving just the filename. It's a sucky choice, but it's the only choice left.

And it means that changing a file's extension means the file type information is destroyed (which, from an end user's point of view, may as well be corruption).
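
A small sketch of what name-based typing amounts to, and why a rename changes the answer. The KNOWN_TYPES table and the type_from_name helper are hypothetical; a real shell consults its registered file associations rather than a hard-coded dictionary.

```python
import os

# Hypothetical association table, standing in for registered file types.
KNOWN_TYPES = {
    ".mp3": "MP3 audio",
    ".avi": "AVI video",
    ".txt": "Text document",
}

def type_from_name(filename):
    """Derive the file type purely from the extension; the file is never touched."""
    extension = os.path.splitext(filename)[1].lower()
    return KNOWN_TYPES.get(extension, "Unknown")

print(type_from_name("song.mp3"))  # type before the rename
print(type_from_name("song.avi"))  # same bytes, different name, different type
```

Nothing about the file's contents enters into it, so the name is the only place the type lives.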

Comments (48)
  1. Robin says:

    I have found FindFirstFile to be slow when enumerating all files in a folder containing a large number of files, esp compared to the performance of the dir command.  Is there a faster way to obtain a list of all files in a folder?

    [ “dir” uses FindFirstFile, so any performance comparison is in your imagination. -Raymond]
  2. someone else says:

    Ok, so the file extension is metadata, and by renaming it, I destroy (or as they said in the old days: change) it.

    But could you pretty please with sugar on top implement an obscure registry setting where you can disable that freakin’ warning?

  3. Really? says:

    @robin Hmm, i wonder how dir does its work. (I don’t know for sure, but I suspect Dir probably uses FindFirstFile.)

  4. Adrian says:

    I must be missing the point altogether.  I think we’re coming at this with different sets of assumptions.

    Shouldn’t mail attachments all come with a MIME type, which is metadata that describes the data type for the attachment?  Wasn’t that the point of MIME types?

    “Read the first few bytes to determine the type and  cache it in the file system.

    Problem: Um, that’s just begging the question. We’re trying to figure out where in the file system to save it!” We are? I thought the premise was to understand what Explorer needs to display a file in a folder. The file is already in the file system, so any metadata that the file system caches should already be available without opening the file.  I don’t see how this is begging the question.

    If I recall, VMS didn’t rely on the file extension for the file type.  The directory file held metadata that described the file type (at least at a general level).  Extensions were a convention, but far from a rigid system.

    [Oh, so your answer to “Where in the file system should we save it?” is “In a new type of metadata that is never archived to tape and is included in some new enumeration API so retrieving it doesn’t require another round trip to the server.” I think you’ll find that the number of email programs that save the MIME type when saving an attachment is approximately zero, so you’ll have to get all email programs on board with this new SetFileMIMEType API. And then get all existing applications to apply for a MIME type for all their file types, and getting all programs which create files to set the MIME type metadata too. How do you set the MIME type from a batch file? Or perl script? And what if a file’s extension disagrees with the MIME type? MIME-aware programs will use the MIME type and non-MIME-aware programs will use the extension. I see the virus writers rubbing their hands together in gleeful anticipation already. -Raymond]
  5. Aardvark says:

    What about exe’s icons which are embedded in the resources of the exe itself? I assume they would get recalled?

    [Try it and see. -Raymond]
  6. Wang-Lo says:

    Thank you for using the phrase "begging the question" correctly.  Too many careless or lazy writers think it means "raising the question" or "requires the question".  Let us hope that proper language by an influential blogger can set a few of these yahoos straight.

    -Wang-Lo.

  7. Erzengel says:

    Aardvark: "What about exe’s icons which are embedded in the resources of the exe itself?"

    If you watch on a relatively slow medium, such as a CD drive, you will see explorer show a default icon for exes while it loads the icon that the exe contains. I’ve never worked with tape drives, but I think it’s fair to assume it won’t recall the exe and will merely show the default exe icon.

  8. Nick says:

    "could you pretty please with sugar on top implement an obscure registry setting where you can disable that freakin’ warning?"

    How often do you really need to change the file extension? Even then, when you do, the default action is ‘Yes’, so you can just hit Enter.

    Assuming you do need to rename a lot of file extensions, if you think you are tech-savvy enough to go mucking around in some registry changing secret settings, then surely you can figure out how to do it via some other means which does not display a prompt.  Clicking dozens of files, one at a time in Explorer, is hardly efficient.

    I like file extensions.  It gives me instant knowledge about what kind of file I’m looking at, all without invoking some other investigative application (which may even have an exploit that makes a previously benign file malicious).

  9. dave says:

    If I recall, VMS didn’t rely on the file extension for the file type.

    I think you recall incorrectly (at least up through about V6, which is when VMS and I parted company). If the prevailing model for Windows can be considered as "tell me the full name and I’ll tell you what you can do with it", then the prevailing model for VMS was "tell me what you want to do with it and I’ll tell you the full name".

    Thus if you fed FOO to the Bliss-32 compiler, it would look for FOO.B32 (and then a whole raft of other possible Bliss extensions). If you fed FOO to the linker, it looked for FOO.OBJ, unless you’d said FOO/LIB, in which case it would look for FOO.OLB.

    The directory held some Record Management System attributes, but those only told you about the file structure (variable-length with byte count, variable-length with CRLF terminators, etc).

  10. Adrian says:

    Dave is right.  I was thinking about the RMS metadata that the Files-11 system kept about each file.  Still, that shows that it’s possible to keep metadata about a file that remains accessible even when the file itself is off on tape somewhere.

    As to Raymond’s response to my comment, I think we’re on completely different pages.  I (still) must be misunderstanding the point of the post.  It seemed like the point was that to show files in a folder, you only have access to the metadata, because it might be too expensive to actually open the file.

    It appears he’s looking at it in terms of how everything currently exists, and arguing that, given that, you can’t do any better.  I misunderstood and thought he meant that it couldn’t ever be done better.  I pointed to the decision to tie the file type to the file extension rather than keeping it as a separate bit of metadata.  Clearly, that’s a hard decision to undo now, but it doesn’t explain why that decision was made originally (which is what I thought Raymond was trying to explain).  The fact that VMS almost does it (by keeping RMS metadata in the directory) is a counterexample.

    I’m not trying to be nitpicky or argumentative.  I think I just missed the point of the article.

    [Right, the article is about what Explorer could have done in 1995 without the benefit of a time machine. -Raymond]
  11. Joe says:

    Adrian — I think the point is not only can you only rely on the meta data, but you also can’t really change the meta data w/o changing all apps that deal w/ the meta data…e.g. you are stuck!

  12. ace says:

    I have found FindFirstFile to be slow when enumerating all files in a folder containing a large number of files, esp compared to the performance of the dir command.

    I’m sure you first did FindFile enumeration and then dir on the same directory. The second time everything was already cached in RAM.

    The first time… NTFS and Windows are orders of magnitude slower than others (surprisingly, the web is very quiet about this). NTFS/Windows enumerate files in a directory fast only for an unbelievably small number of files. Even on the newest computers and Vista, an innocent "dir /b" of a directory with only 3000 files can take 20 seconds the first time (on the naked OS, without any antivirus/whatever programs installed). The second time takes a split second (compare that with Linux/ext3, where the first time is so fast). Don’t just say "use any defragmentation program": I haven’t found a single one able to optimize directory enumeration.

    When we’re there, does anybody know any way to do on Windows the equivalent of Linux’s

    "echo 3 > /proc/sys/vm/drop_caches"?

  13. Gabe says:

    Adrian, Raymond is saying that a system as you envision would not work. Of course you could create your own such system, but it would not interoperate with the billion other systems in existence.

    Every program currently in existence wouldn’t know how to create the metadata you want. Even if you only ever used programs that have been upgraded to support the new APIs, any files from older or foreign systems would have no such metadata.

    In fact, the system you propose already exists. MacOS stores a 4-character file type (among other things) as metadata for each file, but OS X has to support filename extensions because otherwise any file created by a Unix program, or FTPed, or coming from a digital camera would be unrecognized.

  14. Jonathan Wilson says:

    I want to know why "Hide extensions for known file types" was invented and why it defaults to "enabled".

  15. John says:

    Jonathan: Because users are dumb.

  16. Dean Harding says:

    Jonathan: read just about every other blog post of Raymond’s where he mentions filenames. Someone always asks that question. It’s some kind of Godwin’s Law for filename discussions  ("as the discussion becomes more and more about filenames on Windows, the probability of someone asking ‘what moron thought up the Hide Extensions for Known File Types setting?’ approaches 1.")

  17. John says:

    Speaking of file extensions, why does explorer offer to show bizarre columns like "Perceived Type" but not such a basic one like "File Extension"? Did the explorer team really think the "Type" column was good enough?

  18. Chris Lineker says:

    You can’t open a file until the user tells you to open it.

    Stop making thumbnails of my videos then. It hangs explorer on removable media, it also means i have to wait for it to finish putting its grubby mitts all over my disk before I can start copying files from it.

  19. Dean Harding says:

    "Stop making thumbnails of my videos then."

    Click Tools->Folder Options, go to the "View" tab and check "Always show icons, never thumbnails."

  20. jcs says:

    ace: That’s weird. I never noticed that.

    As a test, I just plugged an external hard drive (to avoid the directory being cached) and did a dir of a directory with 1000 files. It took less than a second…even though I’m on Vista, using NTFS (and the files were even encrypted.)

    Maybe "the web is quiet about this" because nobody else is having this problem. There could be something wrong with your disk…

  21. Karellen says:

    I’m just wondering what the cost would be of adding the first ~4k of archived files to the "stub".

    Many file types try to store meta data at the start of a file so that it can be read as soon as possible after internet downloads have started. Magic number MIME type calculation also generally works fine on just the first 4k.

    Also, because of the way that file read operations tend to work (well, at least fread(3), read(2) and ReadFile()), if an application opens a file and requests a large read, just returning the first 4k (or however much you have in the stub) is a valid response. It would only be necessary to recall the rest of the file if the user does a read when the file descriptor is pointing to the byte after the last byte stored in the stub.

    OK, it wouldn’t work for *every* file type, but it might help in a lot of cases. If it were implemented, then newly invented file formats might try to put as much metadata as possible at the start of the file in order to work with the feature.

  22. Peter says:

    ace: you’re nuts.  I just did a dir on 1880 files; it took about three seconds.  And much of that time was waiting for the screen to scroll — it seems to be proportional to how much of the console is visible.

    Adrian: as you already know, that’s not what VMS did.  What VMS did was fascinating, though: they had a call where you passed in both the string that the user gave you, and a string of what you wanted for defaults, and it would combine the right bits of the right strings.

    Example: user asks for "MYdisk:foo" and the default string is "node::diskB:[directory.hierarchy]bat.ext" — and the result returns "node::MYdisk:[directory.hierarchy]foo.ext"

    It was a very handy system.

  23. Dean Harding says:

    I just did a "dir /s" of my c:\windows\winsxs directory and it took about 65 seconds for 65,000 files and 53,000 directories. As mentioned, much of that time seems to be spent redrawing the console (if I make the console small, it runs faster).

  24. Alexander Grigoriev says:

    @Dean Harding,

    Maybe someone will finally pay attention and make a default setting to show the extensions? So that fewer users will be tricked into clicking executable files disguised as MP3?

  25. ace says:

    much of that time seems to be spent redrawing the console

    Measure it using "dir /b theFullPathToDir >nul". No console redrawing. And no access to the directory before measurement (including system access!).

    it took about 65 seconds for 65,000 files and 53,000 directories

    You selected exactly the directory which system must access independently of you and before you access it. And the average number of files per directory was 1.2, not a few thousand — that’s huge difference for NTFS. Somewhere on your C disk (it’s not so visible if the disk is with too few files or the disk is small) create yourself a directory of e.g. eight thousand files, the files can be even of zero size (I just used Local Settings/Application Data/Opera/profile/images folder). Then measure dir /b theDir >nul after the reboot, once the computer is quiet and certainly before you accessed it through the Explorer. Then measure it the second time. Compare. (Of course I assume you use a normal disk, not a SSD or big RAID array — the effect comes from too much head seeks).

  26. Nicholas Sherlock says:

    @Alexander Grigoriev:

    Users don’t know what the extension means or is for. They would be no better at spotting executables disguised as Jpegs if the default was changed.

    If you show the extension to them, they will just change it and then wonder why they can’t open the file. Or they will assume that changing the extension also changes the type of the file, and use it to turn bitmaps into jpegs.

  27. Joseph Koss says:

    In plain old FAT16, file extensions *are* essentially type metadata… a field in the directory structure which is separate from a file’s 8-character name, which is used to store type information.

    If that’s not separate type metadata, then what is?

    The only difference I see is that with FAT16 file extensions, you aren’t really restricted by them. That a file with the extension ‘com’ or ‘exe’ might not actually be an executable file.. that it is essentially easy for the field to lie about the file type.

    I think the real problem is that we have misused the term ‘extension’ .. that in fact what we call the ‘extension’ is actually the type metadata.

    There isn’t any other purpose for file extensions (if you want to extend the name, just add to the name), so I honestly don’t see the problem with the system as it is..

  28. Dean Harding says:

    ace: OK, this time, I created a random directory on my D: drive, and added 10,000 files to it. Then I rebooted. When Windows came back up, I waited for the hard disk light to stop flashing, then I opened up a command prompt, typed "dir /b <path> >nul" and it finished in less than 4 seconds.

    This is on the Windows 7 beta (not at work, so don’t have access to Vista here) but I don’t imagine there’d be much difference…

  29. Robin says:

    ace – you are correct re the caching issue.  it seems that whichever is called first spends about one disk seek  latency per file returned unless the filenames are already cached.

  30. Steve Smith says:

    > * Read the first few bytes of the file looking for magic numbers.

    >   Problem: Reading bytes from the file forces a recall.

    It depends on when you do it surely.  If the metadata is created when the file is created or updated, the file hasn’t been migrated and doesn’t need recalling.  Also you don’t need to recreate the information every time the directory is opened because the info is already in the filesystem.

    > * Store the type in the file’s metadata.

    >   Problem: The metadata won’t survive being sent as an email attachment or transferred to a file system that doesn’t have a place to store that metadata, such as a floppy drive, a CD, or a unix volume.

    So what.  Depending upon the destination, neither will the creation date or the file size or anything else apart from the file name if you’re lucky.  Also we can’t copy it to a wild filesystem or an email attachment if it has been migrated, it has to be recalled first.

    >  * Read the first few bytes to determine the type and cache it in the file system.

    >    Problem: Um, that’s just begging the question. We’re trying to figure out where in the file system to save it!

    Why not put it into the directory entry?  That’s where it’s stored at the moment.

    >  I think you’ll find that the number of email programs that save the MIME type when saving an attachment is approximately zero.

    It doesn’t have to be MIME, although that would be my choice if I had a blank slate.  Why not use the extension?  When the file is created on a tame filesystem as NoisyRacket.mp3, set the file type to mp3.  If I rename the file to BeautifulLullaby.avi, the file type is still in the directory as mp3.

    If we access the file in the wild, from a source that doesn’t support the feature (Linux or CD or email or whatever),  read the file looking for magic number.  When it reenters our tame filesystem, set the file type field in the directory to whatever the magic number indicates.  The user doesn’t have to get involved.

    > Gabe:  Every program currently in existence wouldn’t know how to create the metadata you want. Even if you only ever used programs that have been upgraded to support the new APIs, any files from older or foreign systems would have no such metadata.

    Well, I have a radical idea, let the operating system do it.  That’s supposed to be its job, it’s about time it acted like it.

    [No fair inventing new types of metadata. The context of the discussion was “What can Explorer use to obtain the file type, given how file systems work today?” (Mind you, requiring the filesystem to infer the MIME type at file creation time requires a good amount of clairvoyance. How does it know that a file that begins with the characters ‘<?xml version="1.0" encoding="UTF-8"?>’ is of type application/x-litware-customerlist before Litware is installed? How does it know that it’s a Litware customer list and not a Contoso product catalog? [which also takes the physical form of an XML file]) -Raymond]
  31. Alexander Grigoriev says:

    @Nicholas Sherlock,

    A friend of mine, who is perfectly aware of what the file extensions are, and is quite computer proficient, fell victim to an email virus (VBS which pretended to be an MP3 file), which successfully replaced his valuable JPEG files with the script’s copy. Thanks, Microsoft, for saving him from his "ignorance".

  32. ace says:

    about one disk seek  latency per file returned unless the filenames are already cached

    Yes, I believe that’s possible in the extreme case: one seek per file name in a directory, only to get the list of the files there.

    Dean, for example, used D disk which was probably in better state than C (regarding how the file info would be distributed). If his disk takes 4 ms per seek, there were 1000 disk seeks just to get a list of 10000 files.

    But "just 4 seconds" is not fast at all! A modern disk can have 100 MB per second sustained data rate, so the disk was able to deliver 400 MB (megabytes) in the time a list which totalled around 100 KB (kilobytes) (if the file names were short) was fetched.

  33. Timothy Fries says:

    A friend of mine, who is perfectly aware of what the file extensions are, and is quite computer proficient, fell victim to an email virus (VBS which pretended to be an MP3 file), which successfully replaced his valuable JPEG files with the script’s copy. Thanks, Microsoft, for saving him from his "ignorance".

    In your example the user already has two very important clues that a VBS script is not an MP3 file.  #1 is the file type icon, and #2 is that the ".mp3" extension would be appearing on that file while all other MP3 files would show no extension.

    If these two deviations from the norm didn’t clue the user into the fact that something wasn’t right, what makes you think seeing the file extension would?  I mean they’ve kinda already proved they don’t pay that much attention to the file ‘extension’ by ignoring clue #2 above.

  34. Anonymous says:

    To all here who advocate using MIME types: clearly you have not been bitten by a misconfigured web server that tags your favorite binary file format as "text/plain", causing your browser to display its file contents as text.  By relying on MIME types, you are trusting your data source not to lie to you, or even disagree.  Your users may encounter strange and frustrating failures when some other piece of software disagrees about what MIME type to assign.

  35. The Imp says:

    Raymond, your point is duly noted.

    Even so, back in 1995, when you (not you personally) were redesigning FAT to add Long File Name support, wouldn’t it have been a good idea to add support for other things too, such as extended attributes or additional metadata, via the same nonsense-hidden-folders mechanism? The principle of identifying files by magic was already very old even then. As for MIME, as you rightly said, it was not commonplace or of foreseeable consequence. However, if you’re going to say that “We had no way of knowing that MIME would become a de-facto standard for identifying files, because web/email wouldn’t really take off for a few more years”, then you can hardly argue that “you can’t use metadata solution X because it won’t survive being emailed” — email was not a design consideration in 1995, and if it was, a MIME extension to the filesystem would have been a logical choice! Now who’s begging the question?

    I’ll add that an awful lot of mail clients don’t even go to any special effort to preserve file names (why would you name a JPEG “part 1.2”, iPhone?), let alone any other kind of metadata. And speaking of Apple, they’ve always had problems with emailing files on account of the way that their filesystem (used to) store their metadata.

    (I’m picking on Apple here, but the criticism applies far more widely than just them. And to be fair, at least the iPhone does set the MIME type correctly.)

    As for the “misconfigured web server” issue that our good friend Anonymous points out: as soon as you start talking about “misconfigured software X”, it’s not going to be a very meaningful discussion. Yes, I’ve had that particular annoyance, among others. But that is not the fault of MIME; and moreover, if the server was hosted on a filesystem that correctly supported MIME, then it wouldn’t be an issue.

    [People were emailing files around long before MIME was invented. Remember uuencode? Suppose you invented a new file typing system. How do you make it work with existing programs that were designed under the old system? And in a way that won’t make virus-writers giggle with glee? -Raymond]
  36. steveg says:

    @ace, @dean: is the time taken to sort 10,000 files a measurable proportion of the time you’re measuring? I don’t think dir is capable of returning unsorted results.

    The other issue to consider is scroll speed. Measure the time with the smallest console window you can, and with the largest. Also try fullscreen vs window mode.

  37. frustrated with you know who says:

    @igor: You have said before you won’t come back here and well… here you are. You’re clearly at least moderately clever but you do come across as rather an unpleasant person.

  38. Randall says:

    Recently ran into a related problem in real life in a Mac/Unix environment.  I can mount a remote filesystem and treat it as a local directory; it’s very useful to me, but the latency is pretty awful.  Many programs, including my former favorite text editor, work fine locally but slow to a crawl working with lots of files in remote directories.

    These programs aren’t even trying to read magic numbers in files as far as I know; they’re just assuming filesystem access is pretty cheap, so they make more synchronous file accesses than they strictly need to.  For example, my editor would frequently update an autosave file on the remote FS, or check that no other program had changed any open remote file while it wasn’t looking, and I’d have to wait a fraction of a second (or occasionally much more) while it did it.  

    I ended up using a different text editor that has its own built in virtual filesystem hoo-de-hah; it knows when it’s working with a remote filesystem and quits doing the problematic stuff then.

    (Yes, I should probably be using vi or SMB or NFS or $method.  Flames are welcome if they come with constructive tips I could use.)

  39. Joseph Koss says:

    Tim Fries:

    You missed the point.

    Explorer defaults to NOT showing file extensions, so the lack of ".mp3" or ".jpg" isn’t a clue since there isn’t a ".vbs" displayed either.

    As far as icons.. there are multiple views in explorer, and not all of them have shown icons over the years. Still further, there is no standard protected set of icons which will guarantee that things like .vbs and a .mp3 will have easily differentiable icons. Media players are notorious for changing all the icons for all the file types they can handle to their own custom icon.

  40. ace says:

    @ace, @dean: is the time taken to sort 10,000 files a measurable proportion of the time you’re measuring?

    No, NTFS by design maintains sorted order at the moment the file is added to directory. FAT keeps unsorted directories.

    The other issue to consider is scroll speed.

    Already eliminated. See " >nul" in previous posts.

  41. Chris Lineker says:

    @Dean Harding

    That option doesn’t exist pre-vista, and it doesn’t prevent other behaviours such as calculating the dimensions of the video frame.

  42. Igor Levicki says:

    >>You can’t open the file until the user tells you to open it<<

    Is that so? Then why Explorer opens Recycler and System Volume Information folders, and why that Windows Portable Devices CRAP insists on opening folder named WPDNSE in my TEMP folder?!?

    I never told them to open any folders.

    Why I cannot delete wpd*.* from Windows\System32? Who keeps them open so I can’t access them? I don’t remember telling anyone to open those files and keep them locked.

    Why in Vista administrator doesn’t have access to many registry keys?

    Even worse, why I cannot recursively take ownership and add permissions on registry keys so I can delete a tree of keys?

    I want to have the final say what my system does, and it gets harder and harder with every Windows version and then you start preaching how Explorer and Windows in general respect user wishes. Yeah right…

  43. Xepol says:

    and the correct conclusion is actually that the OS stores an insufficient amount of meta data since accessing the files over a fast medium feels like a fail when you have enough files.

    MS tries to fake its way past this with "indexing".

    Sadly, it still feels like a fail.

    Maybe in today’s day and age, the OS should have more advanced meta data features than, oh, dos 2.11?

  44. Steve Smith says:

    > [No fair inventing new types of metadata. The context of the discussion was “What can Explorer use to obtain the file type, given how file systems work today?”

    So, I am stuck with using file extensions for the rest of my life because of a decision based on FAT-12?  That doesn’t seem fair.  Are you saying that because the designers chose the “sucky choice” in 1927 when windows first sprang forth that we are not allowed to consider something else?

    The original choice was made for some reasons, probably very good ones at the time.  However, Windows has changed in so many ways since: we are not limited to 8.3 names, or 2Mbyte partitions or 64M memory, why not here?

    > Mind you, requiring the filesystem to infer the MIME type at file creation time requires a good amount of clairvoyance. How does it know that a file that begins with the characters ‘<?xml version="1.0" encoding="UTF-8"?>’ is of type application/x-litware-customerlist before Litware is installed? How does it know that it’s a Litware customer list and not a Contoso product catalog? [which also takes the physical form of an XML file]) -Raymond

    Sneaky!  Using file extension .xml, like we do at the moment, sure copes with that one.  But I’ll have a go.  The code that works out the “magic” does not rely on the component being installed.  Remember this is off the top of my head, design by the seat of the pants (Yes, I know…but they keep my ears warm)  but it seems to me, it would be ideal choice for a finite state machine implementation.  We wouldn’t need to change the code, just the tables.  This could be done via the interwebbything.

    [So each time the table gets updated, the file system has to go through and recalculate all the MIME types that were calculated using the old tables? (The hard part isn’t finding a “there” you want to be. The hard part is getting from “here” to “there” without losing all your customers.) -Raymond]
  45. Aaargh! says:

    > “No fair inventing new types of metadata. The context of the discussion was “What can Explorer use to obtain the file type, given how file systems work today?”

    Well, since Explorer was part of this brand new operating system you were designing at the time, and you were already messing around with the FS as part of creating that OS, why is the way filesystems work ‘today’ a given ? Aren’t you supposed to be working on ‘tomorrow’ ?

    [Explorer did not rely on any of the new features for proper functioning. That’s part of what made Windows 95 a success. (Besides, if you don’t work today, you won’t be around tomorrow.) -Raymond]
  46. 211 says:

    > Explorer did not rely on any of the new features for proper functioning.

    Oh, the Windows 95 Explorer shell works fine and dandy if the filesystem somehow prevents Long File Name “metadata”, does it?

    [Absolutely. That was an explicitly supported configuration, and a lot of effort went into making it work. -Raymond]
  47. Nicholas Sherlock says:

    The set of people who really understand what file extensions mean are more or less the same set of people who know how to turn on file extensions.

  48. Hmm… not at all convinced that I’m not missing something here, but…

    Instead of making it part of the file or part of the filesystem metadata, why not make it part of the "offline files" API?  That doesn’t address a network share, indeed (though I’m not sure it couldn’t be done there) but if the API is written such that a computer hosting offline data is also required to cache some bits of metadata… wouldn’t that fix it?

    On one hand, yes, this is me handing a time machine to the Win95 developers – but on the other hand, it’s still very much a relevant question for other reasons, as "cloud" storage becomes the new "in thing"…
