Why does Windows still place so much importance on filenames?


Earlier today, Adrian Kingsley-Hughes posted a rant (his word, not mine) about the fact that Windows still relies on text filenames.

The title says it all really. Why is it that Windows still place so much importance on filenames.

Take the following example – sorting out digital snaps. These are usually automatically given daft filenames such as IMG00032.JPG at the time they are stored by the camera. In an ideal world you’d only ever have one IMG00032.JPG on your entire system, but the world is far from perfect. Your camera might decide to restart its numbering system, or you might have two cameras using the same naming format. What happens then?

I guess I’m confused. I could see a *very* strong argument against Windows’ dependency on file extensions, but I’m totally mystified about why having filenames is such a problem.

At some level, Adrian’s absolutely right – it IS possible to have multiple files on the hard disk named “recipe.txt”.  And that’s bad.  But is it the fault of Windows for allowing multiple files to have colliding names? Or is it the fault of the user for choosing poor names?  Maybe it’s a bit of both.

What would a better system look like? Well, Adrian gives an example of what he’d like to see:

Why? Why is the filename the deciding factor? Why not something more unique? Something like a checksum? This way the operating system could decide is two files really are identical or not, and replace the file if it’s a copy, or create a copy if they are different. This would save time, and dramatically reduce the likelihood of data loss through overwriting.

But how would that system work? What if we did just that? Then you wouldn’t have two files named recipe.txt (which is good).

Unfortunately that solution introduces a new problem: You still have two files.  One named “2B1015DB-30CA-409E-9B07-234A209622B6” and the other named “5F5431E8-FF7C-45D4-9A2B-B30A9D9A791B”. It’s certainly true that those two files are uniquely named and you can always tell them apart.  But you’ve also lost a critical piece of information: the fact that they both contain recipes.

That’s the information that the filename conveys. It’s human-specific data that describes the contents of the file. If we were to go with unique monikers, we’d lose that critical information.

But I don’t actually think that the dependency on filenames is really what’s annoying him.  It’s just a symptom of a different problem. 

Adrian’s rant is a perfect example of jumping to a solution without first understanding the problem. It also shows why it’s so hard for Windows UI designers to figure out how to solve customer problems: this example is a customer complaint asking that we remove filenames from Windows. Obviously something happened to annoy Adrian that was related to filenames, but the question is: What? He doesn’t describe the problem, but we can hazard a guess about what happened from his text:

Here’s an example. I might have two files in separate folders called recipe.txt, but one is a recipe for a pumpkin pie, and the other for apple pie. OK, it was dumb of me to give the files the same name, but it’s in situations like this that the OS should be helping me, not hindering me and making me pay for my stupidity. After all, Windows knows, without asking me, that the files, even if they are the same size and created at exactly the same time, are different. Why does Windows need to ask me what to do? Sure, it doesn’t solve all problems, but it’s a far better solution than clinging to the notion of filenames as being the best metric by which to judge whether files are identical or not.

The key information here is the question: “Why does Windows need to ask me what to do?”  My guess is that he had two “recipe.txt” files in different directories and copied a recipe.txt from one directory to the other.  When you do that, Windows presents you with the following dialog:

Windows Copy Dialog

My suspicion is that he’s annoyed because Windows is forcing him to make a choice about what to do when there’s a conflict. The problem is that there’s no one answer that works for all users and all scenarios. Even in my day-to-day work I’ve had reason to choose all three options, depending on what’s going on. From the rant, it appears that Adrian would like it to choose “Copy, but keep both files” by default. But what happens if you really *do* want to replace the old recipe.txt with a new version? Maybe you edited the file offline on your laptop and you’re bringing the new copy back to your desktop machine. Or maybe you’re copying a bunch of files from one drive to another (I do this regularly when I sync my music collection between home and work). In that case, you want to ignore the existing copy of the file (or maybe you want to copy the file over to ensure that the metadata is in sync).

Windows can’t figure out what the right answer is here – so it prompts the user for advice about what to do.
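
To make the three choices concrete, here is a minimal Python sketch of a copy routine that takes the conflict policy as a parameter. The names `ConflictPolicy` and `copy_with_policy`, and the " (2)" suffix rule, are assumptions made for illustration; this is not how Explorer is implemented.

```python
import shutil
from enum import Enum
from pathlib import Path


class ConflictPolicy(Enum):
    REPLACE = "replace"
    SKIP = "skip"
    KEEP_BOTH = "keep both"


def copy_with_policy(src: Path, dst_dir: Path, policy: ConflictPolicy) -> Path:
    """Copy src into dst_dir, resolving a name collision according to policy.

    Returns the path of the file that ends up holding the result.
    """
    target = dst_dir / src.name
    if not target.exists():
        shutil.copy2(src, target)  # no conflict, nothing to decide
        return target

    if policy is ConflictPolicy.SKIP:
        return target  # leave the existing file untouched
    if policy is ConflictPolicy.REPLACE:
        shutil.copy2(src, target)  # overwrite contents and metadata
        return target

    # KEEP_BOTH: find the first free "name (n).ext" (hypothetical naming rule).
    n = 2
    while True:
        candidate = dst_dir / f"{src.stem} ({n}){src.suffix}"
        if not candidate.exists():
            shutil.copy2(src, candidate)
            return candidate
        n += 1
```

The code can carry out any of the three policies, but nothing in the two files tells it which one the user intends. That is the part that has to come from the user.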

Btw, Adrian’s answer to his rhetorical question is “the reason is legacy”. Actually that’s not quite it. The reason is that filenames provide valuable information for the user that would be lost if we went away from them.

Next time I want to spend a bit of time brainstorming about ways to solve his problem (assuming that the problem I identified is the real problem – it might not be).

PS: I’m also not sure why he picked on Windows here. Every operating system I know of has similar dependencies on filenames. I think that’s another indication that he’s jumping to a solution without first describing the problem.

Comments (35)

  1. Nobody says:

    What I would venture is that using the file name as THE identity token of the file in the file system is what is causing this guy's trouble. Arguably the name is just metadata about the file: a very important part, but still just metadata, no different than the last write date or the permissions. One could argue that the user should be able to "name" the file whatever she wants, independently of how the OS determines the identity of the file. Copying the two files called recipe.txt to the same folder would then be just a matter of annoyance to the user, because she no longer knows which one is which. This could also be extended to the use of the extension to "tag" the file type, which should also be part of the metadata, not part of the file identity. Even folders could then be thought of as mere views of the underlying data, which makes it a bit more natural to express whether the same file is in two folders or the file was copied.

    Now the interesting thought experiment here is how to design an API to deal with a file system such as this. One would open the file by its ID token, which would be resolved after the user picks a file in some sort of FileOpenDialog UI. That is almost how one imagines the file system's internal API must work, after the directory is resolved to the actual entry in the MFT. The apps would now deal with those IDs directly instead of through the "view" of directories and file entries.

  2. @Nobody.  I want to talk about that particular issue in the post after the next one. There are some interesting challenges involving user expectations to that solution.

  3. Weeble says:

    (Please delete this if it's a double-post. The first time I tried to submit it I got no feedback to indicate whether it posted or not.)

    It sounds like he's mostly annoyed at that dialog box. It often doesn't present you with the information that you're actually going to use to make the decision (you may need to open one or both files to do that), it doesn't give you much idea of how many more conflicts are coming up, it doesn't let you defer the decision until you've copied the other files or seen the other conflicts. (I might be wrong – it's a while since I've copied lots of files like that on Windows and I'm on a Linux machine right now – but I think that's how it behaved last time I saw it.) Windows can't make the decision what to do by itself, but it's certainly possible to think of ways that the experience could be less painful.

    That said, I don't think his idea is entirely unworkable. Suppose that we're only talking about user documents – no system files, nothing that's cross-linked by filename or shortcut or anything like that to complicate the picture. Suppose that any time you copy or move a file and there's already a file with the target name it never overwrites the target – you just end up with two files with the same name in the same folder. Could a system like this work?

    It seems like it could. What happens when I have two copies of a file and want to keep just one? Well, I have to delete the one I don't want. That doesn't seem hard to understand. I still need to do some work, but it's my problem, not the system's, and I can do it in my own time. I couldn't use typed in paths to uniquely specify files any more, but perhaps such a system would always require picking files from some sort of GUI – most users get by rarely needing to type in the names of existing files, and when they do it's often to type the first few letters and then select an item in a list view. Applications already make a distinction to users between editing an existing document and creating and saving a new one, so we needn't end up with duplicates every time I open a document, edit it and save it. Now, you couldn't just drop this behaviour into an existing operating system without breaking pretty much everything, but you could imagine one designed this way from the ground up, at least in its handling of user documents. I wouldn't be surprised if some purely object-oriented operating system tried something similar.

  4. @Weeble: You're describing the Windows XP experience.  The file copy dialog was dramatically improved for Windows Vista.  How do you handle the "copying an updated file from the laptop" scenario if you never replace the existing file (where you *do* want to overwrite the file)?  What about the "updating my media library from home" scenario (where you *don't* want to overwrite the file)?

    Forcing the user to come back and clean up after the copy command can also result in a poor experience.  People would say "@#$@#$ windows, why doesn't it understand that I wanted to overwrite the file?"

    These decisions are tricky, which is why I decided to write the followup post.

  5. Barry Kelly says:

    What are the names of the different photos in your iPad/iPhone/iPod Touch Photos app? What are the names of the files for the notes in your phone's note-taking app? The names of the save-game files? The MP3s in your music library app?

    By framing the question of filenames in a filesystem context, you risk prematurely jumping to conclusions. What if, from the perspective of the user, there is no filesystem? Without a filesystem, you don't need names. Perhaps you need tags, dates, camera model, author, etc. etc. Perhaps these things are more or less convenient. Perhaps there is still a filesystem behind the scenes, but the user doesn't *necessarily* need to know that.

  6. Evan says:

    @Weeble, and a tiny bit @Nobody:

    That whole suggestion sounds like a bad idea to me.

    For starters, I think you're making the common case a lot of work in order to make the uncommon case easier. How often do I want more than one file with the same name in the same directory? Well, it's fairly hard to predict how I'd use that feature if it existed, but I suspect it would be fairly rarely. But how often do I copy a file from one place to another and want to overwrite the destination? Fairly often. And when I DO do it, it's often with a bunch of files at once. Now you're telling me I have to go through and clean up that? You say you can do that "on your own time", but I don't WANT to spend the time on it. I'd lose WAY more time to that, especially when you take into account the occasional mistake, than I do to dealing with the fact that names and files are in a 1-1 correspondence. (Disclaimer: names and files are not actually in a 1-1 correspondence, due to hard links.)

    Now you could use some unique ID that gets created when a file is created and remains unchanged throughout its life, independent of how the file contents change. (E.g. pick a GUID.) This could be an interesting interface. And it resolves this problem at the cost of creating another (preserving identity across what look to the OS like new-file creations; see below.) This may be more along the lines of what Nobody was considering.

    Second, you can't require a GUI — I strongly feel that if it's not scriptable, it's not remotely acceptable. I don't think there's anything that's completely fundamentally wrong about such an interface, but there ARE a lot of questions you have to work out. If I say "type *.txt" at the command prompt, how will that play out in terms of what the shell, type, and OS do? What if I'm using something like Cygwin Bash where the *.txt gets expanded by the shell? [For a few reasons, I generally favor such an interface rather than have programs interpret wildcards.] How will the target program know how to interpret the resulting file names, since they no longer suffice to identify files? I think you'd have to rather completely rethink how program invocation and shells work, perhaps to the point of command line arguments actually representing typed entities. (E.g. *.txt would expand to a list of file objects, not just a list of strings.)

    Then, you say "Applications already make a distinction to users between editing an existing document and creating and saving a new one, so we needn't end up with duplicates every time I open a document, edit it and save it", but this is only true of the user's view. In fact, there are a number of programs for which this is actually NOT true if you look at the actual API calls it makes. Programs do things behind the scenes like 'del file.txt; create file.txt', or 'ren file.txt backup.txt; create file.txt'. Again, this isn't insurmountable; Windows hacks around this problem for some metadata currently. (See Raymond Chen: blogs.msdn.com/…/439261.aspx. Incidentally, this is why you get useful file creation dates on Windows and not on Unix.) However, this hack seems, well, hackish, and I'm not sure how much I'm comfortable depending on it for something vital. In particular, that thing I mentioned in the second big paragraph — give each file a GUID — would absolutely depend on this working nearly perfectly.

    I will be very interested to see the next couple entries though. I tend to get fairly passionate about some aspects of file system design. 🙂
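
Evan's suggestion that `*.txt` might expand to a list of file objects rather than a list of strings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FileRef`, its `file_id` field, and `expand` are hypothetical names, and the "directory" is just an in-memory list in which names need not be unique.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List


@dataclass(frozen=True)
class FileRef:
    file_id: int   # hypothetical identity token assigned by the file system
    name: str      # human-readable label; not required to be unique


def expand(pattern: str, directory: List[FileRef]) -> List[FileRef]:
    """Expand a wildcard to file objects rather than to name strings."""
    return [ref for ref in directory if fnmatchcase(ref.name, pattern)]
```

If a directory held two entries named recipe.txt, `expand("*.txt", directory)` would return both, and a program receiving them could still tell them apart by `file_id`, which a list of bare strings cannot express.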

  7. Weeble says:

    It's not clearly a better system, but neither is it clearly an unworkable one. Perhaps a "duplicates happen" system with extra tools for cleaning up or synchronizing files would be more intuitive overall than a "no duplicates" system which forces immediate resolution of conflicts. After all, it does seem to be closer to how real-world objects – such as paper documents – work. It's certainly an interesting idea to consider.

    It's interesting to note that synchronizing files between multiple locations is fundamentally a hard problem. We're not even considering cases where the user really wants the resolution "merge the duplicated files". In that case we would probably say that they should be using some specialised application like a source control system. How do we decide where to draw the line between what should be built in to the file browser and what should be handled by another application?

  8. voo says:

    That dialog was one of the best changes MS made to Explorer that I can think of. There are scenarios for all three options (and I could think of a 4th that allows the user to specify a new name when keeping both instead of the default behavior). And I don't see how it doesn't give me all the information I need to make the right decision. Now if someone wants to complain about the XP dialog boxes, go right ahead; I doubt anyone would want to stop you.

    Also, I totally DON'T agree that it's a bad thing to have several files with the same name on a disk; readme.txts or config files come to mind. It's not just the filename but the absolute path that carries lots of information. If I have two files that can be distinguished based on some tag, I can just as easily adjust the filename. Also, how would I specify a specific file if there could be several with the same name in one position? Specify the distinguishing tag?

    Sounds like he had a specific problem and generalized from it without thinking about the hundreds of scenarios where his "obvious solution" wouldn't work.

  9. Ben says:

    I think that Adrian has a point here (though he's expressed it quite badly, and his solution sucks :)). In many cases, users don't care about the filename. When dealing with the photos on my camera, the camera automatically fills in the date (and depending on which camera I'm using, the place) that it was taken, later on I might tag the people who are in the photo, and give it a category tag, but I still just end up with DSC_0003.jpg as the filename. Because I couldn't put enough information in the name to be useful, it just gets ignored. I never even see my music files, since explorer can just show me the actual song details, I just copy/move them around like that (or my music software, which similarly knows more about songs than a filename could reasonably express).

    With regards to personal documents, the one area that you might care about filenames, I often end up with, say, Budget.xlsx, Budget2.xlsx, Budget3.xlsx: I gave it a name (budget), the other important piece of metadata is the date it was created, and that's stored with the file. The filesystem forced me to make up something to uniqify the filenames, when I could already tell them apart. (My grandmother and mother definitely do this too, so it's not just because I'm a "computer type")

    It seems to me that for data (files related to programs are a different story), the important part of a file is not its name, but its identity. The file that started out on my desktop as Recipe.txt, and is then copied to my laptop for further editing, and then back, should overwrite Recipe.txt, while my second recipe that I started working on at work called Recipe.txt, which I then copied to my home computer, should go alongside. If I copy both of those files to my laptop, rename one to "Hommus Recipe.txt" and the other to "Guacamole Recipe.txt", when I copy them back to the same directory, they should still overwrite the files that they "came from".

    Having filenames also forces me to actually come up with a name for something, which might not be immediately obvious. I definitely get a heap of files on my desktop or documents folder named Foo.txt, bar.txt etc over time, when I want to save some small snippet of text. (OneNote is great for this though)

  11. Evan says:

    @voo: "And I don't see how it doesn't give me all the information I need to make the right decision."

    There is at least one piece of highly-relevant information that it fails to give you, and which would make things a lot better some of the time: "these files differ" or "these files are the same". I think this would be a wonderful addition to that dialog.

    The computer guy in me wants a "diff" button too that brings up something like WinMerge, at least for files that look like text, but at the same time I recognize this is probably not particularly appropriate for most people.

    @Weeble: "It's not clearly a better system, but neither is it clearly an unworkable one. Perhaps a "duplicates happen" system with extra tools for cleaning up or synchronizing files would be more intuitive overall than a "no duplicates" system which forces immediate resolution of conflicts."

    I'm still skeptical; there are a LOT of problems that need to be worked out. (E.g. a tool that helps YOU with resolving those conflicts will do nothing for what I was talking about from the command line point of view.) And I think that some of the reasons you might want multiple names could be better handled with other mechanisms, e.g. store the old version of the file in something that's a little like the "previous versions" thing, where you can retrieve it if need be.

    That said, I definitely like hearing about wacky ideas. And it's a little bit interesting: a lot of the reasons some people are going to Linux and such is because of Windows's ubiquity, to fight against the Windows "monoculture". But from another point of view, Windows is really the odd one out. What OSes are people using today besides Windows? Linux, OS X, Solaris, … all the OSes I can think of that I suspect have a noticeable presence in the world have their roots in Unix, except for Windows. And then lots of people say "MS should toss out NT and build a Windows compatibility layer on Unix, the way Apple did." But in some sense, Windows is the lone standout from a Unix monoculture now. And that has problems too, albeit very different ones than the Windows monoculture. One of the problems is reinforcing a sort of Unix orthodoxy. (I think Rob Pike or someone briefly mentioned this in a presentation somewhere.) Out of all of those OSs, how likely is it that they would have done something like Transactional NTFS before MS? I think not at all.

    So don't interpret my earlier comment as "this would never work" so much as "there's a lot of things that someone would have to figure out how to do to make this work". I probably come off a little more opinionated in text than I actually am.

  12. Weeble says:

    I thought I made it clear but I shall try to restate – the system that I was considering is quite obviously utterly incompatible with existing applications and file systems and would only work with an ecosystem of applications designed to support it from the ground up. It needs a mechanism for programs and scripts to communicate and store file identity other than filenames. It needs a mechanism to distinguish replacing the content of a file and creating a new file with the same name. Obviously it is not practical to retrofit this behaviour into a traditional file-system. That doesn't mean it's not useful to consider it as a theoretical way to manage documents.

    I think we are agreed that it looks like such a system would be awkward when we really want to "copy and replace the equivalent documents" or "copy only the documents that don't have equivalents" or some mix of both. I guess I'm just saying that I'd be interested to experiment with such a system to see how painful this is and if there are other ways to resolve those problems than by using filenames as identities. There seems to be some elegance to being able to say that "copying" a document is just that, no more and no less, as opposed to "copying and replacing". Elegance isn't an end in itself, but I find it can mean something is at least worth a second look.

  13. Igor says:

    I think you missed the point. When you copy a folder with a file named "recipe.txt" over a folder, which already contains a file named "recipe.txt" it would be better if Windows would know, if these files are identical.

    Say you are merging folders: you already have images from your camera in the pictures folder on your desktop. Now you are on vacation and download some pictures from your camera to your notebook. When you are back, you copy the pictures folder from your notebook over the pictures folder on your desktop.

    But because you didn't remember that you had already downloaded image "IMG00032.JPG" to your desktop before the vacation, and forgot to delete it from the camera, you downloaded the file again to your notebook. Now you see the prompt, with Windows asking you what to do. If Windows already knew, "Hey, these files are the same…", there would be nothing to ask 😉

    The problem might be that it takes too much time:

    To calculate a checksum, you need to read the entire file. If you then decide you want to copy the file, you need to read it again.
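
The cost Igor worries about can be reduced with a cheap check first. The following Python sketch (the function name `same_content` and the chunk size are assumptions for illustration) compares sizes before reading anything, then compares the files chunk by chunk and stops at the first difference. Python's standard library offers a similar check as `filecmp.cmp(a, b, shallow=False)`.

```python
from pathlib import Path


def same_content(a: Path, b: Path, chunk_size: int = 1 << 16) -> bool:
    """Return True if the two files hold identical bytes."""
    # Different sizes means different content; no file data is read at all.
    if a.stat().st_size != b.stat().st_size:
        return False
    with a.open("rb") as fa, b.open("rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # stop at the first difference
            if not chunk_a:
                return True  # both files ended without a mismatch
```

Only identical files are read in full, and those are exactly the files whose copy can then be skipped, so in that case nothing is read twice. Files that differ still have to be read again for the copy, but the comparison usually ends early.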

  15. Leo Davidson says:

    I posted this reply/suggestion to the original:

    I'm not sure what you really expect Windows (or any other OS; they all act the same in this regard) to do here.

    Why don't you use a decent file manager and have it make the filenames properly unique as it moves them off the camera? For example, have it prefix them with the date & time of when they are being moved. Then they will not clash with existing filenames even if the camera has reset its counter.
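
Leo's renaming rule is simple enough to sketch in Python. The function name `timestamped_name` and the exact prefix format are assumptions, not the behaviour of any particular file manager, and the date in the usage example is arbitrary.

```python
from datetime import datetime


def timestamped_name(original_name: str, moved_at: datetime) -> str:
    """Prefix a camera filename with the date and time of the transfer."""
    # Year-first ordering keeps the files sorted chronologically by name.
    return f"{moved_at:%Y%m%d-%H%M%S}_{original_name}"


print(timestamped_name("IMG00032.JPG", datetime(2010, 7, 4, 9, 30, 5)))
# prints 20100704-093005_IMG00032.JPG
```

Two transfers started within the same second from cameras that use the same numbering could still collide, so a real tool would probably need a fallback for that case.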

  16. Weeble says:

    @voo

    > And I don't see how it doesn't give me all the information I need to make the right decision.

    By this I meant that if I have two files called recipe.txt I might need to see the contents of both to make the right decision. Maybe they are entirely different recipes and I want both. Maybe they've both been edited since they were copied and the last modified date and size aren't enough to pick one over the other. And I know I have in the past started such an operation, chosen to replace or keep a few files and then gotten to a point where I realise there's no good answer for some file and I really want to roll-back the operation, but by then it's too late. (I don't *think* it's undo-able… or is it?)

  17. Arturo says:

    Think of this scheme:

    1) a file is identified by a GUID based on its content, a file name merely makes it resident in the filesystem (like the UNIX inode system)

    2) an (optional) file name merely ties together the different GUIDs and keeps a reference to (some number of) old versions

    3) some use cases may not require file names at all, but instead rely completely on the metadata to identify the right file GUID.

    You would get version history for free, no more issues with "this file is in use" (the old version can be kept open for reading as long as an app wants), a moved or renamed file can always be found if the last GUID is known, etc.

    Note that this makes file moving and renames absolutely painless. No need to ask for permission with a modal dialog: all operations can be reverted if needed.

    Examples where file names are redundant may include the component DLLs of an application, which may not need to reside in the file system at all. Similarly, shared DLLs could be identified with their metadata only, etc.

    My guess is that present-day computers have more than enough resources to do the indexing required for these operations. If Microsoft is interested in implementing this then contact me 🙂
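
A toy version of the scheme Arturo outlines might look like the following Python sketch. It is an in-memory illustration only: the class name `ContentStore` and its methods are hypothetical, and a SHA-256 hash of the content stands in for his "GUID based on its content".

```python
import hashlib
from typing import Optional


class ContentStore:
    """Toy in-memory store: content is keyed by hash, names are optional labels."""

    def __init__(self) -> None:
        self.blobs = {}     # content id -> bytes
        self.versions = {}  # name -> list of content ids, oldest first

    def put(self, data: bytes, name: Optional[str] = None) -> str:
        content_id = hashlib.sha256(data).hexdigest()
        self.blobs[content_id] = data  # identical content is stored once
        if name is not None:
            history = self.versions.setdefault(name, [])
            if not history or history[-1] != content_id:
                history.append(content_id)  # saving never destroys old versions
        return content_id

    def latest(self, name: str) -> bytes:
        return self.blobs[self.versions[name][-1]]

    def rename(self, old: str, new: str) -> None:
        # Only the label moves; content ids and history are untouched.
        self.versions[new] = self.versions.pop(old)
```

Saving twice under the same name keeps both versions retrievable by their ids, and a rename changes nothing but the label. What the sketch leaves out is the hard part the post raises: when two different files arrive under one name, something still has to decide whether the second is a new version of the first.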

  18. tobi says:

    This discussion makes me think of WinFS.

  19. Mark Sowul says:

    I suppose two niceties could be to detect when the file contents are identical, and maybe tie in the preview pane so one could examine both files.

  20. JT says:

    The rant in question may have more to do with a pressing deadline and a need for copy than a real grievance. It sounds more like a polemic of convenience.

  21. Richard Gadsden says:

    @Larry, I think the problem with the "copying an amended file from a laptop" scenario is that there really shouldn't be a checksum, but a GUID.  When I copy the file to my laptop, it uses the GUID, when I copy it back, the system notes that (a) it was copied to the laptop on date X, (b) the file on the master system hasn't been changed since date X and (c) these files have the same GUID and therefore does an overwrite.

    The problem here is that you need to have (a) a mechanism to do intentional forking of files (File | Save As ?) and (b) a real conflict resolution mechanism for merging back two files that have both been amended since they were separated.

    Of course, now you're headed towards having a DVCS instead of a filesystem.  But maybe that's what would be appropriate for document files if they weren't binary blobs that the DVCS can't see into and therefore can't do merging properly.

    Certainly, using a VCS or a DVCS for code and a DMS for documents at work has made me regard filesystems as being a bit primitive for user-facing documents.  You'll notice that Google Docs doesn't really emulate a file system.
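
Richard's three conditions (a), (b) and (c) translate fairly directly into a decision function. The Python sketch below is an assumption-laden illustration: `FileRecord`, its fields, and the four outcome strings are hypothetical, and treating an unknown copy date as a conflict is a choice made here, not something stated in the comment.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class FileRecord:
    guid: str                       # identity assigned at creation, kept across copies
    modified: datetime              # last time the content changed
    copied_out: Optional[datetime]  # "date X": when this file was last copied elsewhere


def resolve(master: FileRecord, incoming: FileRecord) -> str:
    """Decide what to do when `incoming` is copied back next to `master`."""
    if master.guid != incoming.guid:
        return "keep both"   # unrelated files that happen to share a name
    if master.copied_out is None or master.modified > master.copied_out:
        return "conflict"    # master changed since date X (or X is unknown): needs a merge
    if incoming.modified > master.copied_out:
        return "overwrite"   # only the copy changed, so it is simply the newer version
    return "skip"            # neither side changed; nothing to do
```

The "conflict" branch is where the DVCS-style merge machinery he mentions would have to take over.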

  22. Nobody says:

    @Evan, yes, what I was thinking is that each file gets an ID when it's created and that is how it's manipulated, this ID being independent of the identity as perceived by the user.

    Another limitation of filenames is that you can only use "legal" characters in them. I can't name my file "My 1\2 work done" because it so happens that the OS decided eons ago that the '\' character is the "path separator", whatever that means in a GUI world. Try explaining that with a straight face to somebody who is not very computer savvy.

  23. Miral says:

    It does sound like a useful idea to have a filesystem that inherently knows whether two given files have the same content (eg. by having an internal hash of the file contents, which gets invalidated [but not necessarily recalculated, for performance reasons] on any change to the file); that would enable the "automatically skip files that have previously been copied" scenario, at least if they have the same name.

    Others have proposed having something other than the name being the core identifier, but I'm not confident that would work well.  Perhaps you're copying a document to somewhere else to make changes to it you *don't* want to propagate back to the original document; you'd need to have some way to say "this was cloned from the original but is independent" vs. "this is an updated version of the original", and you don't always know which you want in advance.

    Besides, in my day-to-day job I almost always use copy-and-replace (and I tend to give things sensible names in the first place, so collisions only happen when they really should be replaced).
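
The "invalidate on change, recalculate only when asked" hash described at the start of this comment might look something like the sketch below. `TrackedFile` and `can_skip_copy` are hypothetical names, the file is modeled in memory rather than on a real filesystem, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
from typing import Optional


class TrackedFile:
    """In-memory stand-in for a file whose content hash the filesystem caches."""

    def __init__(self, name: str, data: bytes = b"") -> None:
        self.name = name
        self._data = data
        self._hash: Optional[str] = None   # None means "stale or never computed"

    def write(self, data: bytes) -> None:
        self._data = data
        self._hash = None                  # invalidate only; do not rehash here

    def content_hash(self) -> str:
        if self._hash is None:             # recompute lazily, on first request
            self._hash = hashlib.sha256(self._data).hexdigest()
        return self._hash


def can_skip_copy(src: TrackedFile, dst: TrackedFile) -> bool:
    """True when the destination already holds the same name and content."""
    return src.name == dst.name and src.content_hash() == dst.content_hash()
```

Writes stay cheap because they only clear the cached value; the cost of hashing is paid the first time a copy operation asks for the hash.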

  24. Josh says:

    > "In many cases, users don't care about the filename. When dealing with the photos on my camera, the camera automatically fills in the date"

    I actually really hate the random identifiers that cameras apply but understand at the moment there isn't a better solution.  I *do* care about the file name, but it is overly burdensome to apply it.  In my ideal world cameras would have a lot better built-in tagging capability (other than date, and occasionally GPS coordinates, which are utterly user-unfriendly) and allow me to specify a file name based on the metadata in the same way various utilities let me rename MP3s based on tag info.  At the same time, while Windows is getting better at using metadata, it would be nice if it could go further – functionality like Windows Live Photo Gallery really should be built into the shell.

    > "When you copy a folder with a file named "recipe.txt" over a folder, which already contains a file named "recipe.txt" it would be better if Windows would know, if these files are identical."

    In a perfect world, sure, but that comes with some huge real-world tradeoffs.  A 5K text file can be diffed in a matter of milliseconds (especially if you are only comparing checksums), but even that becomes very complicated if you want to start making intelligent decisions based on things like trailing whitespace, differing Unicode quote marks, etc. Expand that out to a 200 MB PowerPoint file, which can differ in all sorts of very subtle ways (Office properties metadata, minute differences in how Office versions specify formatting, document change metadata, etc).  Comparing binary identity is easy, but comparing contextual identity is incredibly hard and not at all cheap to compute.
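
To make the gap between binary identity and "same content as far as the user is concerned" concrete, here is a small sketch for plain text only. The function names and `QUOTE_MAP` are hypothetical, and the normalization covers just the two cases named above (trailing whitespace and curly quotes); a real comparison of Office documents would need far more than this.

```python
import hashlib

# Curly quotes mapped to their ASCII equivalents (an illustrative subset).
QUOTE_MAP = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})


def binary_identical(a: bytes, b: bytes) -> bool:
    """Byte-for-byte identity, compared via checksum."""
    return hashlib.sha256(a).digest() == hashlib.sha256(b).digest()


def normalize_text(raw: bytes) -> str:
    """Decode as UTF-8, unify quote marks, and strip trailing whitespace per line."""
    text = raw.decode("utf-8").translate(QUOTE_MAP)
    return "\n".join(line.rstrip() for line in text.splitlines())


def textually_equivalent(a: bytes, b: bytes) -> bool:
    """Equality after normalization; assumes both inputs are UTF-8 text."""
    return normalize_text(a) == normalize_text(b)
```

Two recipes that differ only in quote style or trailing spaces fail the checksum test but pass the normalized one, which is the easy end of the problem; every additional notion of "the same" needs its own rule.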

  25. dave says:

    I wonder if Adrian has children.

    I wonder if he gave them names.

  26. Gabe says:

    I don't understand how you're supposed to use a system where filenames don't have to be unique — how do you know which file you want to open when they have the same name? If I want to open a recipe, I need to be able to figure out which recipe to open. You could argue that a thumbnail of the document would show me which one I want, but I would imagine that most recipes would look similar in thumbnail form. Can you imagine a cookbook where the recipes weren't uniquely named? It would be very difficult to use. You'd have to use meaningless metadata like page number to find a specific recipe.

    Of course you could argue that the user should be allowed to create as many files of the same name as they want, but then most users would end up with hundreds of files all called "Untitled" because it's easier. Then people would be complaining about how hard it gets to use computers because filenames are so often the same and it's hard to know which "recipe" they're looking for. By forcing unique names at file creation time, the computer is saving users lots of pain later on.

  27. dave says:

    An enterprising person could prototype such a thing today. Ignore the actual filenames, and use the file id (MFT record number). Stuff your non-unique filename in a named data stream.   Write yourself a file browser that showed the non-unique name.

    Mind you, I'm not so sure I'd want to use it.

  28. Maybe there should be more options in the file copy dialog. I just looked at how Total Commander handles this, and there are at least two useful options: "Replace all older" and "Replace all shorter". This would probably solve many situations where people synchronize files. A visual feedback option might also help: there are preview handlers in new Windows, so why not display part of the file in the copy dialog? XnView presents an option like this if you try to overwrite files with the same name.

    "After all, Windows knows, without asking me, that the files, even if they are the same size and created at exactly the same time, are different." – he seems not to see the many problems here. When are they different? When every letter is different? Or when one letter is different? What about binary files? It would result in totally unhelpful and unpredictable choices (for a user who hasn't read the, probably several pages long, documentation of this *feature*).

  29. Matt says:

    The thing that really annoys me when doing a synchronisation-type exercise is that Windows won't let you copy the files across (even if you are going to be ignoring or replacing most of them) if there isn't enough space for the whole lot of data you are moving.

    Say I have 400G of data on a 500G volume. If I copy 410G of data, of which only 10G is new, and select "skip" when it asks about overwriting, the copy is still aborted due to insufficient disk space.
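
A space check that takes the conflict policy into account could be sketched as below. `bytes_needed` and the policy names "skip" and "replace" are hypothetical, and the "replace" branch assumes files are overwritten in place; a copy engine that writes the new file before deleting the old one would need more room than this estimate.

```python
from typing import Dict


def bytes_needed(source: Dict[str, int], dest: Dict[str, int], on_conflict: str) -> int:
    """Extra bytes the destination needs; maps are relative path -> size in bytes.

    on_conflict is "skip" or "replace" (hypothetical policy names).
    """
    needed = 0
    for path, size in source.items():
        if path not in dest:
            needed += size                        # genuinely new file
        elif on_conflict == "replace":
            needed += max(0, size - dest[path])   # only the growth costs space
    return needed
```

With the numbers from this comment, skipping existing files means only the 10G of new data counts against the 100G that is free.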

  30. Martin says:

    I'd like to add a thought. Perhaps you should have a look at a situation where this issue regularly occurs: When I take pictures with my digital camera, the system starts with a filename like IMG000001.jpg. It continues to count till infinity as long (!!!) as I don't change the card on which the pics are stored. If I do, it starts again at IMG000001.jpg. When I come back from holidays and store the pics on my hard drive, I have to store the pics from the second card in a separate folder so as not to lose half of my pics. As far as I know (and I am no expert), other OSes will recognise that these are indeed different files and store them without regard to the file name…

  31. MWF says:

    @dave: Really all that needs to be done is to ignore the underlying filename in the GUI, instead display names based on a combination of already-present metadata plus additional user-specified metadata (through the new GUI).

    See, it sounds to me like what Adrian is after (mostly) is simply a new front-end; that is, a replacement for or upgrade of Explorer. This "Pretty-Explorer" would display files using metadata, without regard to the underlying filename. A logical extension of how image files can be displayed as thumbnails, or how music files can be displayed as artist/track information instead of raw filenames. For example, with the new system, you might see a folder full of several text documents called "Untitled", with additional metadata – like the date – displayed in proximity (alongside, in a preview bar, something like that). The user doesn't care what the actual filesystem names are, since they operate on the documents.

    This still doesn't fully solve the issue of overwriting versus "side-by-side" copying. One could base the initial "guess" as to the type of copy operation solely on the new set of metadata – perhaps first switch on the specified "file type", using different logic for different files. If the metadata used in the check says the files are different, then they are copied side-by-side, with the underlying file system names being automatically modified, if necessary. On the other hand, if the metadata says the files are the same – perhaps they are both of type "Word Document" and both have the title "Apple Pie Recipe" – then the system would prompt the user as to whether they want the "old" file overwritten, or to simply have the "new" file exist side-by-side after the operation.

    This discussion really raises a lot of interesting UX questions. I feel that we are going down this road, but one must remember that there always also needs to be a way to represent the system as it "really is", for those times when either (a) you have a power-user who can work more efficiently with the "raw" view or (b) you need access to the system to aid in debugging, system recovery, etc.

    @Martin: What OS uses a filesystem that allows for non-unique fully-qualified filenames? (That is, the filename with absolute path, taking into account any case-sensitivity, etc.) I can't imagine how something like that could even function, unless again the "filenames" you are seeing are not really filenames, but rather metadata displayed to the user when viewing from some front-end.

  32. MysticTaz says:

    I think much of this discussion brings up SOME of the motivating reason for WinFS. I hope that gets resurrected some day.

    As far as the UX goes, I'd like to see Explorer (and every other program with the same pattern) keep track of "user-resolvable conflicts" as it continues to copy the remaining files that do not conflict. As it hits these conflicts, it should update a dialog containing a listView/dataGrid showing the conflicts and allowing me to check a radio button saying what to do for each, or for a multi-selection. That way, I would make these decisions while all the non-conflicting files continue to copy/move. Then, once I've made my decisions on each of these listed conflicts, I could hit "Apply". Additionally, the context menu for each file/conflict listed would allow me all the same verbs that an Explorer window allows.

    Just my two cents worth.

  33. Nirmal Bhary says:

    I think there should be a tab in Explorer that displays the MD5 of each file, so the user can sort files by MD5, find those with the same hash, and delete the duplicates.
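
Outside of Explorer, the same idea can be approximated with a short script. This is a minimal sketch with hypothetical function names (`md5_of`, `find_duplicates`); it only reports groups of files with matching MD5 digests and deletes nothing.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder: Path) -> Dict[str, List[Path]]:
    """Map each MD5 digest to the files under `folder` that share it."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            groups[md5_of(path)].append(path)
    # Keep only digests that occur more than once.
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Hashing every file is slow on large folders, so a practical tool would probably compare file sizes first and hash only the files whose sizes match.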

  35. hagenp says:

    Maybe the solution for Adrian's real problem would be an explorer option that allows proper batch-renaming of files (then you could use the explorer sort order by date, or by album title or whatnot).

    NB: Windows Explorer _can_ batch-rename files, but only in a very limited way. And for more than ten files it breaks the name sort order (file(1).txt, file(10).txt, file(2).txt, …).
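
The sort-order problem in that last example comes from comparing names character by character. A "natural" sort key that compares digit runs as numbers avoids it; the sketch below uses a hypothetical helper named `natural_key` and makes no claim about how Explorer itself sorts.

```python
import re
from typing import List, Tuple, Union


def natural_key(name: str) -> List[Tuple[int, Union[int, str]]]:
    """Split a name into text and digit runs so digits compare numerically."""
    parts = re.split(r"([0-9]+)", name.lower())
    key: List[Tuple[int, Union[int, str]]] = []
    for index, part in enumerate(parts):
        if not part:
            continue
        # With one capture group, odd positions are always the digit runs.
        key.append((0, int(part)) if index % 2 else (1, part))
    return key


names = ["file(1).txt", "file(10).txt", "file(2).txt"]
print(sorted(names))                   # plain string order: 1, 10, 2
print(sorted(names, key=natural_key))  # natural order: 1, 2, 10
```

The leading 0 or 1 in each tuple keeps numbers and text from ever being compared with each other directly, which would raise a `TypeError` in Python 3.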