Why does the copy dialog give such horrible estimates?


Because the copy dialog is just guessing. It can't predict the future, but it is forced to try. And at the very beginning of the copy, when there is very little history to go by, the prediction can be really bad.

Here's an analogy: Suppose somebody tells you, "I am going to count to 100, and you need to give continuous estimates as to when I will be done." They start out, "one, two, three...". You notice they are going at about one number per second, so you estimate 100 seconds. Uh-oh, now they're slowing down. "Four... ... ... five... ... ..." Now you have to change your estimate to maybe 200 seconds. Now they speed up: "six-seven-eight-nine." You have to update your estimate again.

Now somebody who is listening only to your estimates and not the person counting thinks you are off your rocker. Your estimate went from 100 seconds to 200 seconds to 50 seconds; what's your problem? Why can't you give a good estimate?

File copying is the same thing. The shell knows how many files and how many bytes are going to be copied, but it doesn't know how fast the hard drive or network or internet is going to be, so it just has to guess. If the copy throughput changes, the estimate needs to change to take the new transfer rate into account.
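To make the analogy concrete, here is a minimal Python sketch of the kind of naive estimator described above. It is not the shell's actual algorithm; the function name and the sample data are hypothetical. It divides the remaining work by the rate observed over the most recent interval, so every change of pace swings the estimate.

```python
def remaining_seconds(prev, cur, total):
    """Estimate seconds left from the rate over the most recent interval.

    prev and cur are (elapsed_seconds, units_done) samples. This is a
    hypothetical, deliberately naive estimator: it assumes the most recent
    rate will hold for the rest of the job.
    """
    (t0, d0), (t1, d1) = prev, cur
    if t1 <= t0 or d1 <= d0:
        return float("inf")  # no progress in the interval: no finite estimate
    rate = (d1 - d0) / (t1 - t0)  # units per second, recent interval only
    return (total - d1) / rate


# Replaying the counting analogy (total = 100 numbers):
# steady at first, then "four... five..." slowly, then "six-seven-eight-nine".
samples = [(2, 2), (3, 3), (5, 4), (7, 5), (8, 9)]
for prev, cur in zip(samples, samples[1:]):
    print(f"t={cur[0]}s  estimate: {remaining_seconds(prev, cur, 100):.1f}s left")
```

With these samples the estimate goes from 97 seconds to 192, then 190, then under 23: the same kind of lurching the analogy describes, produced by nothing more than a change in pace.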

Comments (34)
  1. Matt C. Wilson says:

    This sounds like one of those bad calls by the feature designer. I’m sure the average user doesn’t want to see a byte counter spin up, but back in the day it was nice to see Telix keep me informed of my cps/baud rates and packet count when I was ZModeming the Doom II demo.

    Maybe the copy dialog could have one of those expand-downward buttons, with the nitty-gritty for the more technical users? Or am I the only person in the world who finds packet loss interesting? :)

  2. Frans Bouma says:

    It has a horrible estimate algorithm. There are no excuses. If it has to copy 1000 1KB files and 10 1MB files, it thinks it will be as busy with the 1MB files as with the 1KB files.

    The underlying copy routine is just as crappy as the estimate routine for the shell dialog. Try copying an ISO to another partition on the same physical hard disk. Why does it take so utterly long? I can copy it faster from partition 1 to 2 by first copying it to a server on the network and then copying it back. The reason is that the file copy algorithm is badly designed: it always uses the same block size, whatever the file. Everybody knows that the more stepping a drive performs, the more time you lose. It's therefore stupid to copy a big file in very tiny chunks, as it does, when there is, for example, 200 MB or more of memory available for the job.

    I truly hope MS will fix these routines in the future, so big(ger) file copies are done smarter, we’re not on 640KByte memory machines anymore.

  3. James Geurts says:

    Personally, I would like the copy dialog to show the estimated total time to copy when copying multiple items. I just copied the entire vs.net setup directory structure from one machine to another, and the copy dialog kept changing how much time was left based on the folder it was in. It would display something like 30 seconds remaining, then jump back up to 40 minutes remaining when it hit another directory structure. It would be nice if it estimated both the time to copy the current file and the time for the entire transaction.

  4. Chris Gervais says:

    You should just remove the time component altogether. How does time really have any relevance? If the copy is going to take a while, won’t the user get the drift when the progress bar fills slowly? And if there are only a few, small items to copy, isn’t a more compact Copy dialog (maybe without the time and progress bar, just the flinging paper pieces) actually more beneficial? Fewer UI items to draw and update and the experience is far less intrusive.

    <sarcasm>Better yet, in Longhorn, to show off the Avalon graphics system that Mac OS X users have today, why not create an amazing visual display of a pipe connecting the copy location to the destination and files being squeezed into a funnel at the start of the pipe (of course showing little Courier ones and zeroes flying down the pipe) and then being reconstituted at the other end? The pipe could pulse and change colors to indicate time required, progress, HD space taken, etc.</sarcasm>

  5. Raymond Chen says:

    Frans: Actually the copy engine bases its progress entirely on number of bytes copied, which is the problem. It doesn’t take into account the other file copying overhead, like creating the directory entry, hunting for a large contiguous region of free disk space, pre-extending the file…

    As for block sizes: If the shell detects that the copy is a file-to-file copy, it uses CopyFileEx. If you don’t like the block sizes used by CopyFileEx, feel free to complain to the kernel team. The shell is the victim here.

    James: The time estimate is indeed for the entire file copy operation. Actually I am personally annoyed that it does that, since it means that there is an annoying "preparing to copy files" step that can take a really long time for complex directory structures.

    Chris: Actually we do that already. Notice that the copy progress dialog doesn’t appear at all for small copies.
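Raymond's point can be made concrete with Frans's own example. The sketch below compares the share of the job that the 1000 small files represent when progress counts only bytes against their share when each file also carries a fixed overhead. The overhead (10 ms per file) and throughput (10 MB/s) are made-up illustrative values, not measurements; with them, the small files are about 9% of the bytes but about 90% of the time.

```python
# Hypothetical cost parameters, chosen only for illustration; real values
# depend on the drive, the file system, and the network.
PER_FILE_OVERHEAD_S = 0.010     # assumed fixed cost per file (directory entry, etc.)
THROUGHPUT_BPS = 10 * 1024**2   # assumed steady transfer rate: 10 MB/s


def byte_share(groups, which):
    """Fraction of the job that group `which` represents if only bytes count."""
    total = sum(n * size for n, size in groups)
    n, size = groups[which]
    return n * size / total


def time_share(groups, which):
    """Fraction of estimated wall-clock time, counting a per-file overhead."""
    def cost(n, size):
        return n * (PER_FILE_OVERHEAD_S + size / THROUGHPUT_BPS)
    total = sum(cost(n, size) for n, size in groups)
    return cost(*groups[which]) / total


# Frans's example: 1000 files of 1 KB and 10 files of 1 MB, as (count, size).
groups = [(1000, 1024), (10, 1024**2)]
print(f"small files by bytes: {byte_share(groups, 0):.0%}")
print(f"small files by time:  {time_share(groups, 0):.0%}")
```

A bytes-only progress bar would therefore crawl through the first 9% while most of the real time elapses, then race through the large files.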

  6. Terry Denham says:

    Frans

    If you can copy to the network and then to another partition on the same drive fast then how can the overall copy algorithm be slow?

    The way you described it (from one partition to another on the same drive), sounds to me like you are I/O bound on the drive.

    If drive to drive is fast and drive to network to drive is fast and partition to partition on different drives is fast but only partition to partition on the same drive is slow then it’s not a generic problem with the copy algorithm.

    Have you run perfmon and looked at I/O utilization while you are doing this to back up your assertion, or are you just assuming based on some simple observations?

  7. Terry Denham says:

    As a follow up.

    On my server at home with a RAID-5 7x18GB 10,000 RPM array, file copying is extremely fast.

    So now we have drive to drive is fast, drive to network to drive is fast, RAID-5 array is fast but on Frans’ PC partition to partition is slow so there must be some problem with the generic algorithm.

  8. Dave says:

    Copying across partitions on a single physical drive causes a boatload of seeking, doesn’t it? Whereas copying to/from network shares or other physical drives can do continuous reads/writes. Increasing the copy buffer size would probably help a fair amount, but even so I’m not surprised that would be slow.

  9. steven says:

    Now… why does XCopy just die with "Access denied" instead of giving the slightest hint of *what* access (read from the original, write to the destination) or a chance to skip that file and continue?

  10. Raymond Chen says:

    xcopy is an ancient program. There is I believe a /C switch to skip problem files and continue. But in general it’s one of those terse command line programs.

  11. Larry Osterman says:

    XCOPY is so bad because when IBM (!) wrote XCOPY back in 1983 they didn’t care about robustness and reasonable diagnostics – they weren’t a goal 20 years ago.

    Here’s a smidge of history of XCOPY (you did bring it up).

    Back in 1982 when IBM first licensed MS-DOS from Microsoft and rebranded it as PC-DOS, part of the deal was that IBM authored many of the system utilities. For example, the MODE and XCOPY utilities were written by IBM and included on PC-DOS.

    For MS-DOS there were no such utilities; it was up to the MS-DOS OEM to write them (just as the MS-DOS OEM had to write its own version of the DOS BIOS: IO.SYS, called IBMBIO.COM on PC-DOS).

    Many of our OEMs, especially ones like Compaq (who insisted on having perfect 100% clones of PC-DOS) hated this – they had built 100% compatible hardware to IBM’s, their system ROMs were totally compatible with IBM’s, they could run IBM’s PC-DOS out-of-the-box without a problem, so they wanted to license a copy of MS-DOS from Microsoft that was totally identical in functionality to IBM’s.

    So Microsoft decided to oblige them. We took IBMBIO.COM (which was written by Microsoft for IBM) and scrubbed it of references to the IBM ROM (I was the dirty half of that process; a new hire with no PC experience was the clean-room half). Then we started working on replacements for the MODE, XCOPY and other commands (I don’t totally remember which ones we did, but MODE and XCOPY were the big ones). The combined product was called DOS 3.2, and was the first "packaged product" version of DOS from Microsoft – Microsoft shipped a box to OEMs that they could then stick in the OEM’s box.

    Shortly after DOS 3.2 shipped, Microsoft and IBM signed their joint development agreement, which gave Microsoft the rights to all the IBM authored utilities, so the Microsoft authored versions of the IBM utilities were scrapped in favor of the IBM utilities, and MS-DOS 3.3 was the first version of DOS that was written with true 100% compatibility with IBM’s offering (there’s a humorous story there too but this is already way too long).

    For NT, we needed a 32 bit version of these utilities written in C (MODE was written half in C half in assembly, similarly xcopy was a mingling of multiple languages). So the NT utilities team went off and re-implemented a version of these utilities. But because of compatibility constraints (we couldn’t break customer batch files) we were precluded from CHANGING any of the xcopy semantics.

    The XCOPY team added a boatload of features (/d and /c, for example) but the utility is STILL 100% command line compatible with the PC-DOS 2.0 version of XCOPY.

    Humorously, many of the options that the XCOPY team added were actually added to gain feature parity with an internal tool TC.EXE that was widely used by the development team.

  12. Alex Feinman says:

    I wonder if you also have an explanation for the ridiculously large numbers that pop up once in a while in the "estimated time" area, things like 27459540 minutes?

  13. Jack Mathews says:

    Alex: That would be the 4GB 32-bit number limit cropping up. I’m pretty sure updates have fixed that for Win2k and later.

  14. Alex Feinman says:

    Jack: I wish it were that simple, but no – this kind of behaviour is exhibited even by Windows XP. And I know for a fact that at least some of the copy operations where I saw this involved less than a GB of data.

  15. Peter Montgomery says:

    Raymond,

    In one of your replies, you wrote, "If you don’t like the block sizes used by CopyFileEx, feel free to complain to the kernel team. The shell is the victim here."

    Fine, but isn’t that really sort of the problem? In other words, how exactly does the average user get to complain to the kernel team? We can’t, as far as I know. It makes users (who may also be programmers) feel helpless when they feel they have something to contribute. Is there a mechanism at MS to actually contact folks like the kernel team that I am unaware of? Hopefully something better than a "suggestions@ms.com" sort of email dump tank.

    Thanks,

    PeterM

  16. Julian Gall says:

    The main perceived problem seems to be the way the estimate changes so rapidly. I see no reason why there could not be a smoothing function so that changes in the estimate that take place in a short period of time (compared to the total estimate) do not affect the display immediately.

    If anyone remembers, cars used to have fuel gauges that responded so fast that turning a corner would cause the gas to slosh to one side and the reading to change. Now they are damped so that the needle is rock-steady.
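One way to implement the damping Julian describes is an exponential moving average of the transfer rate. The sketch below is hypothetical: the class name and the smoothing factor `alpha` are assumptions, and nothing here suggests the shell works this way. Fed a steady rate, a stall, and then a burst, the raw estimate (remaining / latest rate) would read 97, 192, 190, and about 23 seconds; the smoothed one reads 97, about 107, about 116, and about 62.5.

```python
class SmoothedEstimator:
    """Remaining-time estimator that damps the rate with an exponential
    moving average (EMA). Hypothetical sketch; `alpha` is an assumed tuning
    knob: smaller values react more slowly to bursts and stalls."""

    def __init__(self, total, alpha=0.1):
        self.total = total
        self.alpha = alpha
        self.rate = None  # smoothed units per second; None until the first sample

    def update(self, done, instantaneous_rate):
        if self.rate is None:
            self.rate = instantaneous_rate
        else:
            # Move only a fraction alpha of the way toward the new sample.
            self.rate += self.alpha * (instantaneous_rate - self.rate)
        if self.rate <= 0:
            return float("inf")
        return (self.total - done) / self.rate


# Steady, then a stall, then a burst: (units done, latest measured rate).
est = SmoothedEstimator(total=100, alpha=0.2)
for done, rate in [(3, 1.0), (4, 0.5), (5, 0.5), (9, 4.0)]:
    print(f"done={done}: ~{est.update(done, rate):.0f}s left")
```

The trade-off is lag: the smoothed estimate is steadier but takes several updates to reflect a genuine, lasting change in throughput.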

  17. "Now somebody who is listening only to your estimates and not the the person counting thinks you are off your rocker. Your estimate went from 100 seconds to 200 seconds to 50 seconds; what’s your problem? Why can’t you give a good estimate? "

    This passage makes me think of what it’s like to try to give an estimate to a client who cannot pick a set of requirements and stick with them. Sorry for the digression, but the mental image was very potent…

  18. quanta says:

    Having written progress dialogs in VBA in the past, I can sympathize. However, I can’t think of any other program with such wonky progress statistics as Windows. For example, the BitTorrent client or the Mozilla download dialog, under fluctuating transmission speeds, does not seem to have such a wide delta when reporting Time Remaining.

  19. Larry Osterman says:

    Chris,

    Your description of how CopyFile SHOULD work is hideously niave (I know I can’t spell). Head stepping only is relevant when the files are 100% contiguous on disk, and even then it’s not clear where on the disk they physically lie (SCSI disks lie about the physical location of sectors, for example).

    This is a grotesque simplification, but CopyFile simply opens the source and destination files, maps a section of the source file into memory, then calls WriteFile() to the destination file specifying the address of the section. This means that CopyFileEx uses memory management to read in the pages from the file (pre-reading if possible) and writes it out through the filesystem cache (which is also backed by MM).

    As a result, if the files are on different spindles, throughput can be quite efficient, if they’re on the same spindle it can thrash. But there’s no real way of optimizing this experience, unless the copy operation is performed inside the filesystem itself – because only the filesystem knows the structures on the disk and how best to "optimize" them. But CopyFileEx is a 100% user mode API – it has to be able to copy between different volumes, between different filesystems, between any two file stores.

    It turns out that there IS something that can be done with CopyFileEx to make it better, that’s using a single instance store (Bill Bolosky (http://research.microsoft.com/os/bolosky) has done a great deal of research on this).

    In that case, copies are actually done by adding a reference count to the file and marking the file as CopyOnWrite – the instant you write to one of the instances is when the copy actually occurs.
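Larry's simplified description of CopyFile (map the source into memory, then write from the mapped memory to the destination) can be approximated in Python with the standard `mmap` module. This is only an analogue, not CopyFileEx: it ignores ACLs, alternate data streams, and attributes, and the name `mapped_copy` is hypothetical.

```python
import mmap
import os


def mapped_copy(src_path, dst_path):
    """Copy a file by mapping the source and writing the mapped view.

    Hypothetical sketch: the OS pages the source in on demand as the
    write call reads through the mapping. Security descriptors, alternate
    data streams, and timestamps are not copied.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        if os.fstat(src.fileno()).st_size == 0:
            return  # mmap cannot map an empty file; the empty dst is the copy
        with mmap.mmap(src.fileno(), 0, access=mmap.ACCESS_READ) as view:
            dst.write(view)  # one write from the mapped source pages
```

As Larry notes, whether this is efficient depends on whether source and destination share a spindle; the code itself has no knowledge of disk layout.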

  20. McGroarty says:

    Directories are the largest added complexity.

    Directories don’t have an attribute expressing the size of their contents. To get the size of all files being copied would involve walking directories, which would add to the copy time. Instead, gross estimates are made.
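A minimal Python sketch of such a pre-scan (the function name `prescan` is hypothetical, and this is not how the shell does it). Every directory in the tree has to be listed before the totals are known, which is why a "preparing to copy" step can take a long time on complex directory structures. Symbolic links are counted by their own size rather than followed, and entries that cannot be read are skipped; both are assumptions.

```python
import os


def prescan(root):
    """Walk a directory tree and return (file_count, total_bytes).

    os.walk does not follow symlinked directories by default; os.lstat
    reports a symlink's own size rather than its target's.
    """
    files = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # vanished or unreadable: skip rather than abort
            files += 1
    return files, total
```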

    As an aside, for one project where the remaining time was utterly incalculable but where the client demanded a progress bar nonetheless, I implemented a progress bar which merely advanced 5% of the remaining time every tick. The bar would roll across leisurely at first, then slower and slower toward the end, never technically getting there.

    No user ever complained, because it seemed to be at least consistent. When I probed a bit deeper on this, one user swore she could time whether she had time for her coffee breaks by it.
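McGroarty's scheme is easy to sketch, reading "5% of the remaining time" as 5% of the remaining distance to the end of the bar (an interpretation). After n ticks the bar sits at 1 - 0.95^n, so it passes the halfway mark at tick 14 and never reaches the end. The function name is hypothetical.

```python
def asymptotic_progress(ticks, step=0.05):
    """Bar position (0.0 to 1.0) after `ticks` ticks when each tick covers
    `step` of the remaining distance. Equivalent to 1 - (1 - step)**ticks."""
    position = 0.0
    for _ in range(ticks):
        position += step * (1.0 - position)  # advance 5% of what is left
    return position


for t in (10, 50, 100):
    print(f"after {t} ticks: {asymptotic_progress(t):.1%}")
```

The design gives up any claim to accuracy in exchange for smooth, monotonic motion, which, per the comment, is what users actually noticed.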

  21. Phaeron says:

    Larry,

    I think "hideously naive" is a bit harsh. The truth of the matter is that CopyFileEx() is, in fact, slow on an intra-drive copy of large files because of seek overhead. On the machine I’m at right now, I get about 2MB/sec throughput with CopyFileEx() on Windows XP, because the VM buffers up too many dirty pages and then tries flushing them during the reads, resulting in a lot of seek traffic and small-size I/O. If I use a specialized copy with unbuffered I/O and a 4MB buffer, I get 7.5MB/sec. This is 100% user space code that can easily fall back to a CopyFileEx-style mode if necessary and as a bonus doesn’t pollute the disk cache or swap programs out of memory.

    Now, to be fair, this high-performance disk copy tends to monopolize the disk and is very unfriendly to other applications, behavior which would be undesirable as general behavior for Windows Explorer. To say that there is no real way to optimize the Explorer copy experience, however, is false.

    Also, while a file copy routine has to deal with fragmentation on read, it shouldn’t really have to on write — the filesystem can be told beforehand how large the file is and otherwise it can still allocate contiguous spans on flush, as XFS and ReiserFS do. Even the VFAT driver on Windows 95 preallocates 512K at a time if possible.
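A rough Python sketch of copying with large sequential reads and writes, using the 4 MB buffer from the comment as the default. It is an assumption-laden stand-in, not Phaeron's code: `buffering=0` only removes Python's own buffering, and the OS file cache is still used, so it does not reproduce the unbuffered I/O described above. The name `chunked_copy` is hypothetical.

```python
def chunked_copy(src_path, dst_path, chunk_size=4 * 1024 * 1024):
    """Copy a file in large chunks (default 4 MB) through one reusable buffer.

    Fewer, larger requests mean fewer alternations between reading and
    writing, which matters when source and destination share a drive.
    """
    buf = bytearray(chunk_size)
    view = memoryview(buf)
    with open(src_path, "rb", buffering=0) as src, \
         open(dst_path, "wb", buffering=0) as dst:
        while True:
            n = src.readinto(buf)
            if not n:
                break  # end of file
            written = 0
            while written < n:  # raw writes may be partial, so loop
                written += dst.write(view[written:n])
```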

  22. Terry Denham says:

    Some basic analysis on my server at home, copying a 644 MB file (675,315,712 bytes):

    C: = 1×9 GB 7,200 RPM SCSI

    H: = 7×18 GB 10,000 RPM RAID-5 (2 MB cache card)

    E: = network drive 100 Mb network

    From  To  Time (m:ss)  Rate      Adjusted rate
    H     H   5:39         1.9 MB/s  3.8 MB/s
    H     C   1:36         6.7 MB/s
    C     H   3:34         3.0 MB/s
    H     E   1:28         7.3 MB/s
    C     C   3:54         2.7 MB/s  5.5 MB/s

    The times for H->C and C->H differ because H is RAID-5: reads from it are very fast, but writes pay for parity calculations that reads do not. Taking the difference between C->H (3:34) and H->C (1:36) gives 1:58, so the parity calculations for writes to the stripe set cost about 1 minute 58 seconds here.

    Notice that the copy over the 100-megabit network takes about the same time as the copy from the RAID array to the single SCSI drive.

    When you take into account that H->H (and C->C) actually had to move 2x the amount of data through the same drive (read and write), and allow for the roughly 1:58 cost of the stripe-with-parity writes, the transfer rate comes out about the same as for the other operations.

    The C->C time is also interesting when you compare it to H->C. The H->C copy was constrained by the speed of the C drive, not the H drive (since H is striped), so C’s maximum throughput is about 6.7 MB/s. Also, C->H and C->C take about the same amount of time, but H->C is much faster than either (owing to the striped drive).

    I’m sure other conclusions could be drawn from this simple example but I think it shows that it’s not a generic problem with the CopyFileEx algorithm.
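The table's arithmetic can be checked with a few lines of Python. This sketch assumes "MB" means 2^20 bytes, which reproduces the listed rates; computed this way, C->C comes out at about 2.75 MB/s (listed as 2.7 in the table). It also confirms the 1:58 gap between C->H and H->C.

```python
FILE_BYTES = 675_315_712  # file size from the comment
MB = 1024 * 1024          # assumption: the table's MB is 2**20 bytes

runs = {  # (from, to): copy time in seconds, from the table
    ("H", "H"): 5 * 60 + 39,
    ("H", "C"): 1 * 60 + 36,
    ("C", "H"): 3 * 60 + 34,
    ("H", "E"): 1 * 60 + 28,
    ("C", "C"): 3 * 60 + 54,
}

for (src, dst), seconds in runs.items():
    rate = FILE_BYTES / MB / seconds
    line = f"{src}->{dst}: {rate:.2f} MB/s"
    if src == dst:
        # Same-drive copy: the drive both reads and writes every byte,
        # so the commenter doubles the rate to get the "adjusted" figure.
        line += f"  (adjusted {2 * rate:.2f} MB/s)"
    print(line)

# Write penalty attributed to RAID-5 parity: C->H minus H->C.
print(runs[("C", "H")] - runs[("H", "C")], "seconds")  # 118 s = 1:58
```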

  23. Larry Osterman says:

    Man, I’m hijacking Raymond’s blog tonight – Sorry :)

    CopyFileEx WILL extend the destination file on write, and that helps the write time (and tends to make files contiguous).

    But that’s not enough. Remember this is a 100% user mode API. It’s not in the filesystems, it’s not in the I/O subsystem. It’s in USER MODE.

    You’re right that writes to the copy destination can interfere with reads if they’re on the same spindle, but for the general case (and CopyFile is optimized for the general case) it’s pretty darned good.

    And as you pointed out, optimizing for big files has other rather severe negative results.

    On the other hand, your copy routine probably doesn’t preserve ACLs on files, and alternate data streams either :)

  24. Terry Denham says:

    Also, H and C are on different controllers, so I’m not constrained by any channel saturation, and the PCI bus is nowhere near being saturated.

    On my workstation, a C->C copy of the same file on a 40GB 7200 RPM IDE drive with a 2MB drive cache took 2:26. This is better than the C->C copy on my server, probably because the IDE drive has more cache than the older Barracuda drive in my server.

  25. C’mon Larry, just tell us this (maybe long) humorous story about IBM and XCOPY…

  26. Anonymous says:

    SH from Denmark sent in this status bar, seen while installing Microsoft Encarta 2004. The text above the status bar reads: Please note that the progress bar may reach the end before all files have been copied. Please be patient.

  27. KC Lemson says:

    See http://broken.typepad.com/b/2004/06/microsoft_windo.html… A bug, or an actual estimate based on the current speed of the copy?

  28. Raymond Chen says:

    Commenting on this article has been closed.

  29. Furrygoat ponders the reasoning behind time remaining in file transfers and program setups. Raymond Chen provides an answer of sorts, but I find the answer unacceptable. Why predict time remaining on anything if you can’t even ballpark it within some reasonable timeframe? I’ve had downloading experiences result in timeframes in negative percentages, like the screenshot below. What the hell does that mean to an average user? Or even a not so average user? When a download is at -NNN% does that mean I’m giving the download site bits? Maybe this is just a problem to be addressed in the Longhorn timeframe; that’s a popular excuse for bad design….
