It’s more efficient when you buy in bulk


The Windows XP kernel does not turn every call into FindNextFile into a packet on the network. Rather, the first time an application calls FindNextFile, it issues a bulk query to the server and returns the first result to the application. Thereafter, when an application calls FindNextFile, it returns the next result from the buffer. If the buffer is empty, then FindNextFile issues a new bulk query to re-fill the buffer.
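In pseudocode, the buffering scheme looks roughly like this (a simplified Python sketch with invented names; the real redirector is kernel code and far more involved):

```python
class BufferedDirectoryEnumerator:
    """Illustrative model of the XP redirector's FindNextFile buffering.

    'server_bulk_query' is a hypothetical stand-in for the network round
    trip: a callable that takes a count and returns a list of file names.
    """

    BATCH_SIZE = 100  # the client's own choice, not a server-imposed limit

    def __init__(self, server_bulk_query):
        self._query = server_bulk_query
        self._buffer = []
        self._done = False

    def find_next_file(self):
        # Only hit the network when the local buffer runs dry.
        if not self._buffer and not self._done:
            batch = self._query(self.BATCH_SIZE)
            if len(batch) < self.BATCH_SIZE:
                self._done = True          # server has no more entries
            self._buffer.extend(batch)
        if not self._buffer:
            return None                    # enumeration complete
        return self._buffer.pop(0)
```

Enumerating a 250-entry directory through this model costs three round trips instead of 250, which is the whole point of buying in bulk.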

This is a significant performance improvement when reading the entire contents of large directories because it reduces the number of round trips to the server. We'll see next time that the gain can be quite significant on certain types of servers.

But it also means that the suggestion of "Well, why not ask for 101 files and see if you get an error" won't help any. (Actually I think the magic number was really 128, not 100, but let's keep calling it 100 since that's what I started with.) The number 100 was not some magic value on the server. That number was actually our own unwitting choice: The bulk query asks for 100 files at a time! If we changed the bulk query to ask for 101 files, then the problem would just appear at the 102nd file.

Comments (29)
  1. PM says:

    So what about sending two bulk queries, each only requesting one record, in order to find out whether the server is buggy? Or is the number of records one bulk query returns "hardcoded" into the protocol?

  2. BryanK says:

    Sorry about the double-post, but I just realized that the 2-extra-round-trip cost could be cached as well, so it wouldn’t be imposed as often.  Keep a one-week (or whatever) blacklist of "the server running at this IP address doesn’t support fast queries" in memory, with a size limit if necessary.  (Not on disk, because something should be able to clear it, and rebooting sounds as good as anything else.)  Add an IP to that list when the second fast-mode query returns an error.  (Entries would clear out of the blacklist after some period of time too, on their own.)  You could have a whitelist as well, if you wanted.

    And before doing a directory query, check the list(s).
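The time-limited blacklist described in this comment could be sketched like so (a hypothetical design with invented names, not anything Windows actually does):

```python
import time

class SlowModeBlacklist:
    """Sketch of the commenter's idea: remember, for a limited time and
    up to a size cap, which server addresses failed fast-mode queries."""

    def __init__(self, ttl_seconds=7 * 24 * 3600, max_entries=1000):
        self._ttl = ttl_seconds
        self._max = max_entries
        self._entries = {}                # ip -> expiry timestamp

    def mark_broken(self, ip, now=None):
        now = time.time() if now is None else now
        if len(self._entries) >= self._max:
            # Evict the entry closest to expiry to respect the size cap.
            oldest = min(self._entries, key=self._entries.get)
            del self._entries[oldest]
        self._entries[ip] = now + self._ttl

    def is_broken(self, ip, now=None):
        now = time.time() if now is None else now
        expiry = self._entries.get(ip)
        if expiry is None:
            return False
        if expiry <= now:
            del self._entries[ip]         # entry aged out on its own
            return False
        return True
```

Since the list lives only in memory, a reboot clears it, as the comment suggests.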

  3. 8 says:

    BryanK, would the hassle of having such bookkeeping be worth it? How many times is a connection made when someone browses a server?

  4. We’ll see more about the cost of round trips tomorrow.

  5. Anthony Wieser says:

    Does the problem exist when you ask for differing numbers (if that’s allowed, which it may not be – I don’t know the details)?

    Say you ask for 100; you say that if you then ask for the second 100, it fails.  What if you ask for 99 next? Does that fail too?

    If not, there’s your solution.  No need for a workaround, other than asking for slightly variable numbers of files each time, but it works in all cases.

    Otherwise, if bulk mode always fails, irrespective of size, why not ask for just 2 files for the first hit, and then 100 for the next.  If it fails on the second request, you haven’t lost much in retracing your steps, as you only ask for the first 2 files again before you can start returning results.

  6. I think we’re all going to look like idiots on this issue as Raymond keeps trickling out more information about the problem.

  7. I’m not trickling out the information on purpose. I’m learning this stuff as it goes. On my own time – not part of my job. This isn’t my bug so my status is as a bystander.

    I don’t know whether there is a minimum "bulk request" size. It might be a variable-sized return. I don’t feel like digging deeper into SMB right now. Believe it or not, studying the intricacies of network protocols is not my hobby.

  8. Alun Jones says:

    Don’t lose sight of the idea that your solution should minimally penalise users, whether they use the working version or not.

    That means that when running against the best case – a working server – your fix should be indistinguishable from an unfixed piece of code that assumes that the server works.  When running against the worst case, you should do what’s minimally necessary to fix it.

    Maybe the answer is to do something similar to what OpenSSL does – send a zero query when opening the connection, so that you can trigger on the first non-zero query.  Maybe you can’t do that, though – maybe every query has to be for the same number of files.

    Network latency, too, is important to avoid.  Unnecessary round-trips are bad.

  9. Anonymous Coward says:

    I’ve worked extensively with SMB as well as other Microsoft protocols (eg DCE/RPC).  Since Microsoft has traditionally implemented these in the kernel, they are obsessed with buffer sizes in the scarce kernel memory.  And then a few years later someone decides that 4k maximum sizes are too small for performance, and adds another tweak where they can be large under some circumstances (eg 8k).  Do that a few times and you end up with a huge mess of seemingly arbitrary rules about how big things can really be.  (Windows NT also introduced alignment requirements, so random amounts of padding got thrown in as well.)

    So we finally end up with a protocol that has 5 or so different read commands, 5 or so different write commands, and IIRC 8 different ways to open a file!

  10. Mark Steward says:

    Almost H. Anonymous, don’t go blaming Raymond – all the information was there (http://blogs.msdn.com/565878.aspx#566009)!

    As mentioned, the magic number is 128, but it can be changed by the client, and depends on the buffer size anyway.  The only thing I didn’t make clear was that requerying the directory is hard coded as 100, so it’s difficult to make that useful.

    Or perhaps I didn’t make anything clear, as nobody seems to have picked up on it.  Not bitter, just amused that people kept postulating without the info (hidden) in the comments ;-)

  11. Adrian says:

    Buying in bulk is much faster, but it can also be more complex since there’s a bigger window for synchronization problems.

    What happens when some other client of the file server changes the contents of the massive directory while the OS is holding a buffer with half the contents?

  12. DmitryKo says:

    "suggestion of "Well, why not ask for 101 files and see if you get an error" won’t help any. That number was actually our own unwitting choice: The bulk query asks for 100 files at a time!"

    What about breaking all the queries in two – if you get an error on the second one, requery in slow mode…

    I still think Microsoft shouldn’t try to fix 3rd-party bugs… so the management will probably decide to revert to slow-mode queries, and fast mode as default will be postponed until at least version 2010, to make sure that all the bad versions are dead of natural causes :)

  13. Wesha says:

    So basically you’re admitting that you misled us. You said "the error happens after you read the 100th file" and now you’re saying "no, it doesn’t happen after you read the 100th file; it happens after the kernel does its first bulk download". So while it needs to be solved at the kernel layer, you ask us how to solve it at the application layer!

    That’s called treating the symptoms, not solving the problem.

  14. I misled myself. I didn’t know about this bulk query stuff until yesterday.

    And I don’t recall ever insisting that the solution be at the application layer.

  15. Anony Mouse says:

    Valuable lesson here: never assume you know enough to solve a problem, and when asked for your opinion on which option to choose, the correct answer is always to fully investigate the problem.

    But I still say the flaws in the protocol are depressing.

  16. BryanK says:

    So the problem always shows up on the second round-trip to the server when the server’s running the broken version of Samba?

    Sounds like it would have been *much* easier and cheaper to detect than I was thinking.  (I was thinking you’d need 100 round-trips, or whatever, before you’d see the problem.)  Given this, I’m starting to agree with a bunch of other people saying that the network redirector should handle the problem (since that’s likely where the 100 number was put as well).

    As PM said, have the network redirector ask for the first 2 entries individually in fast mode.  (2 round-trips)  If you get the error, then start over and do a slow query for a block of 100, returning them one at a time (just like XP).  If you don’t get the error, then return the first two entries (one at a time) and start fast-querying for anything else (in blocks of 100, returning them one at a time).

    Yes, the first 100 results will require 3 round trips (2 for the first 2 results, and 1 for the next 100), instead of 1.  I’m not so sure that’s a huge deal, especially when you consider that "most" Samba installations are likely only accessed over the LAN, i.e. via either 10/100 or gigabit Ethernet.

    (In short: Put the fix even lower down in the stack than the FindFirstFile/IEnumIDList::Next/whatever functions sit.)
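The probe-then-commit strategy in this comment can be sketched as follows (hypothetical query callables standing in for the real SMB transactions; error signaling is simplified to an exception):

```python
class BrokenServerError(Exception):
    """Stands in for whatever error a broken server returns."""

def enumerate_with_probe(fast_query, slow_query, batch=100):
    """Ask for the first two entries one at a time in fast mode; if the
    second request errors, restart the whole enumeration in slow mode.
    Both arguments are callables(count) returning a list of names."""
    results = []
    try:
        results.extend(fast_query(1))      # round trip 1
        results.extend(fast_query(1))      # round trip 2: the telltale one
    except BrokenServerError:
        results = []                       # start over in slow mode
        while True:
            chunk = slow_query(batch)
            results.extend(chunk)
            if len(chunk) < batch:
                return results
    while True:                            # fast mode works: bulk from here on
        chunk = fast_query(batch)
        results.extend(chunk)
        if len(chunk) < batch:
            return results
```

The cost of the scheme is visible in the structure: two extra round trips up front in every case, in exchange for never surfacing a mid-enumeration error to the application.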

  17. Mark Sowul says:

    The same thing that happens when you have the directory listing in a window and it’s changed afterward?

  18. random person on the internet says:

    "As PM said, have the network redirector ask for the first 2 entries individually in fast mode.  (2 round-trips)  If you get the error, then start over and do a slow query for a block of 100, returning them one at a time (just like XP)."

    This is not a solution. It doesn’t matter whether the number is 100, 128, or 1000; that’s not the point.  What will you do if a few issues like these appear in the following months? Add tens of workarounds and improvisations?

    Some applications may check by themselves whether the servers are bad or not, in which case – if they use FindNextFile and other functions – I guess there will be a lot of additional, pointless network traffic. I don’t think it’s wise to work around this bug without at least providing an option to disable the workaround.

    Windows should work according to STANDARDS, optionally allowing users to specify if they’re working with bad servers or not, through an option in the network connections applet or some system tweak.

    If the bug is going to be fixed in a few months, additions that slow down the code will be useless.

    Just my two cents.

  19. foxyshadis says:

    If Windows should work according to standards, why does Linux include so many workarounds for undocumented, underdocumented, and buggy protocols? (MS, NetWare, and other Unix included.) No one would use an OS that doesn’t work over one that does, for a simple sense of purity, if that were the only differentiator.

  20. BryanK says:

    8:

    > BryanK, would the hassle of having such bookkeeping be worth it?

    I don’t know.  It probably somewhat depends on the answer to your second question (how many connections are made when someone browses a server).  But the internal bookkeeping is only a performance hack anyway; it’s not required to get a correct directory enumeration result.  With or without it, the fix/hack/whatever will return a correct result.

    oldnewthing:

    > We’ll see more about the cost of round trips tomorrow.

    True.  

    (After some thought over the last few days, I think I agree with the people that have said that "a partial directory listing is *always* wrong".  (Though I don’t think an inconsistent directory listing is always wrong; that problem exists now.)  End-user confusion aside for the moment, the caller of the APIs is not expecting to get an error (or a "this list is empty" result) partway through.  I know, my first response to the initial post indicated I’d rather keep the compatibility hack out; I’ve since changed my mind after learning more about the problem’s cause, the other software involved, the ways Samba’s being used, etc.  If NAS boxes didn’t use it, but rather only Linux boxes with real admins that can do upgrades of userspace stuff almost-at-will, then I’d still say "keep the compatibility hack out, by the time Vista’s released, almost nobody will have the old version of Samba installed anymore".  But that’s not going to happen.)

    I now believe that there are only 2 possible solutions for this: either disable fast mode completely (with or without an option for the admin to turn it back on, preferably with), or detect the bug then work around it, with the 2-round-trip latency required by that solution.  The "right" choice between those two will depend on their relative costs, which I don’t completely know; you’re saying you’ll get into some of that tomorrow.

    The decision on whether to always use slow mode (simply acting like XP) versus attempting fast and then reverting to slow after 2 queries will depend on the tradeoffs: specifically, whether the speed increase from fast mode outweighs the speed decrease that happens with the 2 extra round-trips.  At the moment, I believe that’s something that only Microsoft knows.

  21. memodude says:

    How about doing a bulk query for the first 128, then a bulk query for the second 128?  If you get the error on the second 128, blacklist fast mode for that server and do the bulk query for the second 128 again in slow mode, keeping the first 128 results from the fast-mode query.

  22. Dean Harding says:

    > What will you do if a few issues like these appear in the following months? Add tens of workarounds and improvisations?

    Of course! Such is the life of an operating system. They all have thousands of workarounds for buggy hardware and peripherals. Take a look at the source for the Linux kernel – it’s rife with them. Windows is no different.

    Personally, I quite like the small up-front query followed by large subsequent queries. Either that, or the "there must be something specific in the Host field" (or whatever that field is). It doesn’t need a specific value, but maybe you just start it with a version number or something. Some way for Windows to detect that the server supports fast queries. There’ll be some Samba implementations that can do fast queries, but won’t – but not many, and let’s face it: if the Samba guys can fix the original bug in 3 minutes, then surely it’ll take even less time for them to update this field! The only problem would be if Windows XP (and earlier) actually supported fast queries on the server as well, but just didn’t do it on the client. You’d have to wait for a service pack to update them.

  23. Btw, as a rule of thumb, the most expensive thing you can do in networking is a round-trip (this is not true for dial-up, but is for wired networking).

    So the cost to round-trip around 10 bytes is roughly the same as the cost to round-trip around 1500 bytes, which is not too different from the cost to round-trip 64K bytes (the latter can take more due to window size issues; it gets complicated).
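As a back-of-envelope illustration of why round trips dominate (the RTT and bandwidth figures below are assumptions for the sake of arithmetic, not measurements):

```python
def transfer_time_ms(round_trips, total_bytes, rtt_ms=10.0,
                     bandwidth_bytes_per_ms=12_500):
    """Toy cost model: latency is paid once per round trip,
    serialization once per byte. Defaults assume a 10 ms RTT and a
    100 Mbit/s link (12,500 bytes/ms); both are illustrative."""
    return round_trips * rtt_ms + total_bytes / bandwidth_bytes_per_ms

# Enumerating 1000 directory entries of ~100 bytes each:
one_at_a_time = transfer_time_ms(round_trips=1000, total_bytes=100_000)
in_batches_of_100 = transfer_time_ms(round_trips=10, total_bytes=100_000)
```

Under these assumptions the one-at-a-time enumeration takes about 10 seconds while the batched one takes about a tenth of a second; the 100,000 bytes of payload contribute only 8 ms either way.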

  24. josh says:

    "Valuable lesson here: never assume you know enough to solve a problem, and when asked for your opinion on which option to choose, the correct answer is always to fully investigate the problem."

    Unless you have a deadline.  Or somebody asks if you want to go to a nursery.

  25. Worth noting that you can reduce the round-trip cost by asking for 1, then 100. Or even avoid the (overall) cost entirely in the non-buggy case by asking for 100 at a time as usual, but asking for two batches up-front. If the second batch gets the error, then you redo the whole thing in slow mode.

    And you do all of this before returning *any* results back to the application.

    This means the only extra overhead in the non-buggy case is a slight delay in returning the first results.
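This two-batches-up-front idea can be sketched like so (hypothetical query callables; error signaling simplified to an exception):

```python
class BrokenServerError(Exception):
    """Stands in for whatever error a broken server returns."""

def enumerate_two_batches_up_front(query, slow_query, batch=100):
    """Issue the first two bulk queries before handing anything to the
    application, so a broken server is detected while it is still cheap
    to start over. No extra round trips are paid in the non-buggy case."""
    first = query(batch)
    if len(first) < batch:
        return first                      # small directory: nothing to probe
    try:
        second = query(batch)             # the request a broken server fails
    except BrokenServerError:
        results = []                      # redo everything in slow mode
        while True:
            chunk = slow_query(batch)
            results.extend(chunk)
            if len(chunk) < batch:
                return results
    results = first + second              # fast mode confirmed working
    if len(second) < batch:
        return results
    while True:
        chunk = query(batch)
        results.extend(chunk)
        if len(chunk) < batch:
            return results
```

Note that directories with fewer entries than one batch never reach the probing step at all, so they pay no penalty on any server.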

  26. Jen Kilmer says:

    > I misled myself. I didn’t know about this bulk query stuff until yesterday.

    Raymond, rent Dogma.  Gods don’t make mistakes, remember?

    *ducks*

  27. L7 says:

    > "Valuable lesson here: never assume you know enough to solve a problem, and when asked for your opinion on which option to choose, the correct answer is always to fully investigate the problem."

    Unless you know you cannot have more information.

    When someone posts a question on a blog (as in a written exam), all the information needed should be contained there. Telling us we should have answered "we need more info" is not a lesson; it’s playing mind tricks to make others feel stupid.

  28. Wesha says:

    You see, I’m a consultant, so it’s no news to me that the problem as described by the client has nothing to do with the real problem. But I thought that since Raymond is a developer himself, he should’ve known better. So I let my barriers down and gave the information he provided the same level of trust as if I had collected it myself.

    Which turned out to be my mistake. Sorry. Won’t happen again.

  29. Dave Harris says:

    > Worth noting that you can reduce the round-trip cost by asking for 1, then 100.

    It’s better to do the big transfer first. In other words:

    (1) Allocate space for 100 files.

    (2) Request 99 files.

    (3) If (2) received 98 or fewer, you’re done. Return them to the caller.

    (4) Otherwise request 1 more file to go after the 99 you have.

    (5) If (4) succeeds, you’re done, and the server supports fast mode. Return 100 (or 99) to the caller.

    (6) Otherwise throw away the 99 files you already have, switch to slow mode, and query for 100 files.

    This doesn’t require any extra buffer space. In many cases the directory will have fewer than 99 files, and then there’s no network penalty at all, on either correct or buggy servers.

    If there are 99 files or more then correct servers have an extra transaction, but this is amortised over at least 99 files so it is a bit less painful. Probably there are a lot more than 99. Incorrect servers have several extra transactions but at least they work.

    Keeping track of known-correct servers can be done in addition, if desired.
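The 99-plus-1 scheme above can be sketched as follows (again with hypothetical query callables; a real implementation would live at the SMB transaction level):

```python
class BrokenServerError(Exception):
    """Stands in for whatever error a broken server returns."""

def enumerate_99_plus_1(fast_query, slow_query, batch=100):
    """Request batch-1 entries first, so directories smaller than that
    finish in one round trip on any server; only larger directories
    need the one-file probe that exposes a broken server."""
    first = fast_query(batch - 1)          # step 2: request 99 entries
    if len(first) < batch - 1:
        return first                       # step 3: small directory, done
    try:
        probe = fast_query(1)              # step 4: the one-file probe
    except BrokenServerError:
        results = []                       # step 6: discard and redo slowly
        while True:
            chunk = slow_query(batch)
            results.extend(chunk)
            if len(chunk) < batch:
                return results
    results = first + probe                # step 5: fast mode works
    if len(probe) < 1:
        return results
    while True:
        chunk = fast_query(batch)
        results.extend(chunk)
        if len(chunk) < batch:
            return results
```

The tradeoff matches the analysis in the comment: directories under 99 entries cost one round trip everywhere, and only larger directories on correct servers pay the one extra probe transaction, amortised over at least 99 files.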

Comments are closed.