The wrong way to determine the size of a buffer


A colleague of mine showed me some code from a back-end program on a web server. Fortunately, the company that wrote this is out of business. Or at least I hope they're out of business!

size = 16384;
while (size && IsBadReadPtr(buffer, size)) {
    size--;
}
Comments (62)
  1. bahbar says:

    Ugh. My eyes! The goggles, they do nothing!

    Are you sure you want to keep that code as is ? Think of the copy-paste-happy coders out there.

  2. Marquess says:

    Wow, just wow. This is wrong in at least three different ways …

  3. Matt Fisher says:

    They could’ve at least done a binary search…  :P

  4. Robert says:

    Ooow.  oOOoooww.  OOOOOWWWWWWWW.

  5. Mc says:

    The company might be out of business but the programmer may still be at large in the community.

  6. William H. says:

    Raymond, please, explain what’s wrong with that code.

  7. David Walker says:

    There are at least two errors:  One is that IsBadReadPtr should never be used.  The other is obvious.

  8. JS says:

    IsBadReadPtr() will only tell you whether the page is valid for your process (i.e., whether it can be read without an access violation).  It will not prevent you from trashing other buffers belonging to your process and causing a crash or worse.

  9. ScottF says:

    <i>Raymond, please, explain what’s wrong with that code.</i>

    And

    4) What if the buffer is less that 16k?

    That doesn’t mean that IsBadReadPtr will fail.  Think if you have two 8k buffers that are allocated sequentially.  Pass in the first buffer and this function will tell you it’s 16k.  Boom buffer overflow.

    In reality it’s unlikely that the memory past your buffer isn’t readable.  Maybe it’s unallocated on the heap, farther down the stack, etc.  This code will probably return 16k for almost every pointer passed to it.

  10. GWO says:

    Which leaves the obvious question: can what this code is trying to do be done with the Win32 API? Even under limited circumstances (e.g. the pointer must’ve been returned by malloc())?  In standard C / C++, I’m certain it cannot (if the buffer is just a pointer) but implementations need to store this information for an efficient free().  Do any of them export it for the user?

  11. Plompan says:

    This is a terrifying example that shows exactly why universities do the entire software engineering profession a disservice by providing CS degrees.  Most programmers would be aided immensely by studying software engineering instead of computer science.  I’m with Spolsky on this one.  The guy who wrote this code was probably a low-to-middle of the class CS graduate.  Knows big-O, when to use a list vs. a hash vs. an array, understands recursion at a basic level…but clearly can’t write quality code in a real-life environment.  

  12. Gabe says:

    I should hope they’re out of business! It’s much more efficient to just start from zero! In fact, since the running time is dominated by page faults, this will probably be faster than even a binary search because it only incurs a single fault:

    size = 0;

    while (size < 16384 && !IsBadReadPtr(buffer, size)) {

       size++;

    }

  13. Kyle says:

    Gabe, if that’s not sarcasm, then God help the quality of the software produced in the next 20 years.

  14. stumpy says:

    @Gabe: Your logic skills astound me.

    1: This code is bad

    2: I personally think no CS graduate can code

    3: This code must be written by someone who can’t code

    4: therefore it must have been written by a CS graduate

    5: And this proves that no CS graduate can code. q.e.d.

    There are a good handful of flaws in this code. But there’s just as many flaws in your logic. And before you rant too much about others being bad programmers, I’d just like to interject that a basic understanding of logic tends to be important to a programmer. Perhaps you should have studied CS after all?

  15. Kyle says:

    stumpy, while I would agree that there are plenty of poorly trained CS graduates, there are plenty of them (including myself) who seriously believe in reading documentation and actually making an attempt to understand the technology behind the API calls.  Blanket statements about CS not teaching anything are faulty at best.

  16. J says:

    @Plompan:  Let’s abolish universities and replace them with trade schools so we’ll never see this type of code again.

  17. Random832 says:

    @ScottF "Pass in the first buffer and this function will tell you it’s 16k.  Boom buffer overflow."

    Not to mention even if you have only one buffer, on x86 you’ve got the rest of the already-used stack (i.e. the return address and such, exactly what buffer overflow exploits target) after it in memory.

  18. Anonymous says:

    @Mike Dimmick

    _msize() is garbage.  If I saw this in a code review it would be a huge red flag and I would no longer trust that developer to do anything reasonable.  The real way is to track the size of your buffers yourself, not rely on some silly non-standard malloc extension to free you from your own laziness.

    The fundamental issue here is that people who don’t know C very well think that there’s a way to determine a size given a void*.  There isn’t.

    When people want to know how, they should be told this.  Not some sissy answer like, "oh, but you can non-portably check if the pages are mapped into your process", or, "oh, if you’re using Visual Studio, you can ask the malloc implementation".  As has been mentioned, these techniques have their flaws.  Much worse, saying that these techniques are OK only encourages people to write bad code.

    If people want to work in C, they should learn how to write C code, and the common idioms, such as keeping track of the length of your buffers.  If they are unwilling to do this they should explore another language before thinking of what crazy hack they can do to work around the gaps in their understanding.

  19. John says:

    @Gabe: Nice!

    Come on, people; it was obviously sarcastic.

  20. Timothy Byrd says:

    Hey Raymond,

    Now that code must be good because it isn’t in italics, right?

    — T

  21. Tony Cox [MSFT] says:

    I’m actually less concerned about the person who wrote this code than I am about that person’s manager/mentor.

    Junior developers do dumb things. That doesn’t necessarily make them stupid or bad people, it just makes them junior. I’ve seen all sorts of similarly misguided code from junior developers who simply didn’t know any better. The key is whether you have a process in place that (a) prevents stuff like this from shipping, and (b) trains developers so that they don’t make this kind of mistake in future.

    The majority of training should be on-the-job and is often ad hoc. A code snippet like this is a great “teachable moment” for a manager/mentor to use with a more junior developer. Use of the Socratic Method with an example like this, letting the junior developer figure out for themselves (with some guidance) why this code is bad, would be a great lesson. They’ll learn something about memory layout, paging, race conditions, competing concerns of fault tolerance versus fast failure, among other lessons. And it will stick with them because you worked through an actual example based on code they wrote rather than simply being an abstract lesson in a textbook. They’re more likely to absorb the general principles and apply them later.

    The fact that this code made it into production says more about poor management than it does about the (undoubtedly) junior developer who wrote it.

  22. Patrick says:

    Thanks, fixing my code now as we speak.

  23. pulp says:

    @bahbar: too late, I already copied it

  24. Absotively says:

    Crescens2K:  Gosh, thanks.  It had never occurred to me to track the size of a buffer.

    Next thing you know, you’ll be suggesting something radical, like making sure to free memory that you’ve allocated.

  25. alephnaut says:

    @stumpy,

    When a pipe bursts in your house you don’t hire a metallurgist to fix it you hire a plumber.

    When your car breaks down you don’t hire a mechanical engineer to fix it, you hire a mechanic.

    When you need to build a house you don’t hire a materials scientist, you hire a builder.

    When you need to build software why on earth would you hire an electrical engineer?  Or a computer engineer?

    Unless you’re building electrical components (for EEs) or computers (for CEs) hire the right degree for the job.

    During undergrad CS I took a course on electronics but am certainly not qualified to build a city’s powergrid.

    Just say no to computer programming by anyone other than CS or SoftEng grads.

  26. Kyle says:

    Tony hit it right on the money.  This should have become a teachable moment for an inexperienced developer, not production code.

  27. Nik says:

    Raymond, your post is already #3 in Google results (#11 on Bing) for "determine the size of a buffer".

    So it is virtually guaranteed that this code will be copied and used dozens of times by eager young programmers all around the world.  Yay !

  28. Dave says:

    @Plompan – I (as a CS dropout) chose to leave my university’s CS program due to the poor quality of education I was receiving there. Having taught myself a lot prior to college I only felt they were undoing that which I had learned prior. I haven’t regretted that choice once. For me it was simply an example of "Those who can, do. Those who can’t, teach." Yes, I know that’s a blanket statement and not all programs or teachers are bad, but I ended up being better off pocketing the money I would have paid out to the college.

  29. steveg says:

    I like the case where you might allocate MAX_[datatype] when size wraps and all of a sudden IsBadReadPtr is accessing a random address.

    To be fair to the original programmer: when was that code written? When did IsBadReadPtr’s limitations become well-known and documented? If that was written forever ago, well, you know, all of us old-timers have used IsBadReadPtr. Perhaps they didn’t have the latest MSDN library disc.

    The Gen-Y view: "It’s not optimal code, but meh, these days 16384 anything doesn’t take that long. Do I really have to test my code?"

  30. steveg says:

    @Tony Cox: you work for a very large organisation with full-time testers and managers, your own office and benefits out of this world, so you get the luxury of doing things properly.

    I’m a pretty good programmer, not the best, but you know, stuff I write mostly works. My contract at the moment is "the geek" at a 120 person company with a 5 person IT team (that’s everyone, from helpdesk to network to the boss). One moment I’m in meetings with the CEO + COO gathering requirements, the next I’m debugging a stored procedure, or writing a utility, or doing data analysis, or coding an update for the website.

    How much time do I have to think things through? We’re getting another developer in for two weeks because I’m overloaded. How much time will I have to test let alone review his work? Almost none.

    Is that bad management? Probably. Where’s the money going to come from, though, to do it properly?

    Reality and theory aren’t always the best bedfellows.

    Raymond’s snippet may or may not work. It’s provably a timebomb, but you know what, it may not actually matter.

    (BTW I totally agree with @Tony Cox, I’m providing a counterpoint).

  31. blah says:

    You guys all missed the proper way to do it with style!

    size = sizeof(buffer);

  32. Kyle says:

    <i>Raymond, please, explain what’s wrong with that code.</i>

    Are you serious?

    This is where it’s helpful to actually understand the technologies you’re using, like memory allocation.  There are at least the following problems with the function:

    1) What if the buffer is larger than 16k?

    2) This wastes precious CPU time.

    3) It probably won’t be accurate anyway.

    Now, only #3 is really worth responding to.  #3 is true because memory access violations are only handled at the page level (4KB on up).  So if you allocate something like 5KB, at least two pages are required in the 4KB scenario.  Not only that, but if the heap manager places this buffer at the beginning of this two-page region, this function call will succeed for 8KB, even though you only wanted 5KB.

    While this doesn’t necessarily prove that the IsBad***Ptr() functions are crap all around (the link that Raymond provided does that), this <i>does</i> necessarily prove that in this situation they’re worthless.

  33. tm says:

    This is a terrifying example that shows exactly why universities do the entire programming profession a disservice by providing software engineering degrees.  Most programmers would be aided immensely by studying computer science instead of software engineering.  Spolsky is an idiot.  The guy who wrote this code was probably a middle-to-upper class software engineering graduate.  Knows project management, when to write requirement docs vs. agile methodology plans, understands unit testing at a basic level…but clearly can’t write quality code in a real-life environment to save his or her life.  

  34. Gaspar says:

    @Nik

    I just check and you are right…  And that terrifies me…  X(

  35. Mike Dunn says:

    *facepalm*

  36. Mike Dimmick says:

    GWO: if it is a heap-allocated buffer allocated with malloc() (and friends), you can use _msize in the Visual C++ runtime to find out how large the block is. If you want to extend a buffer *without* copying it if there isn’t space, use _expand. realloc() will allocate a new block if it can’t extend this one, so you have to be prepared to get a new pointer value back.

    For code that allocated with LocalAlloc, use LocalSize. For HeapAlloc, use HeapSize. For GlobalAlloc, use GlobalSize. (For the equivalent of _expand, see HeapReAlloc with the HEAP_REALLOC_IN_PLACE_ONLY flag.)

    Finally, for CoTaskMemAlloc, use CoGetMalloc to get an IMalloc pointer and call IMalloc::GetSize. There doesn’t seem to be an equivalent for BSTR (SysAllocString) despite having a SysReAllocString.

    [For BSTR, use SysStringLen. Of course, knowing the allocation size doesn’t tell you the size of the buffer. -Raymond]
  37. Olivier says:

    I’m starting my own company and I was searching for a way to determine how to determine the size of a buffer. The solution is right there, I just had to copy/paste.

    So thanks to the company "out of business" and Raymond, you saved my company!

  38. BC says:

    MSDN: “This function is typically used when working with pointers returned from third-party libraries, where you cannot determine the memory management behavior in the third-party DLL.”

    Given that this code is from “a back-end program on a web server”, then probably this module needs to rewrite the GET string, and all it gets from the previous module in the pipeline is the location of the string buffer, not the size of the buffer.  

    It is entirely possible that, originally, the code did the “right thing”: allocate a new buffer large enough to hold the original string plus whatever modifications this module needed to make, but, after being put in production, the server started bogging down because every module in the pipeline was doing the “right thing”, and the system was spending more time swapping than processing.

    Thus resulted this code, which determines if a supplied buffer is large enough to be reused.

    [If they were trying to modify a string in-place, wouldn’t they have used IsBadWritePtr? Or maybe that’s the secret joke: Not only did they do it the wrong way, the used the wrong function to do it! -Raymond]
  39. Keith says:

    *twitch*

  40. Paul Betts says:

    I think above every bad code sample you post, you need to have comments like:

    // XXX: This is a horrible idea, please don’t do this.

    // XXX: Seriously. A really bad idea.

    // XXX: Once again, I can’t stress enough, the badness of this idea.

    size = 16384;

    while (size && IsBadReadPtr(buffer, size)) {

    /* … */

  41. Nick says:

    @Nik

    What’s even better is that this blog entry is the #2 Google result for "best way to determine the size of a buffer"

    http://www.google.com/search?q=best+way+to+determine+the+size+of+a+buffer

    :)

  42. an_idiot_is_an_idiot_no_matter_what_degrees says:

    Wednesday, January 20, 2010 9:53 AM by Plompan

    > This is a terrifying example that shows exactly why universities do the entire software engineering profession a disservice by providing CS degrees.  Most programmers would be aided immensely by studying software engineering instead of computer science.  I’m with Spolsky on this one.  The guy who wrote this code was probably a low-to-middle of the class CS graduate.  Knows big-O, when to use a list vs. a hash vs. an array, understands recursion at a basic level…but clearly can’t write quality code in a real-life environment.  

    ===================

    Please don’t confuse a CS graduate with a CS dropout.  Raymond’s example is far more likely the work of the latter.

    (And FYI, there *are* software engineering courses at my school’s CS department.)

  43. Crescens2k says:

    Well, I can say this for a fact. As a CS graduate I have never ever thought about using code like this.

    If I need to know the size of a buffer then I pass the size along with the pointer. I’m not a lazy programmer so I always put the extra effort into making sure that the code is usable, readable and as safe as I possibly can make it.

    For figuring out the size of a buffer using HeapSize etc, with the way the windows Heap functions work at least is that only the address returned which is at the start of the block is the valid value for any of the Heap functions.

    What this means is that if you try the following code.

    void* mymem;

    void* mymem2;

    int memsize = 21, retsize;

    mymem = HeapAlloc(GetProcessHeap(), 0, memsize);

    mymem2 = (void*)((char*)mymem+1);

    retsize = HeapSize(GetProcessHeap(), 0, mymem2);

    HeapFree(GetProcessHeap(), 0, mymem);

    retsize will be -1 after sending a debug breakpoint to a debugger. So really using these functions also only work if they have the same memory address as the start of the block, but it will then return the size of the allocated block not the possible buffer size.

    Really, the only sensible way to tell any code the size of a buffer is to give the size along with the buffer. Of course there is bound to be someone out there who could probably mess even that up.

  44. porter says:

    I thought all CS departments used Java and C# these days…..

  45. Cheong says:

    @steveg: I’d say… Do it. Then "discard the change and remake it when you have time"… which usually means never. :P

  46. xiaoguo ge says:

    One of the major concern with the IsBad**Ptr function is that it eats the guard page exception and clears the PAGE_GUARD flag of the tested memory range. But what if this function resets the PAGE_GUARD property of the tested range upon catching the guard page exception? Is it still be a bad idear to use IsBad**Ptr functions?

  47. Mihai says:

    @steveg

    I would ask: "where’s the money going to come from, though, to do it wrong?"

    If you don’t have money for code reviews, good QE, etc., then you *MUST* take the time to think and do it right.

    Does the company have the money to afford 2 days with the software not working? Or a lawsuit? Or a leak of private info that will drive all customers away?

  48. Médinoc says:

    This is terrible. They should have used IsBadWritePtr() instead! ;-)

  49. Cheong says:

    Rather than arguing whether CS or SE subject is doing a disservice, I’d say that it’s the "people" who do disservice to the profession.

    Perheps universities should consider NEVER allow professors/teaching staffs *who never did get a job and produce any production code for long enough period before* teaching programming courses.

  50. fsdg says:

    For all the people suggesting the use of _msize and friends, do note that it’s very possible for memory allocations to be rounded up (i.e. malloc(1) returning a 12-byte sized buffer) so it’s almost certainly a bad idea to do it.

  51. steveg says:

    @Mihai: If you don’t have money for code reviews, good QE, etc., then you *MUST* take the time to think and do it right.

    I hear you, buddy, but reality is a harsh mistress. This is the first time i’ve worked in a business where I’m not part of a development team delivering a project (the great consulting adventure).

    Here’s a story. Imagine a fancy new feature on the website that Marketing has been planning for months. And they only tell IT (eg me) a few weeks before they want it to go live.

    You say "No."

    They say "But we’ve already spent $250,000 on advertising and printing etc"

    You say "No, I need 2 more days to review the code to reduce the risk of mess up."

    They say "We accept that risk. Do it."

    You say ?

  52. Tony Cox [MSFT] says:

    @steveg

    Your example is I’m sure familiar to many people. Ultimately, business decision makers need to take business decisions, weighing all sorts of competing factors. Is it worse to ship with bug 891, or slip by a month and burn a million dollar marketing campaign? Who knows? It depends on what ‘bug 891’ is. Is it "ugly comma splice in third paragraph of readme", or "electrocutes kittens"?

    Good software engineering process isn’t about guaranteeing perfect software (no software is ever perfect). It’s about maximizing quality and predictability of development. And, crucially, it’s about making sure business leaders have information to make good decisions.

    If you have sloppy process, you might not even know that bug 891 exists. If you have good process, you hopefully know that it exists (or know that there’s a risk it exists), and you can make a business decision accordingly.

  53. Anonymous Coward says:

    Thanks Raymond, this is hilarious. Way better than what I’ve seen on the DailyWTF recently.

    By the way, this article is #1 if you Google for ‘accurate way to determine the size of a buffer’.

  54. Worf says:

    Actually, I think both CS and SE graduates would benefit from two courses – assembly language and operating systems. The first answers the "how the machine works" question, and everyone should be able to hand-compile a simple program, and learn what those magic keywords actually do.

    The second teaches that the OS is not a diety, and things work in certain ways, and why that malloc() may not do what you expect. Or what page faulting and exception handling actually do internally.

    I can’t tell where that code was written – be it a trade school, or degree program. It is, however, absolutely horrible and belongs on sites like thedailywtf.com. Hell, that can be rentacoder code.

  55. Anonymous says:

    @Tony Cox [MSFT]

    Junior developers do dumb things. That doesn’t necessarily make them stupid or bad people, it just makes them junior. I’ve seen all sorts of similarly misguided code from junior developers who simply didn’t know any better. The key is whether you have a process in place that (a) prevents stuff like this from shipping, and (b) trains developers so that they don’t make this kind of mistake in future.

    It’s also an even worse problem for SDETs/QA developers because there isn’t any mentoring or other corrective forces at work that exist with developers.  I’ve seen many, many atrocities committed in QA code.  Senior SDETs are people who were successful at finding bugs, not writing good code.

  56. Nick says:

    This sounds like the basis for a good game.  After Raymond posts a blog entry, the first person to find a generic, non-quoted Google query that returns the new post as the first result wins!

  57. BC says:

    If they were trying to modify a string in-place, wouldn’t they have used IsBadWritePtr? Or maybe that’s the secret joke: Not only did they do it the wrong way, the used the wrong function to do it!

    Probably the architecture required that all buffers coming out of any module be writable, so using IsReadPtr is sufficient – maybe it was thought to be faster.  

    Probably it was a security thing… adding a long series of random digits to the GET string, and rather than redefine the module interface to include the buffer-size, it was easier to rewrite the module to in-place overwrite the buffer – once it was realized that making a copy of each buffer bogged down the server.

  58. Joseph Koss says:

    Could it be that the programmer was simply making the best of a bad situation?

    If its not their pointer, and buffer overflow had become a mission-critical downtime issue at some point, then this might have been a quick fix to get back to full uptime so that they could go back to working on a full replacement for the entire service?

  59. Danny says:

    Graduate and dropout are words. And only that!!. Bill Gates is a dropout from Harvard, do I need to say more?

    So don’t generalize from that, because both dropouts and graduates can be bad or good programmers.

  60. Gabe says:

    Joseph Koss: If a programmer was smart, but in a hurry and trying to quickly hack around some buffer overflow causing a fault, they would have started from 0 (as I facetiously suggested in my first comment).

    Starting from the end is not only slower (causing potentially thousands of exceptions), but is also liable to give incorrect answers. If buffer[16384] is readable, the bad algorithm will return 16384 as the size, even if the buffer is only 1 byte followed by an unallocated page. If their algorithm started from 0, they would quickly identify the unallocated page and see that the buffer can’t be more than 1 byte long.

  61. Nele says:

    If you think this company is bad, what about one that wrote IsBadReadPtr in the first place?

    [No matter what you do, someone will call you an idiot. First, Windows was idiotic for not having a way to validate pointers, now it’s idiotic for having a way. (Note that the function was not idiotic when it was first written back in the days of co-operative multi-tasking.) Mind you, having a non-idiotic function doesn’t prevent people from using it in idiotic ways like this. -Raymond]

Comments are closed.