One way to make sure you pass an array of the correct size


Another entry in the very sporadic series of "very strange code I've seen." The code has been changed to protect the guilty, but the essence has been preserved.

class Store
{
public:
    // Retrieve "count" colors from item "itemId" into "values"
    bool GetItemColors(int itemId, int count, COLORREF *values);

    // Set "count" colors from "values" into item "itemId"
    bool SetItemColors(int itemId, int count, const COLORREF *values);
};

bool CopyUpToFourColors(Store *store1, Store *store2, int itemId, int count)
{
    COLORREF size1[1];
    COLORREF size2[2];
    COLORREF size3[3];
    COLORREF size4[4];

    int *buffer = ((count == 1) ? size1 :
                  ((count == 2) ? size2 :
                  ((count == 3) ? size3 :
                  ((count == 4) ? size4 :
                                  nullptr))));

    if (buffer == nullptr)
        return false;

    if (!store1->GetItemColors(itemId, count, buffer))
        return false;

    if (!store2->SetItemColors(itemId, count, buffer))
        return false;

    return true;
}
Comments (61)
  1. Anonymous says:

    For a minute there, I thought I had clicked "The Daily WTF" instead of "The Old New Thing".

  2. Anonymous says:

    If the compiler automatically inserted guards at the starts and ends of each array, it could be a way of ensuring that Get/SetItemColors didn't write out of bounds.  Of course doing it that way is still a WTF.

  3. Anonymous says:

    Woohoo. And C++ has templates to the stack rescue if requirements decide "Four" != (or maybe =!) 4. (Other languages probably have similar solutions).

  4. Anonymous says:

    You can do stack based variable length arrays in C99.  Not sure if it's ever been added to C++.

  5. Anonymous says:

    Could have saved some stack space with a union.

  6. So what. Here is a function from Windows SDK samples, with a couple of WTFs:

    LPCWSTR ConvertGuidToString(REFGUID guid)

    {

       static WCHAR wszGuid[MAX_GUID_STRING_LENGTH];

       if (!StringFromGUID2(guid, wszGuid, MAX_GUID_STRING_LENGTH))

       {

           wszGuid[0] = L'';

       }

       return reinterpret_cast<WCHAR *>(wszGuid);

    }

  7. @Joshua: my compiler thought about it and refused to compile your example.  So no, it doesn't work in any way, because I can't compile it.

    Unless you overloaded some operators somewhere else?

  8. Anonymous says:

    Whenever I see code like this, I suspect the original author had trouble getting it right and hacked away randomly until it worked without really understanding what the problem was.

  9. Anonymous says:

    @Joshua: you are using C-style casts in C++, which is usually considered a bad idea. In your example, I would guess that the cast is a reinterpret_cast, which is allowed to do weird things. Using C++-style casts makes these things more obvious.

  10. Anonymous says:

    I am astonished by the number of bad ideas in this thread.

  11. Anonymous says:

    Come on, people, Wat is plain right. This is, supposedly, a blog read by professional programmers. Maurits nailed it: simple, elegant, and to the point. Why complicate it more? Unless you are trying to out-WTF the original author, of course.

  12. Anonymous says:

    @Antonio Rodríguez: I guess you never read the comment threads at TheDailyWTF then. Out-WTFing the original post by comically missing the point is part of the fun.

    Stephan L.'s idea is the best so far, but I think I can enhance it a bit more:

    COLORREF size4[4];

    int *buffer = count >= 1 && count <= 4 ? &size4[4 – count] : nullptr;

  13. @Maurits: Your example assumes that the compiler will always evaluate the terms of the "logical and" in a left to right order. Clearly the 4th term relies on the third term, but what prevents some optimization mode of some compiler to "erroneously" execute the terms "out of order"?? (And by erroneous, I mean "in a manner other than what you intended")

    That's a leap of faith I wouldn't make. I would hazard a guess (if the compiler generated what you wanted) that your example and my example look very much alike after compilation, and the compiler can't misinterpret mine.

  14. Anonymous says:

    I think the main issue is readability. I wouldn't worry too much about stack optimization because there is nothing to prevent an optimizing compiler from storing size1, …, size4 at the same address and generating Maurits's version. (Mine doesn't, though.)

  15. Anonymous says:

    @Cesar: Of course assigning to a const char* after that cast is undefined behavior, but the cast is still ok as long as you don't modify it.  It's sometimes necessary for dealing with certain APIs that take non-const parameters but don't modify them, such as the argv parameter in the exec family of functions in POSIX.

  16. Anonymous says:

    @Brian_EE

    If you can't count on short-circuited operators to work in order, then why even have them?

  17. Anonymous says:

    @kbiel: Makes you wonder what the developers of Visual Basic were thinking: support.microsoft.com/…/817250 .  You have to use AndAlso/OrElse to get the behavior of what And/Or should be.

  18. @kbiel: I have no problem admitting that we come from different schools of thought. For me, If I have to stop and think about whether or not the compiler will give me what I want, is that any better than my example (which accomplishes the same thing) which is explicit in the behaviour I expect? For me, I could type the few extra characters faster than I could double check my logic (ala Maurits example). It all boils down to personal preference IMO, but I hate to debug code for subtle mistakes. I'll make a readability argument too, but again that's part personal style/preference.

  19. Anonymous says:

    @Brian_EE: You don't have to think about what short circuit evaluation does.. except if you start to think in terms of the standard and then, it turns out that the following holds:

       1.9.18

       In the evaluation of the following expressions

       a && b

       a || b

       a ? b : c

       a , b

       using the built-in meaning of the operators in these expressions (Note: This is important though!), there is a sequence point after the evaluation of the first expression (12).

    So short circuit evaluation works just as every programmer would expect.. well as long as you don't start to override && for custom types (but then that way lies madness). If anyone started to write code like:

    if (foo) {

    ___if (foo->bar) {

    ___}

    }

    instead of if (foo && foo->bar) C code would be unnecessarily bloated.

  20. Anonymous says:

    @Adam Rosenfield: "And" is the bitwise variant. "AndAlso" is the logical variant. Set Option Strict On and learn.

  21. Anonymous says:

    @Joshua: IANA VB programmer (and have no desire to become one), so ok.  But it doesn't help that And is used for both bitwise & logical operations (and that KB article I linked to calls it a logical operator everywhere).

  22. Brian_EE says:

    bool CopyUpToFourColors(Store *store1, Store *store2, int itemId, int count)

    {

       COLORREF buffer[4];

       if ((count > 4) || (count < 1))

           return false;

       if (!store1->GetItemColors(itemId, count, buffer))

           return false;

       if (!store2->SetItemColors(itemId, count, buffer))

           return false;

       return true;

    }

  23. Anonymous says:

    @Adam Rosenfield: Ah yes a conversion article from VB6. VB6 and older Basic all the way back to the dawn of time got away with "And" (originally AND) for both with no short-circuit by judicious choice of -1 for TRUE.

  24. Dan Bugglin says:

    bool CopyAsManyColorsAsYouWant(Store *store1, Store *store2, int itemId, int count) {

      COLORREF * buffer = new COLORREF[count];

      if (count < 1) {

          delete[] buffer;

          return false;

      }

      if (!store1->GetItemColors(itemId, count, buffer)) {

          delete[] buffer;

          return false;

      }

      if (!store2->SetItemColors(itemId, count, buffer)) {

          delete[] buffer;

          return false;

      }

      delete[] buffer;

      return true;

    }

    Forgive me if I did something wrong it's been forever since I've done any C++… IIRC you can't dynamically declare an array upper bound with a parameter though.

    [This adds a heap allocation (and new failure mode) to every call to what happens to be a high-frequency function. In practice, the original function never fails because the Stores in question always have the correct number of colors, so you basically changed a function that never fails to a function that very rarely fails. This tends to create subtle bugs. -Raymond]
  25. Anonymous says:

    IMHO There is a way to save space on the stack while keeping up with the general principle :

    bool CopyUpToFourColors(Store *store1, Store *store2, int itemId, int count)

    {

       COLORREF size4[4];

       COLORREF *size3 = &size4[1];

       COLORREF *size2 = &size3[1];

       COLORREF *size1 = &size2[1];

      …

  26. @Brian_EE:  "Your example assumes that the compiler will always evaluate the terms of the "logical and" in a left to right order. Clearly the 4th term relies on the third term, but what prevents some optimization mode of some compiler to "erroneously" execute the terms "out of order"??

    That's a leap of faith I wouldn't make."

    Trusting your compiler to follow the C/C++ standards is a leap of faith?  If you can't trust it to do that, then you may as well give up and go home.  Or write in assembly, assuming you still trust the CPU.

    The language is very clear that anything on the right side of short-circuit logic won't execute until everything on the left side has been evaluated and a need still exists to evaluate the right side.  I've been taking advantage of short-circuit logic operators for years in C/C++/C#/VB .NET/Java and in C/C++ compilers from multiple vendors.  If I misunderstood short-circuit logic, I think I would know by now.

    And it's very readable – I don't have to struggle at all to understand what a short-circuit logical operator does.  Other developers here use short-circuit operators regularly too; nobody has any problems with them.  In fact, I find that compact code in a smaller space is much more readable than code that was longer and more convoluted due to a lack of knowledge or trust about the language.  Something about there being less to try to read and mentally parse.

  27. Anonymous says:

    @Brian_EE: "Your example assumes that the compiler will always evaluate the terms of the "logical and" in a left to right order. Clearly the 4th term relies on the third term, but what prevents some optimization mode of some compiler to "erroneously" execute the terms "out of order"?? (And by erroneous, I mean "in a manner other than what you intended")"

    The C and C++ standards guarantee this. I am actually scared of C/C++ programmers who don't know this. Similarly programmers who use '<' in for() loops with the justification that it is safer than '!='. It's actually the other way around.

  28. I'm guessing Raymond expected to see a single buffer of size 4, together with the idiom of telling the called function that the buffer you're given it is smaller than it really is:

    bool CopyUpToFourColors(Store *store1, Store *store2, int itemId, int count)

    {

    ___ COLORREF temp[4];

    ___ return

    _______ 0 < count && count <= ARRAYSIZE(temp) &&

    _______ store1->GetItemColors(itemid, count, temp) &&

    _______ store2->SetItemColors(itemId, count, temp);

    }

  29. @MAZZTer:  I'd suggest using an STL vector instead of using C-style arrays.  Less details to get wrong and shorter code.  For example, your code is buggy and will leak memory if someone throws an exception, like say Get/SetItemColors.

    bool CopyAsManyColorsAsYouWant(Store *store1, Store *store2, int itemId, int count) {

     vector<COLORREF> buffer;

     buffer.resize(min(1, count));

     return store1->GetItemColors(itemId, count, &buffer[0]) && store2->SetItemColors(itemId, count, &buffer[0]);

    }

    Of course, if it's a hotspot like Raymond says, the heap allocations will really hurt performance.  You could write a version that uses only the stack for small counts, and resorts to using the heap for larger counts.  If the Get/SetItemColors could be modified to allow you to get/set a range of colors, e.g. given an offset and a count, then you could do it 100% on the stack with no heap: just copy several colors at a time.

    But really – stack or heap implementation, you can reduce the size to only one or two lines of code.  You'd think the original programmer was being paid by the line of code…

  30. Anonymous says:

    As for strange code, how about

      vector<char *> x;

      //…

      vector<const char *>y = (vector<const char *>*)x;

    It actually works. Think about it.

  31. I think EduardoS is suggesting that the copy to a temporary table could be eliminated by adding another method:

    class Store {

    ____ void CopyFrom(Store *source) { for (int i = 0; i < m_count; i++) { m_color[i] = source->m_color[i]; }

    };

    [True, that's a consequence of my rewrite. The original code was copying data between two unrelated classes. -Raymond]
  32. Anonymous says:

    Aaaaaaaa typo. Should be:

    vector<const char *>*y = (vector<const char *>*)&x;

  33. @Cesar:  Exactly right.  A normal C++ cast like static_cast wouldn't let someone do the cast in question: you'd get a compile error.  reinterpret_cast is a way of telling the compiler "hey, I know this looks fishy, but I really, really, really know what I'm doing."  (const_cast is another one that tells the compiler you know what you're doing, and that you shouldn't normally have to use.)  Examples might be integers that are being abused to store pointers.  Or in this case, where the pointer says it's pointing to one data type, but you know it really points to a completely different, incompatible type.  I think it's best to avoid these situations.  In this case, reinterpret_cast<vector<const char *>*>(&x) would be a way of saying "I know the pointer says it points to a vector<char*>, but the pointer type is erroneous and it actually points to a vector<const char *>."  In this case, you'd be sorely mistaken.

  34. Anonymous says:

    It turns out that code generated for vector<const char *> and vector<char *> are sufficiently close that they can operate on each other's objects (on any environment that Windows compiles for). Of course, if the cast was the other way, you just casted a bunch of const char * to char *, with the usual consequences.

  35. Anonymous says:

    And before anyone thinks it is okay because "it is just adding a const": the vector<T> template could be using evil template magic to detect the presence or absence of the const (<T*> versus <const T*>) and have a completely different internal layout and implementation in each case.

    Even if no templates were involved the cast would still be evil:

    char *x[42];

    const char **y = (const char **)x;

    y[0] = "Oops"; // Oops, you just assigned a .rodata const char * to a char *, losing the const.

    x[0][0] = ''; // Not be caught by the compiler but will crash.

  36. Anonymous says:

    @Brian_EE: IIRC, && and || sequence operations, like the comma operator, so the compiler cannot misoptimize that way.

    That is, you can do things like

    if (foo && foo->bar) /* … */

    And not crash due to a null pointer.

  37. Anonymous says:

    @Kzinti: I've always used '<' in for loops because it makes it a little bit more clear what's going on, and it's also the idiom. Being afraid of '!=' is a different story, though.

  38. Anonymous says:

    @J: My pet peeve is specifically about people that say use '<' because it is 'safer'. As for it being an idiom, '!=' is also an idiom in C++ when dealing with sequences that might not be comparable with '<'. This is all over the STL for example. I've also encountered countless cases of '<' loops being broken because of signed / unsigned integer mismatches (which isn't a problem with '!='). I've heard the argument that if you use '!=', it's possible that you will skip over the end of your sequence… (hence the safer part about '<' I suppose). This is precisely why I think '!=' is better: it will break your code and not hide a bug.

  39. Anonymous says:

    @Raymond in response to "The MAZZTer" post, if SetItemColors accept an unlimited number of colors doesn't it also require a heap allocation?

    Also, why not just change the "Store" class to allow a direct copy of the items?

    [You do realize this was a totally artificial example. (And GetItemColors *is* a direct copy of the item's color table.) -Raymond]
  40. Anonymous says:

    @J and @Kzinti: if my code doesn't compile with -Wall -Werror when I write it, then I fix it.  The probability that a compiler warning is a bug (like, say, "comparison of signed and unsigned data types" or "comparison is always true due to limited range of data type") is rather high.

  41. Anonymous says:

    @JamesJohnston

    In fact, I find that compact code in a smaller space is much more readable than code that was longer and more convoluted due to a lack of knowledge or trust about the language.  

    Hallelujah!

    (I mean it totally non-sarcastically)

  42. Anonymous says:

    Yet COM collections use long as the param for get_Count, instead of unsigned long. (Probably due to VB, and not the existence of a negative number of items)

  43. I prefer Brian_EE's code sample to my own; it uses the more established convention of "if (!operation) { bail; }" and is thus easier to follow.

  44. Brian_EE says:

    @JamesJohnston, @Kzinti: You made a leap of faith in assuming that everyone's native language is C/C++ and thus is intimately familiar with *every rule* of the language. I only use Windows for "testware"…

  45. Anonymous says:

    @Maurits Personally I'd combine the two approaches: First check in a single if whether the count is valid and then return store1->GetItemColors(itemid, count, temp) && store2->SetItemColors(itemId, count, temp);

  46. @Brian_EE:  Short-circuit logical operators exist in a lot more than C/C++ like I mentioned: C#, Java, JavaScript, PHP, Perl, Python, VB .NET.  List at en.wikipedia.org/…/Short-circuit_evaluation – What do you use – are you primarily only familiar with VB6 which did not have short-circuit operators?  My point is this isn't some esoteric feature of C++, but rather a really basic thing found in just about every common language in use.  I'd be pretty worried if I had to work with a developer who doesn't know what they are.

    Reading the "return" statement is easy: I think: "if operation A succeeds *AND* operation B succeeds, then return true."

  47. Anonymous says:

    I can see the case where someone moving between C-based languages and Pascal-based languages would want to be explicit with conditional evaluations… and someone coming from a Pascal-based language probably isn't too concerned about terseness.

  48. Anonymous says:

    Delphi/Object Pascal (known as Turbo Pascal in the DOS ages, but brought to good OO capabilities since then) supports shortcut evaluation for aeons. It is very handy there for the same reasons as in all other languages.

  49. Mordachai says:

    If you're not sure about the safety of short-circuit evaluations for a given language…. look it up!  As several have mentioned, it's a foundational bit of language design that is present in many programming languages.

    You certainly don't want to rely upon it in some bizarre language that doesn't support it.  But for the rest, you don't want to pollute your code with convoluted work-arounds to an imaginary worry.

  50. Anonymous says:

    "I prefer Brian_EE's code sample to my own; it uses the more established convention of "if (!operation) { bail; }" and is thus easier to follow."

    It also uses the anti-pattern of multiple exit points to the function. As you can see from his example, he has to remember what he needs to clean up for each bail path, making the function much less stable.

  51. Mordachai says:

    @Maurits Multiple exit points are NOT an anti-pattern.  Not in C++.  Software I've seen written in such convoluted ways to avoid exiting is a far messier anti-pattern than exiting when exiting is appropriate.

    If you're written RAII code, then exiting a function *anywhere* – including by exception which has the ugly reality of often happening where you least expected it, is still perfectly valid and safe.

  52. Anonymous says:

    Joke… not meant to be taken as serious statements on software engineering principles:

    How many programmers does it take to screw in a light bulb?

    Ten. One to screw it in and nine to stand around and talk about how much better they could do it :)

  53. Anonymous says:

    It also uses the anti-pattern of multiple exit points to the function.

    I thought single entry single exit only applied to COBOL, which let you do some really crazy (by today's standards) things regarding function entries and exits…

  54. Anonymous says:

    Yes, but TheMazzter's code shows how not using RAII becomes problematic if you have multiple exit points, since he's having to delete[] the array he allocated at every exit point.

    This is a special case violation of the "Write Code Once and Only Once" pattern, since he has four copies of cleanup code, specifically because he has four exit points to the function. With a single exit point, he would need only one cleanup routine in the function.

    RAII fixes this by just letting the compiler hide all of that dirtiness, but code is easier to refactor and usually easier to understand if it has a single exit point.

  55. Anonymous says:

    Stack allocated memory (auto) is deallocated on return, no matter when they go out of scope. If you want fine grained memory control raii is not an option. Same with GC, you have no control when memory is deallocated.

  56. Anonymous says:

    Heap allocated memory (such as in TheMazzter) is not RAII, and hence his decision to use multiple exit points is an anti-pattern.

    "If you want fine grained memory control raii is not an option"

    Sure it is. Just use std::vector<COLORREF> instead of new COLORREF[];

  57. Brian_EE says:

    @JamesJohnston: >"My point is this isn't some esoteric feature of C++, but rather a really basic thing found in just about every common language in use."

    Your assumption of what consitutes a "common language" is probably reasonable, but in this case, wrong. I invite you to come check out the languages used for RTL design of ASICs. Statements written sequentially are represented physically by parallel hardware elements. There is no concept of "short circuit" (in fact you would *never* want a short circuit in the true sense). So, if that is one's "native" language…

    >I'd be pretty worried if I had to work with a developer who doesn't know what they are.

    I refer you to my comment about "testware".

  58. Anonymous says:

    @Maurits alias @Steve Wolf: Once you start using try-finally (or "using" in .NET), multiple exit points are a very good pattern. Since languages which support try-finally also have the concept of exceptions, using such a language already makes it a requirement to use this construct to cleanup resource reliable.

    No matter if your resources are heap-allocated memory, network connections, GDI handles, database cursors, handles for system objects or whatever: Use try-finally. And because you cannot control the time when objects are picked up for finalization by the Garbage Collector, use it even in .NET or Java to get your resources back in a deterministic manner.

  59. @Someone:  Since I've been using RAII in C++ lately, it makes C# feel gross once I've nested two or three or four "using" blocks.  There's too much room for leaking resources in .NET, whereas well-written C++ code won't have this problem.  (Of course, this ignores the many opportunities to accidentally exhibit undefined behavior in C++.)  In C++, there is no "finally", because you don't need it if you're using RAII.  Every time I use C# now, I cringe every time I have to type "using", or properly implement IDisposable and properly dispose owned components there.

  60. "Your assumption of what consitutes a "common language" is probably reasonable, but in this case, wrong. I invite you to come check out the languages used for RTL design of ASICs. Statements written sequentially are represented physically by parallel hardware elements. There is no concept of "short circuit" (in fact you would *never* want a short circuit in the true sense)."

    I actually did some VHDL projects a few years ago when I was in school.  Sadly, I haven't been able to pursue it further, although I would like to someday.  (I have more experience with PCB design / microcontroller programming.)  I think it's fair to say that software programming languages (which describe sequential steps for a processor to take) are completely different from hardware description languages (which simply describe physical parts – no different than any other CAD model).  I would also say that some common/best practices and habits and terminology that are essential / commonly used in software programming are NOT appropriate in a HDL, and vice versa.  For example, "short circuit" has a completely different negative meaning in a HDL than in software programming, and it's silly to pretend they are the same and should have the same connotations.  (Perhaps I should have clarified and said "common software programming language" instead of "common language").

  61. Anonymous says:

    @JamesJohnston: My point is that you already need to use the constructs available in the language at hand to release rssources, to be safe against exceptions which may be thrown by any "foreign" method you are using, including framework methods.

    With that in place, early returns are a very good design tool. Much much better than complicated constructs to let your 2 IFs inside a loop reach the end of the function through excessive testing of error flags at each and every place after the line where you have detected that you need to break execution.

Comments are closed.