What’s Wrong with this Code?


A while back, Erik Schwiebert wrote about some of the travails of moving our build system from one using CodeWarrior to one using XCode and GCC. Erik mentioned the huge number of errors and warnings we’ve had to resolve, but, in reading some of the reactions around the web, I’m afraid that some people, particularly people who know nothing about compilers and developing software, seem to have completely mistaken Erik’s remarks.


Someday, I’ll understand why people are motivated to render opinions on matters of which they are entirely ignorant, but, for now, I’ll have to satisfy myself by offering some remedial education. I’ll start by asking, what’s wrong with the following code:



class FooBar
{
public:
FooBar(int val): _val(val) {};
~FooBar() {};

void Foo(int val)
{
_val += Bar(val);
};

int Val() { return _val; };

private:
int Bar(int val)
{
_val = val;
return _val;
};

int _val;
};

int main()
{
FooBar fb(42);
fb.Foo(1);

int val = fb.Val();

return 0;
}



Some of my readers will spot the problem immediately. To those who do, I’ll remind you that this is a contrived example. A real-world example will have a class that has 10 or more methods, and none of those methods will be as trivial as Foo or Bar.


Update: Fixed code problems due to .Text’s handling of HTML entities.


 


Rick


Currently playing in iTunes: One Way Out by The Allman Brothers

Comments (21)

  1. Doug says:

    The result of the += is undefined, as there’s no guarantee of the order of subexpressions, ond one of those affects the other?

  2. Atul says:

    Am I missing something here? Val() will return 2 because Bar() just assigns val to _val.

    Perhaps you intended to call Bar() from the ctor() or something along those lines?

  3. Well, a couple of problems, really.  First, _val may not be defined initially but still referenced through the Val method.  i.e. calling FooBar fb(); and then fb.val(); would be problematic.

    Second, Bar is changing the value of _val at the same point as the += from what I remember of the compiler.  i.e. in Bar(int val) you have the _val = val;  SO, at what state is _val when Bar returns?  Is it 42 or is it 1?  I’m explaining this kinda badly, but I think this is an issue.

    Am I right here, or am I missing this entirely?

  4. Joshua Ochs says:

    I’m not going to give it away entirely, but I think the problem is +=, specifically Stroustrup 6.2.2 – Evaluation Order. Let me guess – CodeWarrior gives 43, and XCode gives 2, or vice-versa. Ugly.

  5. Joshua Ochs says:

    Sorry, take my Apple dev hat away. That comment should have ended:

    "Nasty."

    "Yeah."

    That could be considered an additional litmus test for whether people understand Mac development, specifically. 😉

  6. Rick Schaut says:

    Jason,

    There is only one ctor, so you can’t have _val uninitialized.

    Joshua,

    You had a 50% chance of getting it right.  Bummer :-).

  7. Schwieb says:

    Heh.  I added a printf of the result of val, and then compiled this as "g++ -Wall -o foo foo.cpp", and got – gasp – ZERO warnings. Not that CW is any better.  It would have to be a darn good compiler to detect the wrapped reference to _val and realize that it expanded to an ambiguous expression.

    And yes, gcc does the load and then the fn call, whereas CW does the fn call and then the load.  Totally implementation dependent and behaviorally undefined.  :)

  8. Rick Schaut says:

    Darn it, Schwieb, you’re stealing the thunder of my next post! :-).

    Oh, and minus points for using printf.  Next time, include iostream, and use:

    std::out << fb.Val() << std::endl;

  9. Frederik Slijkerman says:

    This is totally offtopic, but anyway: The semicolon is not part of a method implementation, only of a declaration. So:

    int Foo(int x);

    int Foo(int x) { }

    The thing is, you do need the semicolon at the end of a struct or class declaration, so the compiler can see the difference between

    struct mystruc_t { int a, b; } mystruc;

    and

    struct mystruc_t { int a, b; };

  10. Dang, this shows me how rusty my C skills are… I need to get out of the java world for a while :)  I saw ~FooBar() {}; and thought it was a constructor :)  Discovered just now it was a destructor.  

    SO, I’m still not 100% clear on what the "issue" with the code stems from.  Is it the Foo method which is the problem, like I figured?

    "detect the wrapped reference to _val and realize that it expanded to an ambiguous expression. "  I think means the issue with _val += Bar(val) and Bar referencing _val in it’s method?  This code does seem "valid" although NOT good.

    Again, I’m used to Java where things are quite a bit different, and sometimes MUCH simpler.  Just ran this code in Java.  Here’s the source for it:

    public class Test

    {

           private int _value;

           public Test(int value)

           {

                   _value = value;

           }

           public void Foo(int value)

           {

                   _value += Bar(value);

           }

           public int Val()

           {

                   return _value;

           }

           private int Bar(int value)

           {

                   _value = value;

                   return _value;

           }

           public static void main(String[] args)

           {

                   Test t = new Test(42);

                   System.out.println("t.value()" + t.Val());

                   t.Foo(1);

                   System.out.println("t.value()" + t.Val());

           }

    }

    In Java, this returns a result of 43, consistently.  My understanding of how the java compiler works on this stuff (and some of this could be assumption) is that the method goes from left to right – i.e. += translates literally to _value = _value + Bar(value).  This translates to _value = 42 + Bar(1) -> _value = 42 + 1; as the _value is set before the method is called.  NOW, if I reversed the order and explicitly spelled the code out, _value = Bar(value) + _value; then Bar(value) gets called first, setting _value and return 1, and then 1 + 1 = 2.  This is consistent both in java and in g++.  Am I missing something here with regards to an error?

    Related to all of this, I was also curious on another topic.  I noticed the way you generated your source code (as said, I’m from the Java world, which is very different).  I’m kinda curious on "coding" standards, as I’ve seen so many different standards.  Usually here for example, we only capitalize the first letter of a constructor, and normal methods are lower case with capitlization differentiating words.  i.e. setValue(int value).  Further, you’re indenting your brackets, while the two standards I’ve seen place brackets like this:

    public myFunction(int whatever) {

    and

    public myFunction(int whatever)

    {

    }

    NOW, some of that I’d assume is due to me used to doing java development vs. C development. I’ve tons of different seen C "standards".  (I’ve got a "Microsoft C Programming for the IBM PC" book which uses syntax much like my java syntax).  I’m kinda curious as to what standards you guys use for coding and documentation?  What is the "correct" formatting/syntax?  Part of the reason my "syntax" might be off is I do ALL my development from a Unix shell with gvim windows (as many as 15 at once with tabbed terminal windows – gotta love iTerm!), ant build scripts, and SVN repositories for code tracking.  SO, with-in that kind of build environment, well, I don’t have an IDE forcing my code into a set format. Anyways, I guess this is all explanation for "am I doing things right?" as I’ve not been in a major development group (I’m the primary developer where I’m working now).

  11. Rick Schaut says:

    Frederik, yes, the semicolons at the end of the method definitions are optional.  I tend to add them simply out of habit, but that’s not strictly a coding error.

    Jason, there is no cannonical coding style for braces.  Use whichever style you find most comfortable and/or whichever style is dictated by the group you’re in.  I used the style we tend to use in our code, but that doesn’t mean the style I used is cannonically correct in any way.

  12. Bjarne Stroustrup is a brilliant man, which means he’s very good at defining programming languages. …

  13. Uli Kusterer says:

    @Joshua: What Stroustrup book are you referring to? My copy of "The Design and Evolution of C++" (4th printing) has chapter 6.2.2. named "How does the Committee Operate?". "The C programming language" (by K&R, second edition) has only a chapter 6.2 and it’s about structures and functions.

    I’m not sure what you’re referring to?

    @Rick: It’s an insiduous problem, but I would have thought that by now ANSI and ISO would have discovered this and standardised on one interpretation? Of course, two compilers can still have the bug for legacy reasons, but does the standard really leave this undefined?

  14. Rick Schaut says:

    Uli,  Joshua is referring to Stroustrup’s, "The C++ Programming Language."

    And, yes, the standard really does leave it undefined.  This has to do with the age-old efficiency trade-off.  Allowing compiler writers to choose the order of subexpression evaluation as they see fit enables them to take advantage of certain processor dependant issues.  The most common case involves instruction scheduling.  You’ll often see compilers changing evaluation order based on whether the operands are allocated in registers or allocated on the system stack.

  15. Schwieb says:

    Here’s a link I found that explains the difference between "implementation-defined", "unspecified", and "undefined" behavior.

    http://c-faq.com/ansi/undef.html

    Might be useful for folks who wonder why compilers can even be allowed to be so vague.

  16. Rick Schaut says:

    I hate that FAQ.  It’s clear as mud, but it covers the ground.  In particular, the interpretation of "undefined" behavior:

    "Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended."

    borders on the inane, particularly when we start talking about the order of evaluation of subexpressions within an expression.  Compilers still must compile code that’s syntactically correct, and, as long as the subexpressions themselves are well defined, the compiler must evaluate them properly.  The only thing that’s undefined is the order in which the compiler chooses to evaluate those subexpressions.

    It would be infinitely more useful to say that the compiler is free to evaluate the same expression in one order in one place in the code and in a completely different order somewhere else in the code.  In fact, when we first ported code from MSVC to CodeWarrior, I ran into a case that had two successive lines of code that read:

    Swap(a, b);

    Swap(c, d);

    "Swap" was a macro that expanded to an expression that depended on the order of evaluation.   Fascinatingly enough, when I stepped over those two lines of code, the first statement behaved as expected, but the second expression zeroed out one of the arguments.

    What?!  Yup.  Turns out that the compiler had allocated one pair of variables on the stack and had allocated the other pair of variables in registers.  The order of subexpression evaluation changed depending upon where the variables in question were allocated–and it was a perfectly valid way for a compiler to behave!.