Five-Dollar Words for Programmers, Part Two: Orthogonal


In geometry, “orthogonal” basically means the same thing as “perpendicular”, or “at right angles”.  The walls are orthogonal to the floor. But algebraists extend the meaning of “orthogonal” beyond mere perpendicularity; to an algebraist, two aspects of a system are orthogonal if one can be varied without changing the value of the other. 

Imagine for instance that you were trying to describe how to get from one point in an empty room to another. A perfectly valid way to do so would be to say how many steps to go north or south, and then how many steps to go northeast or southwest.  This hockey-stick navigation system is totally workable, but it feels weird because north and northeast are not orthogonal — you can’t change your position by moving northeast without also at the same time changing how far north you are.  With an orthogonal system — say, the traditional north-south/east-west system — you can specify how far north to go without worrying about taking the east-west movement into account at all.

Nonorthogonal systems are hard to manipulate because it’s hard to tweak isolated parts. Consider my fish tank for example. The pH, hardness, oxidation potential, dissolved oxygen content, salinity and conductivity of the water are very nonorthogonal; changing one tends to have an effect on the others, making it sometimes tricky to get the right balance. Even things like changing the light levels can change the bacteria and algae growth cycles causing chemical changes in the water.

Computer people further extend this algebraic notion of orthogonality to their systems. Consider a case I just came across while studying the C# expression binding code, for example. C# warns about unreachable code:

if (false) {
label:   // targetted but not reachable
  foo(); // warning! unreachable!
  if (abc) goto label;
}

and also warns about untargetted labels:

public void foo() {
  bar();
label:  // reachable but not targetted
  blah();

“Reachable” and “targetted” are orthogonal in one sense. You can have a reachable targetted label, a reachable untargetted label, an unreachable targetted label and an unreachable untargetted label. But they are nonorthogonal in another sense, as I discovered this week when I almost seriously broke the code generator. The way our code generator works, we must generate IL for an unreachable label if it is targetted by a reachable goto. The notions of reachable and targetted get conflated in this one corner case, meaning that I cannot have a code generator that skips generating all unreachable code. This unexpected nonorthogonality makes it difficult to write a compiler that is both correct and understandable, so I’m trying to figure out ways to either eliminate the nonorthogonality, or capture the problem in a very specific piece of highly localized code that I can comment the heck out of.

Now hold on a minute, I hear you saying. Surely if the goto is reachable then the label will be too! My challenge to you guys is this: how can you have an unreachable label targetted by a reachable goto?

Comments (21)

  1. Dave says:

    Do you mean something like this?

    if ( maybe ) goto jail;

    if ( false ) {

    jail:

    GettingHereIsFun(1.0/2.0);

    }

    It seems like if a label is targetted by a reachable goto then, by the transitive property, it is reachable.

  2. EricLippert says:

    In that case jail is both targetted and reachable. I agree that it seems like a label targetted by a reachable goto ought to be reachable, but there are situations in which it isn’t, honest.

  3. Nicholas Allen says:

    try {

    goto foo;

    } finally {

    return true;

    }

    foo:

    return false;

  4. Jeremy D says:

    I think Nicholas almost nailed it, but the compiler (rightfully) barfs on returning from the finally block.

    However, with a slight modification it seems to work.

    try {

    goto foo;

    } finally {

    throw new Exception("whoops!");

    }

    foo:

    return false;

  5. Hmm. If Nicholas/Jeremy’s example is correct, couldn’t you just skip generating the unreachable label and translate the reachable goto into something else instead, like a return?

    Stuart.

  6. Nicholas Allen says:

    Jeremy-

    Thanks for the correction, I don’t have a compiler with me.

    Eric-

    Compile an unreachable, labeled basic block to a single nop. Chuck out unreachable, unlabeled basic blocks. Then, targetting is no longer a factor. That’s really easy to understand and the JIT will take care of the nop for you.

  7. Johan says:

    Eric,

    Thanks for this explanation of orthogonal. Very insightful and clear, and just so happens to be extremely helpful to what I am doing.

    -Johan

  8. Nicholas:

    First off, I’m in awe of your insight into the problem :)

    However, it seems to me that your solution just trades "reachable and targetted aren’t orthogonal" for "reachable and labeled aren’t orthogonal".

    It’s probably still an improvement (because it’s presumably much easier to determine whether a block is labeled than whether a label is targetted) but it isn’t a perfect solution to non-orthogonality…

  9. To answer my own point: the "perfect solution to non-orthogonality" is obviously just to compile *every* unreachable basic block to a single nop; then "labeled", "targetted" and "reachable" are all treated orthogonally.

    Since that would clearly lead to a ridiculous number of nops, I agree that Nicholas’s solution is the way to go :)

  10. EricLippert says:

    Indeed, the problem is a goto across a finally block with an unreachable end.

    finally{ while(true){ foo(); } }

    would also qualify.

    Right now the code generator does what Nicholas describes — it generates a "leave" instruction for the goto and then lets the IL semantics take care of the leave being hijacked by the finally. That means that the leave has to leave _to_ somewhere, so we generate the label’s basic block even if it is unreachable. Of course we discard all of the other unreachable code after the label, so its just a no-op.

    I am considering changing the code generator as you guys describe, so that it detects this situation and changes the goto into a branch to the beginning of the finally or some other appropriate location. However I’m not sure whether the amount of changes I’d have to make to the code are worth this particular optimization — I risk making the code more complex just to eliminate what is admittedly a wart in the unreachable-code-trimming algorithm.

  11. Nicholas Allen says:

    Eric-

    I don’t quite follow what you’re proposing to do. It shouldn’t be possible to branch out of a try block. You can leave out of a try block, but it shouldn’t be possible to leave into a finally block. So where would you leave to? The only guaranteed safe point I can think of is the first instruction of the original try block itself. But, and this seems extraordinarily shady implementation-wise, if you leave back to the start of your own try block, the finally blocks aren’t executed.

  12. EricLippert says:

    Right — no matter what happens, that try block has got to be left somehow, and its got to go somewhere, so we’ll probably be generating a no-op somewhere for it to go, and we’re back in the situation that we’re already in. My musings are more along the lines of whether there’s a cleaner way to represent this situation in the implementation. I’m working on code that removes the unreachable portions of the bound tree earlier and its a conceptual pain in the rear that I can’t remove unreachable targetted labels. I think though that I’m going to live with it since as you say, there’s not really a good alternative.

  13. Could you just convert unreachable code *always* to nops in your "early" pass through the tree, and then have a pass later that removes unnecessary nops?

  14. EricLippert says:

    That’s essentially what we do. Code that is unreachable is stripped totally from the tree before code generation, except for targetted labels which are turned into nops. We then do an additional reachability check on the generated basic blocks and remove unreachable untargetted blocks.

  15. Referring to

    try {

    goto foo;

    } finally {

    return true;

    }

    foo:

    return false;

    I do not see how the "goto foo" is reachable. It’s return is obviously overridden by the finally clause.

    Considering the Finally clause as a subroutine, and having it called before all exit points in the try and catch clauses, should make the reachability analysis trivial.

    A quick inspection of Soot code, JimpleBodyBuilder::createTryCatchFinally(), appears to do just this.

  16. EricLippert says:

    First off, as mentioned above, that code is not even legal C#. Returning from a finally is forbidden in C#.

    But suppose it were legal. Then the reachability analysis would be (assuming that the try is reachable)

    goto foo — beginning is reachable, end is unreachable

    return true — beginning is reachable, end is unreachable

    foo: — label is targetted but unreachable

    return false — unreachable

  17. John says:

    If you can’t remove unreachable targetted labels because they are targeted, why don’t you just remove the code which targets the label too? If the label is truly unreachable, then code which targets it is redundant.

  18. It must be CLR week over at The Old New Thing because it’s been non-stop posts about C# lately. Raymond’s

  19. Last time I mentioned that one of the subtleties of programming language design is weighing the benefit

  20. Do you know that signals can be represented as vectors. Infact they are vectors. And hence the Fourier series and transform are applicable because of the property of orthogonality. Its the beauty of science. I read communication books quiet often and am quite baffled by the way the writers use the concept of orthogonality there.  

    Indeed, I studied a fair amount of linear algebra during my undergraduate degree. It has been a long time since I looked at the transforms from the signal space to the frequency space though. I remember hardly any of it.

    But indeed, the fact that one can form an orthonormal basis for signals is the key to the decomposition of the signal. — Eric

  21. Ben Tremblay says:

    So … thank you. Now I can refer to this post when pointing to how my discourse / deliberative system (It’s dialectical … and dialogical, too!) is <i>orthogonal</i> to canonical forum software. I.e.: "<b> two aspects of a system are orthogonal if one can be varied without changing the value of the other</b>"

    consider yourself tweeted

    –bentrem

    p.s. yes, my link points to something substantial