Not as easy as it looks, Part Two


Holy goodness, did you guys ever find a lot of additional ways in which an “eliminate variable” refactoring can go wrong. Just a few of your observations: (again, in every case, “x” is eliminated.)

Any situation in which x is being treated as a variable rather than a value will pose a problem. Some obvious ones, like if x is on the left side of an assignment, or the target of a ++, or used as a “ref” or “out” parameter are straightforward. But there are some situations in which it is a bit surprising where x is being used as a variable. Suppose S is a mutable struct with field F:

S x = new S();
x.F = 123;

Here we cannot simply say (new S()).F = 123 because new S() is not a variable. The mutable field requires a storage location to mutate but we don’t have a storage location here.

A number of people pointed out that the C# rules about how simple names may be used potentially pose problems:

int y = 2;
void M()
{
  Action x = () => { Console.WriteLine(y); };  // refers to this.y
  {
    string y = “hello”;
    x();
  }

The two inconsistent uses of the simple name y are legal here because the declaration spaces in which both are first used do not overlap. But the refactoring causes them to overlap; this needs to be refactored to

void M()
{
  {
    string y = “hello”;
    ((Action)(()=>{Console.WriteLine(this.y);})))();
  }

It gets even worse if the inconsistent simple names cannot be fully qualified:

Action x = () => {int z = 123;};
{
  string z = “hello”;
  x();
}

Now one of the local zeds must be renamed if the refactoring is to succeed.

You can also get up to mischief with anonymous types:

int x = 42;
var a = new { x };

This must be refactored to

var a = new { x = 42 };

Similarly,

int x = k.y( );
var a = new { x };

cannot be refactored to

var a = new { k.y() };

because that changes the name of the anonymous type property.

Moving an expression around can also muck with the definite assignment rules; this would have to be illegal to refactor:

void Foo(out int b)
{
   int x = b = R();
   if (Q()) return;
   doSomething(x);
}

because it becomes

void Foo(out int b)
{
   if (Q()) return;
   doSomething(b = R());
}

And now we have a method that returns without assigning the out parameter.

I already mentioned that moving an expression around can change the semantics of a program because the order of observed side effects can change. A number of horrid ways in which methods become dependent on side effects were noted, the nastiest of which was:

int x = LocksFoo();
lock (Bar) { return M(x); }

The refactoring changes the order in which Foo and Bar are locked, and therefore might cause a deadlock if other code depends on Foo always being locked before Bar.

Array initializers are a kind of expression that is only legal in a small number of situations; the refactoring has to take that into account:

int[] x = {0,1};
Console.WriteLine(x);

needs to be refactored into

Console.WriteLine(new int[] {0, 1});

And finally, moving an expression around can move it from a checked to an unchecked context, or vice versa:

int zero = 0;
int one = 1;
int x = int.MaxValue + one;
Console.WriteLine(checked(x + zero));

The naive refactoring turns a program which succeeds into one which fails at runtime:

Console.WriteLine(checked(int.MaxValue + one + zero));

To be correct, the refactoring would have to introduce an unchecked block around the first addition!

Thanks all, that was quite amusing.


Comments (21)

  1. Thomas Levesque says:

    So much pain for so little benefit… Probably not worth including this refactoring in the next version of VS !

  2. Will says:

    @Thomas – I assume you're not a Resharper user?

    The funny thing about refactoring tools is how hard is it to see why they're useful until you have them.

  3. SSLaks says:

    I think you mean new { x = 42 }

  4. Brennan says:

    Having so much edge cases increases the value as well.  If refactoring manually I have to notice all these issues and account for them myself every time.

  5. configurator says:

    I think there's one too many closing parenthesis in

    ((Action)(()=>{Console.WriteLine(this.y);})))();

    I thought at first this was a new case where we need to add more than ((Type)(Value)), but it isn't.

  6. Alexander Morou says:

    Not just amusing but it makes you wonder why you'd ever try to erase a variable.  Even with one use point you'd be asking for the above headache, but in the case where there's multiple use-sites you'd end up with a clear answer: the variable must remain because: the solution to introduce a variable free alternative would … introduce a variable to ensure the side-effects maintain consistency where applicable.

    Example:

    int x = StateChanger();

    int y = x + 1, z = x – 2;

    becomes

    int y = (StateChanger() + 1), z = (StateChanger() – 2);

    The issue with that is obvious.

    In the case of:

    var x = new S();

    x.F = 123;

    You'd end up using:

    new S() { F = 123 };

    Which is the same, but … object initializers introduce a background variable to do their work.  Ironic.

  7. Toni Petrina says:

    Instead of finding out what are "fail" cases for such refactoring there should be "whitelist" of allowed refactorings. That should be easy enough to do.

  8. Jeroen Mostert says:

    @Toni: if your tool simply says "scenario is not in whitelist, cannot refactor", then what do you do? Refactor manually, taking very good care of all of the above gotchas. You're bound to miss the more subtle ones. Taking care of complicated gotchas in a systematic fashion is just what a computer program is good for. Even a tool that says "scenario is of the form X, cannot refactor" because it only detects and doesn't know how to correct is more useful.

  9. Reordering says:

    I find this feature really nice, but what I would really like to have is a class reordering feature, like the "Organize using –> Sort using"  but for a class, to reorder it's fields, properties, events, methods and constructors and at the end have the class organized with the same structure of the class diagram.

  10. Weeble says:

    I guess I'm a bit late to the party. Did anybody mention closing over a loop variable?

    Func<int>[] fs = new Func<int>[10];

    for (int i=0; i!= 10; ++i)

    {

    int x = i;

    fs[i] = ()=>x;

    }

    Console.WriteLine(fs[7]());

    This prints "7", but if you apply the naive refactoring it prints "10".

  11. DRBlaise says:

    One other variant of the "simple names" scenario dealing with static variables:

    class C

    {

     static int y = 2;

     void M()

     {

       Action x = () => { Console.WriteLine(y); };  // refers to C.y

       {

         string y = "hello";

         x();

       }

    Refactor x:

    class C

    {

     static int y = 2;

     void M()

     {

       {

         string y = "hello";

         ((Action)(()=>{Console.WriteLine(C.y);}))();

       }

  12. configurator says:

    DRBlaise: Static variables can still be disambiguated although not as easily – B.y here would disambiguate just fine.

  13. configurator says:

    Weeble: That's a good point! Nobody mentioned that…

  14. configurator says:

    Reordering: But that would destroy the natural order of a class… Take this example, to make things clear.

    Class RectangleCalculator has a public interface which accept a Rectangle, two Points or four integers

    public int Area(Rectangle r);

    public int Area(Point p1, Point p2);

    public int Area(int x1, int y1, int x2, int y2);

    public int Perimeter(Rectangle r);

    public int Perimeter(Point p1, Point p2);

    public int Perimeter(int x1, int y1, int x2, int y2);

    Each of these does the argument validation, and then they delegate to the private CalculateArea and CalculatePermiter.

    Wouldn't it make sense to order the class like this?

    public int Area(Rectangle r);

    public int Area(Point p1, Point p2);

    public int Area(int x1, int y1, int x2, int y2);

    private int CalculateArea(Whatever parameters);

    public int Perimeter(Rectangle r);

    public int Perimeter(Point p1, Point p2);

    public int Perimeter(int x1, int y1, int x2, int y2);

    private int CalculatePerimeter(Whatever parameters);

  15. Reordering says:

    At the beginning the class is good organized and the class entropy is very low. Then a few changes are needed, which are added somewhere in the class structure, but still the class is good organized, but those few changes begin to raise the class entropy. After a couple of years of "few changes" the class entropy is so high that it would take too much time to reorganize it and the risk of a copy paste error is concrete. This means the class has already overcome the point of no return, the entropy now will only grow higher, it's too late, bye bye good organized class, welcome tasty spaghetti class, the can of worms is open and nobody will close it.

  16. Weeble says:

    (Sorry if this is a double-post. The first attempt to post seemed to do nothing at all.)

    Resharper does member reordering. I learned this the hard way.

    At a previous company we had a moderate-to-large size of code-base with maybe 20 to 30 developers. We were building a game, but we had lots of associated command-line programs too. At some point the command-line programs

    all

    started

    writing

    their

    –help

    output

    like

    this.

    It seems that nobody noticed or cared for a while, because they rarely invoked the help. I only noticed when I was adding a new command-line tool. It turned out that the help tool, for reasons best known to the original author, used P/Invoke to determine the dimensions of the console (rather than, say, Console.BufferWidth) in order to do word-wrapping. It called GetConsoleScreenBufferInfo, for which it needed to define the structs CONSOLE_SCREEN_BUFFER_INFO, SMALL_RECT  and COORD. More recently, somebody had applied what was *supposed* to be a global clean-up using Resharper in order to bring things like spacing, identifiers and comments in-line with our newly agreed coding style. Unfortunately this clean-up also alphabetically sorted the members of every class, and somehow this wasn't noticed when it was checked in. Since in C# structs default to LayoutKind.Sequential, the original code was correct, but alphabetically sorted it became gibberish.

    I removed all the P/Invoke and replaced them with Console.BufferWidth, grumbled at the people who had re-ordered everything, and wondered why anyone would even *want* to sort their members alphabetically. It makes about as much sense to me as ordering a music collection by the colour of the album art.

  17. configurator says:

    Reordering: if that happens to you, you're doing it wrong.

    That said, if you do have a method to reorder a class automatically to an order that doesn't make sense, it still wouldn't help you…

  18. Reordering says:

    Perhaps, still it could be a starting point, I stopped to use resharper because of the useless comments of the type "make a method static" or "don't use the "this" keyword" and moreover it had some bugs which make it worst than the built in functions of visual studio, like the find all references, it was replaced with a resharper method which for some reason didn't find all references.

    But at this point I would suggest to remove also the "sort and remove the using" option from VS because it could order or remove them in an way that somebody doesn't like, so because somebody doesn't like it nobody should use it, correct?

    And for me make more sense a class ordered like a in the class diagram than a spaghetti class:

    public class MySpaghettiClass

    {

      public void OnSpathettiAreUsed(){TheSpaghettiAreUsed(this,EventArgs.Empty);}

       public void GetSomeSause() {….}

      public event EventHandler TheSpaghettiAreThere;

     private int howManySpaghetti=0;

     public string TheName {get; set;}

     pubclic void IsSomeSpaghettiThere(){ …}

     public MySpaghettiClass() : this(11) {}

    public int HowManySpaghettiAreLeft(){return _spaghetti;}

    public event EventHandler TheSpaghettiAreUsed;

     public int HowManySpaghetti{ger{return howManySpaghetti;} set{ howManySpaghetti=value;}}

    public MySpaghettiClass(istring name) {TheName=name;}

    public void UseSpaghetti(int spaghetti){howManySpaghetti-=spaghetti; OnSpaghettiAreUsed();}

    ….

    }

  19. configurator says:

    @Reordering: How would a class get that way anyway? Unless you just drop code off at the end of the class when you need to add anything, which is a very bad practice, the class should be organized according to whatever order it is that you like them to be, which would normally not be alphabetic or according to accessibility.

  20. Reordering says:

    Classes get in this state because after x years y people have worked on them and they have added code somewhere.

    You get committed to extend an existing application with new features, the application is already 4 years old, you receive the code and it looks like this one, what do you do? Do you think to refactor it? No, absolutely not! It's not your duty, nobody asked you to do it and you won't lose your time doing it, you are paid to add new feature not to reordering a class, moreover, as already said, it's really dangerous reordering a class by hand, the risk of a copy paste error is very high.

    At the end you add your code at the end or at the top of the class, and so will do the next one who has to add some new code. At this point have a function to reorder classes could be really  useful, at least you get a better structure than a tasty spaghetti class.

  21. Shuggy says:

    @Reordering.

    Ugh that's awful. Either work through it to understand what was the connections are and understand it/split it up/ sort it or leave it as is (low risk, no nasty to merge/annotate changes)