The Consequences of Performance Optimizations


I wanted to write a post today about a very interesting bug we just came across related to Refactoring.  As i just blogged
about verification, i thought that this would be a good followup. 
First, a bit of background.  Referencing the previous post, we can
see the following Q/A about refactoring:

Why is it important? Well, if we look at Martin Fowler’s Refactoring Site, we’ll find the following information:

Q: What is Refactoring?

A: Refactoring is a disciplined technique for restructuring an existing
body of code, altering its internal structure without changing its
external behavior.

Now, let’s dissamble that “answer”.  What does “without chaning
its external behavior” mean?  Well, to me, it means that the
semantics of the code (as defined by the language) have not been
changed.  Why do i limit myself to just “as defined by the
language”?  Because, frankly, any other definition would be too
hard to implement 🙂  For example, if you perform an “extract
method” of some code that sits in a tight loop, and that the jit
doesn’t inline, then you’ll have changed the external behavior with
respect to its performance.  However, there’s no good way to
detect this, and most people don’t care.  Similarly, if your code
makes decisions based on runtime type information (i.e. it uses
reflection), then it’s possible to change it’s external behavior
because suddenly this alteration of internal structure becomes
visible.  However, customers have spoken and have stated that
they’re fine with these caveats and so we accept them. 



Now, here’s where it gets a bit interesting.  Let’s look at my
original definition: “it means that the semantics of the code (as
defined by the language) have not been changed.”  Why is this
interesting?  Well, consider the case of code with errors in
it.  The language defines such code to be semantically
meaningless.  So, at that point, almost any change can be made and
still be called a refactoring since the semantics won’t change (i.e.
meaningless code stays meaningless).  In fact, the only change a
refactoring could not do would be to take this broken code and make it
non-broken (since that would change it’s semantics from meaningless to
meaningfull).  Early on in VS2005 we considered making it so that
you could only run refactorings if your code was compilable.  We
felt that this went with the spirit of refactoring, and also, it made
the refactoring job orders of magnitude easier since trying to make
sense out of broken code is extraordinarily difficult (my full time job
in fact!).  However, based on a huge amount of customer feedback,
we felt that that was a bad idea.  Customers still wanted to be
able to do things like extract a method even if their code was
broken.  And they were willing to deal with the fact that the
result might not be perfect if the code wasn’t in a good shape. 
However, they just wanted to be made aware that this might be the case,
and so we decided to pop up a warning every time you did a refactoring
of unbuildable code.  Specifically:



            




Seems pretty sane right?  We give you the power of a lot of code
manipulation tools at your fingertips, but we warn you that we might
not always be able to do everything correctly.  And because we
have a preview mechanism you can go in and see what we’re going to do
to decide if it meets your needs.



So what’s the problem?  Well, based on quite a bit of customer
feedback (as well as a nasty bug that i opened myself), we decided to
invest a lot of time on improving the performance of
refactorings.  We felt that they were far too slow, and that that
would limit their usefulness.  At this point i feel that they’re
still slower than we’d like, however fast enough to be used as part of
the regular development cycle.  For example, of my fairly meaty
project that i work on, they take on the order of 2-3 seconds. 
I’d like them to be instantaneous, but it’s far better than the 2-3
*minutes* that it used to take!  (Note: it’s a known problem that
the more projects you have, the more degradation you will get, we’ll
work on that more in the future).  The performance optimizations
implemented were several fold.  First off, we realized that we
were just duplicating a whole lot of work, and that it was trivial to
just make sure we were only doing the work once instead.  However,
we also added what we thought were some pretty snazzy heuristics to
improve performance.  For example, if you’re renaming a variable,
then we will not compile any methods in your source code that don’t
contain the identifier you’re renaming in them.  (The C# compiler
does two passes, the type system outside of methods, and the
compilations of individual methods one by one).  This latter
optimization was a huge savings in performance since it meant that, for
the most part, 99% of the methods in your code did not need to be
compiled in order to do a rename. 



But, in our haste to make fast, we missed a critical issue.  If we
don’t compile those methods, then we won’t know if there are errors in
them.  And if we don’t know if there are errors in them, then we
won’t give you that dialog.  And now the user can get enormously
confused.  Why do i get this dialog in some cases and not in
others?  Technically the dialog now means “The source in your
project related the refactoring you’re performing does not currently
build”.  However, i think that that’s enormously subtle, and might
lead users to be nervous about the validity of our analysis and
verification steps.  So much so that i’m wondering if we should
undo this performance optimization.



How do you feel about this?


Comments (28)

  1. CN says:

    Well, what about a change in the error message itself that somehow indicates that not everything is rebuilt, but only relevant parts, like: "A part of this project, or one of its dependencies, possibly relevant to the refactoring, does not currently build."

    Yeah, that’s a bit long, but you get the idea. It should be no surprise that the code doesn’t always touch everything, but it would be good if that error message told us so, to avoid the confusion you suggest.

  2. Peter Waldschmidt says:

    If the refactoring performance change is that large, you have no choice but to leave it in. 2-3 minutes is intolerable for a refactoring operation in my opinion. Just add more detail to the message if needed, or point to a page in the help docs with even more explanation.

  3. Leave the performance improvement in please.

    Mark

  4. Haacked says:

    I’m surprised there is such a big demand to allow refactorings on non-buildable code. Further down in the description, Fowler writes:

    The system is also kept fully working after each small refactoring, reducing the chances that a system can get seriously broken during the restructuring.

    The whole point of not changing the semantics is to improve the design of working code. Typically, after a refactoring, you run your unit tests to make sure they pass.

    Having said that, are there ways that you can find a middle ground. Perhaps allowing some refactorings on broken code (extract method comes to mind), but not on others (rename variable could be problematic on broken code)?

  5. CyrusN says:

    Haacked: "I’m surprised there is such a big demand to allow refactorings on non-buildable code."

    Well, consider our users. I think they actually want two things. One would be "real refactorings" and the other would be "useful large scale code modification features".

    "Further down in the description, Fowler writes:

    The system is also kept fully working after each small refactoring, reducing the chances that a system can get seriously broken during the restructuring.

    The whole point of not changing the semantics is to improve the design of working code. Typically, after a refactoring, you run your unit tests to make sure they pass. "

    If a refactoring does everything right, you shouldn’t need to run any unit tests. The only reason to run unit tests would be if your code is dependent on things beyond the semantics of the language (i.e. performance, or RTTI).

    "Having said that, are there ways that you can find a middle ground. Perhaps allowing some refactorings on broken code (extract method comes to mind), but not on others (rename variable could be problematic on broken code)? "

    Yeah. It’s something taht will need lots of thought. 🙂

  6. Daniel says:

    I think the error message you’re suggesting ("The source in your project related the refactoring you’re performing does not currently build") makes perfect sense, and I vote to leave the performance optimization in.

  7. Kevin says:

    I agree whole-heartedly with the other comment posters; leave the performance improvements and re-write the dialog message.

    I would be happy to trade the possible uncertainty in broken code for faster refactoring in 99% of cases….

  8. DrPizza says:

    Eclipse doesn’t warn in every single refactor-with-errors scenario, and it’s never bothered me. After all, in this scenario it doesn’t matter that other bits of code can’t compile, because the refactoring can’t affect them anyway.

  9. DrPizza says:

    "Having said that, are there ways that you can find a middle ground. Perhaps allowing some refactorings on broken code (extract method comes to mind), but not on others (rename variable could be problematic on broken code)? "

    Absolutely not. They must all be available all the time. The only even remotely acceptable reason to prevent refactorings is when the code is unparsable. If it’s syntactically invalid then preventing refactorings is tolerable. If it’s syntactically valid–even if wrong–refactoring should be permitted.

    The basic reason is that often when refactoring I’m performing several changes *only some of which the IDE can help me with*. Because I have to make changes *by hand* (because they’re ad hoc changes) I’m often left in the situation that the code doesn’t yet compile–because I’m in the middle of something. I still want to use the automated refactorings in this situation, because they’re a part of the broader refactoring that I’m doing. If you deny the ability to use them you might as well get rid of them all together, as far as I’m concerned; I’d rather have nothing at all than something that’s taken away from me at the whims of the development environment.

  10. DrPizza says:

    BTW, if you have to explain what each button’s label actually means in the message box then this is a strong indication that you’ve mislabelled the buttons.

    What about something like "Continue Refactoring", "Preview Changes", "Cancel" and no explanatory text?

  11. DanT says:

    I suggest a user option. If they have never set it, when they do a refactoring, ask "Do this quickly but don’t check unreferenced code", or "DO this safely but slow." Store that in the refactoring options pane.

  12. Sean Chase says:

    Cyrus, I use ReSharper right now and I get annoyed with waiting for the assemblies to get parsed when the project first loads. Keep the perf improvement and change the dialog box. DanT also has an interesting idea, but I wouldn’t personally care as much about that option as having everything work fast.

  13. Jeff Parker says:

    I am pretty much in the camp of refactoring should be done on workable compilable code. I mean your not refactoring it really if the code isn’t compilable. Your at the point where your still writing it. Refactoring I guess means a lot of different things to people. Who would have thunk it. I look at refactoring also kind of like a programming methodology in a way kind of like agile or extreme programming. I mean there have been books upon books written on refactoring.

    Now if Microsoft built something like extreme programming into visual studio there would be hundreds of people complaining. Not because Microsoft didn’t follow the rules of extreme programming in fact I am sure they would, but because it doesn’t quite fit some peoples needs and wants. Well that is simply because maybe that methodology doesn’t fit their needs or they do not understand the methodology. The methodology is there for a reason, and it works great for some people other it doesn’t. I say leave the error in, compile the code. Maybe some of the people complaining about it should take some time to learn the rules of refactoring.

    Easy to say for me though you have to make a customer choice decision. What would worry me is if you try to apease those they do not really have an understanding for refactoring, then you break it and make it useless for those that do.

  14. andy says:

    Please leave the optimization in.

  15. DrPizza says:

    "I mean your not refactoring it really if the code isn’t compilable. "

    Rubbish. The code may not be compilable because you’re simply part way through restructuring it. In such situations, why should you be denied IDE assistance?

  16. Schneider says:

    Leave the performance improvement in, but

    if they want to know, do the 2-3 minute compile to give them the feedback.

    Thoes who want it wait……

    Eric Schneider

  17. Jeff Parker says:

    "Rubbish. The code may not be compilable because you’re simply part way through restructuring it. In such situations, why should you be denied IDE assistance?"

    You shouldn’t be denied assistance. You can still copy and paste and use find and replace. Those things all still work but if your hacking your code to bits from a compilable state into a uncompilable state, your not really refactoring now are ya? Your more or less just wanting the IDE to try to figure out what the heck to do with broken code. That is not what refactoring is.

    http://en.wikipedia.org/wiki/Refactor#Refactoring_code

    Refactoring basically is another step in writing software. It should be done after your code is working to improve how it works. Like I said refactoring is not copy and paste, edit and replace, and try to help me fix these bugs. It shouldn’t be considered a tool into writing or working with code that doesn’t compile. It was never intended to do that. Everything that can be done under refactoring doesn’t aim to break code. Its main goal is to improve the preformance and readability or functioning code. Look up refactoring and see noe of the Key phrases from Martin Fowler. "The system is also kept fully working after each small refactoring, reducing the chances that a system can get seriously broken during the restructuring."

    Refactoring is meant to be done on working code plain and simple. Sorry if you do not like it but that has been the purpose of it for years. I think Microsoft adding refactoring into VS 2005 is a very nice thing I think they did it very effectively so don’t break it for the rest of us that apreciate the job they did. There is nothing stopping you from writing a macro or your own tool to do what you want. Refactoring in VS 2005 isn’t broken so don’t fix it.

  18. DrPizza says:

    "You shouldn’t be denied assistance."

    But you just said I should.

    "You can still copy and paste and use find and replace."

    But I may want to use "rename variable" or "extract method" or…

    "Those things all still work but if your hacking your code to bits from a compilable state into a uncompilable state, your not really refactoring now are ya?"

    Don’t be so absolutely ridiculous.

    Consider the "rename variable" refactoring. When you invoke that the compiler will (behind the scenes) rename each reference to the variable in turn. When it’s only part way through this process, the code will be in an uncompilable state (some references will have changed, others won’t; these latter references will cause compile-time errors). Would you claim therefore that the compiler’s rename variable does not in fact constitute a refactoring?

    No, of course you wouldn’t, because you recognize that the simple task of editing code can leave it in a state that, temporarily, cannot compile.

    I’ve often come across situations where Eclipse’s "extract method" refactoring can’t work (because for example I need to make a transformation from a variable mutation to a return and assignment at the point of the call). Semantically, the operation I want to perform is still "extract method"–I just have to extract it by hand. Now, because I can’t edit code instantaneously, whilst performing this extraction the code is often in an uncompilable state. For example, in the gap between writing a method declaration and populating its body. However, when doing this, I’ll often at the same time want to rename a variable or argument so that it makes more sense given the new code structure.

    In your world, I shouldn’t be allowed to use the IDE’s refactoring support when doing this; because my code is temporarily broken (because I’m mid-way through the refactoring I wish to perform) I should have to do everything by hand. That’s completely idiotic. There’s literally no advantage to the restriction you propose. I’m still wanting to perform a refactoring.

    "Your more or less just wanting the IDE to try to figure out what the heck to do with broken code. That is not what refactoring is. "

    I’m wanting the IDE to make changes that don’t alter the semantics of the code for me. That’s what refactoring is. I’m wanting it to make these changes as part of a larger refactoring effort that I’m performing, because often it doesn’t have a built-in mechanism for doing exactly what I want. There is not a single good reason to deny me the ability to do so, and your repeated parrotting of Fowler doesn’t change that. In your distorted world-view a manual variable rename or method extraction would not constitute a "refactoring" because of the interim stages in which the code doesn’t compile. I’m hard-pressed to come up with a single useful change one could make where this wouldn’t be the case. Your position is completely without any kind of merit or virtue, as it makes any and all refactoring impossible.

  19. CyrusN says:

    DrPizza: You’re unecessarily making discussion difficult. I would appreciate you being civil here and helping to promote a better atmosphere in general.

    " "You shouldn’t be denied assistance."

    But you just said I should."

    You can most certainly be denied "refactorings". However, that doesn’t mean that you can’t be offered "code manipulation tools" which don’t meet the definition of a refactoring. In essense, that’s what we’re providing here. There’s raw textual manipulation support. There’s rich support for manipulating code correctly when it’s in a buildable state. And there’s so-so support for dealing with non-compilable code. We recognize the benefits of each approach, but it’s certainly reasonable to suggest that we only provide true "refactorings" when the code is compilable. As Jeff has said, that’s pretty much a requirement as per the definition of a refactoring.

    ""You can still copy and paste and use find and replace."

    But I may want to use "rename variable" or "extract method" or…"

    Then we’d provide: "fuzzy rename variable" which may or may not do what you intend if the code is not buildable. Similarly with "fuzzy extract method"

    ""Those things all still work but if your hacking your code to bits from a compilable state into a uncompilable state, your not really refactoring now are ya?"

    Don’t be so absolutely ridiculous."

    He’s not. He’s being correct. If your code isn’t compilable, then you’re not actually performing a refactoring. And, as you can see here, there have already been a couple of people who would not have a problem with only allowing refactorings of correct code.

    "Consider the "rename variable" refactoring. When you invoke that the compiler will (behind the scenes) rename each reference to the variable in turn. When it’s only part way through this process, the code will be in an uncompilable state (some references will have changed, others won’t; these latter references will cause compile-time errors). Would you claim therefore that the compiler’s rename variable does not in fact constitute a refactoring?"

    Refactorings are treated as atomic actions. During such actions like actual text substitution you can’t perform another action (like a refactoring), so the distinction is meaningless.

    "No, of course you wouldn’t, because you recognize that the simple task of editing code can leave it in a state that, temporarily, cannot compile."

    Yes, and in the refactoring world it would be understood that during this uncompilable time you would be unable to perform refactorings.

    "In your world, I shouldn’t be allowed to use the IDE’s refactoring support when doing this; because my code is temporarily broken (because I’m mid-way through the refactoring I wish to perform) I should have to do everything by hand. That’s completely idiotic. There’s literally no advantage to the restriction you propose. I’m still wanting to perform a refactoring."

    No, there is another alternative which you didn’t consider. Provide enough flexibility and power with the refaactorings so that they can handle these cases. You mentioned the problem with eclipse’s extract method. Consider if they were smarter about that. Or if they had more refactorings so you could do the variable mutation first through a refactoring, and then do the extractions afterwards.

    ""Your more or less just wanting the IDE to try to figure out what the heck to do with broken code. That is not what refactoring is. "

    I’m wanting the IDE to make changes that don’t alter the semantics of the code for me."

    That statement is meaningless in the context of uncompilable code (as i stated in teh preamble). There are no semantics, just guesses.

    "That’s what refactoring is. I’m wanting it to make these changes as part of a larger refactoring effort that I’m performing, because often it doesn’t have a built-in mechanism for doing exactly what I want. There is not a single good reason to deny me the ability to do so, and your repeated parrotting of Fowler doesn’t change that."

    Yes it does. It pushes for a system whereby developers strongly adhere to the philosophy of refactorings, while simultaneously pushing tools developers to provide more refactoring flexibility.

    "In your distorted world-view a manual variable rename or method extraction would not constitute a "refactoring" because of the interim stages in which the code doesn’t compile. I’m hard-pressed to come up with a single useful change one could make where this wouldn’t be the case. Your position is completely without any kind of merit or virtue, as it makes any and all refactoring impossible."

    No it doesnt’. As exercises we’ve taken portions of our code base and performed refactorings on them as strictly defined by the refactoring texts and we never ran into the problems that you specified. You just do each refactoring atomically. Test it. Check in.

  20. DrPizza says:

    Your claims of atomicity are complete bunk. "Refactoring" existed long before refactoring-capable IDEs existed. Atomicity is in no way inherent to the definition of refactoring, which is probably why Fowler’s definition doesn’t say anything about it.

    Any restructuring which doesn’t alter the external semantics of a program is a refactoring, whether it be manual (done piecemeal by the programmer) or automatic (done "atomically" by the IDE; bunny ears because IDEs such as Eclipse certainly don’t do it atomically; refactoring saves modified but unopened documents, but this saving is not atomic, and if the IDE dies part way through, you’re left with… uncompilable code; I don’t know if VS.NET behaves the same, but it wouldn’t surprise me in the least).

    This claim that the refactoring is something that must be atomic and must not leave the code in an intermediate uncompilable state is an invention that’s completely disconnected from reality.

    The other thing that you miss is that when using the IDE to assist in these manual refactorings, the "rename variable" (or whatever) isn’t the refactoring any more; the refactoring is what *I’m* doing. The IDE is just helping me with that task. There’s no way I can do each step atomically, because your IDE simply *isn’t good enough*. It’s not up to the job. There’s no way to get to my desired destination (which is restructured but has identical external semantics so is by definition a refactoring of my current position) without having intermediate uncompilable steps.

    If your IDE were perfect and could do what I wanted all by itself evey time then maybe I’d change my view. But it’s not. It’s not even close to perfect. As such any restrictions of the kind you propose–restrictions that assume things can be done perfectly–are completely unacceptable.

  21. Jeff Parker says:

    Well anyway, Cyrus the primary reason I am against doing refactoring against uncompilable code is specifically what you said it is all guess work. This has proven time and time again to be the wrong thing. There are a few examples of this I can point out and how it has come back to haunt Microsoft over time and how it has hurt developers in the long run.

    Internet Explorer / Quirks Mode – These guys take more heat on this that anyone basically they are guessing what they think you want your html to do. Some people blame them for breaking the web.

    Another one was VS 2002/2003 the html parser. You flipped to design view and all your html was changed. There are blog postings about this forums postings bug reports.

    I just do not think it is a good idea to attempt guessing again. Somewhere down the line you will upset someone by guessing wrong. Like I said I am perfectly happy with refactoring how it is. If like Dr Pizza says I need to rename a variable in broken code I do not have a problem using find and replace. I guess I also have a hard time thinking your code could be so broken as not to compile, execute yes, but not compile.

    I generally am not more than one or two lines of code away from a compilable state. This doesn’t mean executable state, but compilable state. These would be the one or two lines I am working on. I guess that come from years of also writing Ada code on a Vax system when all the compiler would do would be to tell Syntax Error: you are missing a ;. Yeah well it would have been great if it would have told me where given me a line number or something. On top of that it might not have been the problem in the first place. I guess that why I am still in the habit of hitting compile now days several times a hour actually for simply no reason at all just to make sure things are all ok. Spent too many long night looking through lines of Ada code looking for the missing Semicolon.

  22. Christian says:

    For heaven’s sake: Leave the performance improvement in it!!

    I don’t care if it is true refactoring or fuzzy-something: I want to use this feature while writing new code and often I just stopped in the middle of a line just to rename something.

    I even turned the question whether I really want to perform a rename OFF, so that it will always do it’s best.

    And if something was forgotten, I will notice the next time I build (which happens about every 5 minutes and is much too slow for all but extremely tiny projects)

    Please leave it inside. It would be a major bummer to wait 2-3 minutes.

  23. DrPizza says:

    "If like Dr Pizza says I need to rename a variable in broken code I do not have a problem using find and replace."

    But using find and replace is undesirable in this situation for the same reason that it’s generally undesirable; it’s not smart enough. If you’re happy with that then you haven’t set your standards high enough.

  24. chris says:

    If refactorings only work on a compilable project they will be mostly useless to me. My primary project is not built (or buildable) through VS.

  25. Cleaning-out my “To Blog” file again…

    Architects

    Handling data in service oriented systemsEdward…

  26. loc says:

    I haven’t read the whole discussion, but instead of undoing the performance optimization, can we give users 2 options (compile_all and fast_compile) when refactoring?

    i think this would be a good temporary solution until a better one is found.

  27. loc says:

    now i can brainstorm of several workarounds for this issue, man. but the option above is definitely useful, the more i think about it.

  28. Cyrus, to answer your question, leave the perf optimization in. I think the dialog is near useless anyway as I’ve already become accustomed to dismissing it w/o reading.

    More importantly (to me, at least), please, please, PLEASE spend some time optimizing refactoring performance with relation to Web projects. I *wish* I could have refactorings that *only* take 2-3 seconds!