If you’re going to reformat source code, please don’t do anything else at the same time

I spend a good amount of my time doing source code archaeology, and one thing that really muddles the historical record is people who start with a small source code change which turns into large-scale source code reformatting.

I don't care how you format your source code. It's your source code. And your team might decide to change styles at some point. For example, your original style guide may have been designed for the classic version of the C language, and you want to switch to a style guide designed for C++ and its new // single-line comments. Your new style guide may choose to use spaces instead of tabs for indentation, or it may dictate that opening braces go on the next line rather than hanging out at the end of the previous line, or you may have a new convention for names of member variables. Maybe your XML style guidelines changed from using

<element attribute1="value1" attribute2="value2" />



Whatever your changes are, go nuts. All I ask is that you restrict them to "layout-only" check-ins. In other words, if you want to do some source code reformatting and change some code, please split it up into two check-ins, one that does the reformatting and the other that changes the code.

Otherwise, I end up staring at a diff of 1500 changed lines of source code, 1498 of which are just reformatting, and 2 of which actually changed something. Finding those two lines is not fun.

And the next time somebody is asked to do some code archaeology, say to determine exactly when a particular change in behavior occurred, or to give an assessment of how risky it would be to port a change, the person who is invited to undertake the investigation may not be me. It may very well be you.

Comments (36)
  1. Leo Davidson says:

    I’ve been there.

    It’d be great to be able to diff the parsed code rather than the code itself. Of course, that assumes it can still be parsed (which can be difficult if external dependencies are gone/changed).

    Most diffing tools can ignore whitespace and the good ones can apply some regexps or similar in-memory conversions before diffing the two files but those still don’t help with big layout changes.

    This problem creates a knock-on problem with lined-up "tables" of data or function calls in code: If you add/change a line so that the indentation of the others no longer matches (e.g. a much longer name in one column) then you’re left with the choice of ugly code or noisy diffs, or doing two check-ins for one change.

  2. Mike says:

    A similar issue is that one checkin should be for exactly one bug.  It drives me nuts when people make a single checkin that fixes five different bugs.  Especially when one of the bugs is a security issue that must be backported to older versions of the system.  If it had been in a checkin all by itself it would have been trivial to port, but now is has to be filtered out of all this "extra stuff" and the probability of inserting a new bug goes up dramatically.

    The source control system doesn’t care if you make a bunch of itty bitty checkins.  It’s not going to get tired or upset at you.

  3. Alaska Capt says:

    Yet another example when a simple rule of thumb could change a lot of time. It really makes sense to divide the sheep from the goats :) Thanks for this insight.

  4. bdodson says:

    What this really indicates to me is that somebody should write a diff tool that knows enough about the language to detect changes in the code itself and exclude meaningless formatting changes.

  5. manicmarc says:

    Would a diff tool that had a flag to ignore changes in spaces, tabs and new lines work? What’s the downside?

  6. Duke of New York says:

    "Would a diff tool that had a flag to ignore changes in spaces, tabs and new lines work? What’s the downside?"

    That your file now effectively has one line.

  7. Roastbeef says:

    I’ll second the mention of "Beyond Compare 2". It is a great utility for merging source code trees and the great part is that it is crazy cheap… I think it’s like $15 for a single-user copy.

  8. David W says:

    Hey Raymond,

    ISO C has had “//” for comments since C99, which is almost 10 years ago now. :)

    [That’s why I said “classic C”. -Raymond]
  9. JS Bangs says:

    <i>The source control system doesn’t care if you make a bunch of itty bitty checkins.  It’s not going to get tired or upset at you.</i>

    Yes, but many teams have checkin gates that take a non-trivial amount of time to run. It can be prohibitively time-consuming for the developer to make 10 itty-bitty checkins when he could just make one checkin that rolls them all together.

  10. Jay Bazuzi says:

    Raymond is absolutely correct.  

    In fact, I usually go a bit further: any Refactoring that hits a lot of lines (like a rename) gets checked in by itself, too.

    The goal is to make sure of two things:

    1. Reading diffs works.  

    2. My change descriptions can be detailed without being long.

  11. Puckdropper says:

    Some diff tools have the option to ignore whitespace and whitespace changes.  No guarantee they’ll always work, but sometimes they’ll take 314 changes down to just 3 or 4.

  12. Bob says:

    It’s perfectly OK for the file to "have one line" when seen by the diff tool, if it can then point to the blocks of code that have changed by giving start line/column and end line/column pairs in the two files. A language-aware diff tool can even easily separate out fixing comment typos from changes to functionality.

    Honestly, meaningful whitespace in source code is like an accountant using Notepad instead of Excel – it’s completely insane.

    (Even if you love Python, the concept is "blocks" and the implementation for printing human-readable code doesn’t have to be the same as the data structure used in the editor and the compiler.)

  13. - says:

    This is a VCS limitation.

    A good VCS would automatically detect and separate reformatting, comment changes, or semantic changes.

    This would be doubly advantageous as you could completely hide all revisions that didn’t perform meaningful changes.

    Separating commits is a reasonable measure but it still breaks some tools (such as blame – seeing who changed one particular line of code and when: you usually won’t care about whitespace changes or indentation which can happen when Leo points out above).

    Unfortunately I don’t know of any VCM that does this.

  14. Joe Dietz says:

    I sometimes actually do precisely what you mentioned to make it hard ‘i.e. "too risky" to the PHBs’ to perform a backport on something that just shouldn’t be backported, because while the actual change might *look* simple, it is actually risky (typically this is driven by some number of customers/sales types demanding a backport rather than waiting for the next release).  Sustaining often is not QA’d to the same level as a release and almost never has a beta release cycle.

    As an aside: beyond-compare as a diff tool is pretty good at visually separating what is a formatting change from what is a substantive change.

  15. Duke of New York says:

    [It’s perfectly OK for the file to "have one line" when seen by the diff tool, if it can then point to the blocks of code that have changed by giving start line/column and end line/column pairs in the two files.]

    It’s not OK in practice, because as you increase the granularity that the diff tool looks at, you create more opportunities for it to guess wrong. You’ve probably seen a diff that starts out fine, and then at some point it matches the wrong pair of lines and produces garbage for the rest of the file.

    [I usually just ignored the policies and did things my own way.]

    I would hope that if anyone does this on the Windows team, the management will clean his clock.

  16. Jake Montgomery says:

    I agree – don’t do it.  But if you are the poor person stuck trying to diff code that has been reformatted, here is a tip.

    Get both the old and new files from source code control, then run a good "beautifier"/auto-formatter on both of them. This way they are will both be in the same "format" and a diff should be more productive. It is not ideal, but it is a workaround.

  17. steve says:

    For active code with high longevity, one might consider forbidding mass syntactic reformatting altogether.  Reason being, diff’s between non-sequential versions will neccessarily include any mass syntactic reformatting in intermediate revisions (whether or not the mass reformatting was originally commited in isolation), and diffs between non-sequential revisions become increasingly useful and neccessary as the code and the team maintaining it change over long periods of time.  

  18. lorg says:

    I’m doing a seminar for my degree about automatically finding bugs in concurrent programs. One of the references is this paper:


    with the title of "Yesterday, my program worked. Today, it does not. Why?". It details an algorithm to automatically find the bug causing culprits in a big check-ins, or multiple check-ins.

    It seems this algorithm actually works, and while it might be overkill for the style changing issue, it might be useful for more complicated cases.

    At the least, it’s a very interesting (and educating) read.

  19. Kaenneth says:

    How about diffing at a different level, instead of using re-formatted source, couldn’t there be a mode of a compiler that just parses/tokenizes 2 versions of a project, and reports the diff at that level, but in terms of origional source lines?

    Such that

    int n;{int x=1; int y=1; n = x+y;}

    would be exactly that same as

    int n;{int a=0x01; int b = 0x01; n = (a)+(b);}

    since both would produce the same instructions (I think):

    Assign the value ‘1’ to each of two integer variables, then add them giving (COBOL flashback!) the result in a third. The names and number formats, and extra parens are meaningless… (I wouldn’t be surprised if a modern compiler just emitted n = 2;

  20. Yes, but... says:

    At a company where I used to work, an employee decided with no authorization to change all the formatting (thousands of files).

    The company’s (inferior) peer review system complelety broken down, unable to verify that all the changes were only formatting.

    The employee was officially warned about making the un-authorized changes. And all changes were rolled backed.

  21. Matt Craighead says:

    I’m with Mike: the less a single change does, the better.

    "many teams have checkin gates that take a non-trivial amount of time to run": This is often either a management problem (overly draconian policies that leave no room for individual judgment; if you don’t trust someone’s judgment, why did you hire them?) or a tools problem (better tools, such as [insert shameless plug about my company’s product], can make this sort of stuff far less painful).

    In the past, when faced with overly draconian checkin policies, I usually just ignored the policies and did things my own way.  It never seemed to matter.  As long as I was careful and ran the tests that actually *did* matter, I didn’t break things.

  22. Worf says:

    One issue with this is that most revision control systems won’t allow a user to check a file out multiple times.

    E.g., say you’re working on your feature, and end up editing foo.c, bar.c, and baz.h. Now someone else comes along and says for you to fix a security flaw, which exists in bar.c.

    Ideally, you’d be able to check the file out again, make the fix, then check it in (and have your other checkout auto-merge.

    You want to save your changes so you can fix this priority bug, but a checkin will break because your feature hasn’t been tested at all – it may not even compile. You also have to revert your other files since they can interfere with the fix (and make a noisy checkin).

    You could move the files elsewhere and grab fresh copies out (which may involve reverting), then fix, checkin, then checkout the old files again, merge those other changes in so you don’t accidentally un-fix it…

    Revision systems need the ability to suspend in-progress changelists (save current changes and revert on-disk files) so other higher-priority fixes can be done, and automerge those changes when you resume the previous task, while restoring all the in-progress changes…

    Otherwise the little checkins can get tedious to manage if you keep having to backup, revert, fix, checkin, checkout, restore, resume. And if you forgot a file and it got reverted…

    [Oh, you mean shelving? -Raymond]
  23. Duke of New York says:

    Yeah, Microsoft devs need to take a break from telling everyone else how to use souce control and get to work on some kind of "change packing" program.


  24. Mike says:

    I totally agree with you, but for those cases where people have mixed both types of changes "diff -w" is your friend — no more random whitespace deltas.


  25. Bob Posert says:

    This brings back memories!  I once worked at a place where two unnamed developers didn’t agree with each other’s formatting, and they would work on the same module, so pretty much *every checkin* would have huge amounts of reformatting.

    When I had to see what the real changes were between two different versions, I copied them both locally, reformatted both using some code-reformatting tool (forget which now), and then did a diff on the results.


    get -version 1.6 foo.c

    prettyprint-c <foo.c > foo.1.6.c

    get -version 1.7 foo.c

    prettyprint-c < foo.c > foo.1.7.c

    diff foo.1.6.c foo.1.7.c

    That was fun.

  26. Mark says:

    @Kaenneth (and others): I think the danger of that approach is that the logical extension is to diff the binary output of the compiler rather than the source code.

    Where do you stop? Lexical analysis of the source? Fully generated parse-tree? IL? Optimisation?

    For my money I think I’d prefer false positives than false negatives in diff output. In the former case the worst that happens is I have to spend a couple of extra seconds convincing myself that a changed line is just superficial, in the latter a change goes completely unnoticed.

    I am with Raymond on the original point though; one check-in should equate to an atomic change – fixes exactly one thing in exactly one way.

  27. Voice of Reason says:

    So you use indent on both before doing the diff?

  28. Matt says:

    The worst is if you’re programming in Java on different operating systems.  Some IDE’s are better than others about preserving the original CR vs CRLF vs LF.

  29. Worf says:

    Yeah, too bad out source control system doesn’t support shelving.

    Really annoying.

  30. Duke of New York says:

    Oh well, I guess nothing can be done about that.

  31. Infernoz says:

    Haven’t you heard of WinMerge, for comparisons, it is free, and allows you to add regex expressions for to ignore stuff like white space and comments.  I use it loads.

    Another method is to enforce a standard code formatting, on check-in, or before conparison.

  32. Why not have a universal formatter for the entire team. I know we used to face this problem in our java application and set up the same formatter for all the developers.

    Since everyone will be using the same formatter you won’t have to be going through this painful process ;)

    [And then someone changes the universal formatter. That’s the scenario I’m talking about here: The team’s code formatting rules changed. -Raymond]
  33. Igor Levicki says:

    What you guys need is this:

    1. Store the code in some universal format defined by corporate standard in the VCS

    2. On check out, apply developer’s own coding style to each checked out file

    3. On check in, strip out developer’s own coding style from each checked in file

    That way diff wouldn’t be a problem anymore.

    Btw, I have patented the idea just in case that some of you thought of actually using it ;-)

    [And then when the corporate standard changes? That’s the scenario here. (Apparently nobody reads past the first paragraph.) -Raymond]
  34. Igor Levicki says:

    >>And then when the corporate standard changes?<<

    You just freeze the project (check out gets disabled, everyone finishes up and checks in) and reformat the code in the VCS to match new corporate standard.

    After that everyone can continue to use their own coding style again.

    For example I use the following style:

    void test(void)


    int a = 1;

    if (a == 3) {

    // do something

    } else {

    // do not



    You can store that piece of code in the VCS any way you like it — even on a single line if that is the easiest way to diff it.

    You just need to derive a transform from my style and apply it when I check out or strip it out when I check in.

    [“check out gets disabled, everyone finishes up and checks in” – easier said than done. “Everyone finishes up” can take months if they’re working on a big feature. Do you lock the tree for a month? -Raymond]

Comments are closed.