Chicken and the Egg… (a.k.a. Reading vs. Writing Code)

So I have been asked to deliver training for a group of people on how to “read” source code. I guess I should frame this request a bit. With a very large product such as Exchange, there are millions of lines of code, and no one knows everything about what is happening in the source. Most developers are very focused on the pieces of the puzzle they own. As part of the Escalation Team for Exchange, we have to “reverse engineer on the fly,” so to speak, to understand customer issues and develop steps towards resolving them. This typically involves jumping quickly from one code base to another, depending upon where the investigation takes us. A large portion of our time is spent simply reading source code, not writing it.

So how do you teach people this “art” of digging deep very quickly into unfamiliar code that you had no hand in writing? I myself come from a very traditional process of learning how to code: by sitting down and writing it. I am struggling with how to tailor a delivery to focus on reading vs. writing source code. To me, the only way you can be truly efficient in this process is by having written code yourself.



==== Update ====

Great comments...
So boy do I agree about good comments, but to me comments are really geared towards explaining a particular block of code at the implementation-detail level. But how does one know where to begin looking for that particular block of code? I think this stems from great engineering documentation about the object model itself and how things relate from a high level.

I guess to elaborate more on my intention: it’s really to help individuals without a lot of understanding of C/C++/C# begin to understand how things fit together, and how they can begin using source code to determine what to look for that is wrong in the customer environment. To me, if you are attempting to read source with sparse comments, then you need to have some practical understanding of the language itself.


Comments (18)

  1. I believe that when it comes to reading and understanding a large body of code, the reader must lean heavily on object browsers and documentation. Not only that, but readers must be competent searchers. They gotta have good searching tools for their source code, and regular expression experience might be helpful. It also helps to have a version of the code which can be commented.
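
As a rough illustration of the "good searching tools" point, here is a minimal sketch of a regular-expression search over a source tree. It is written in Python for brevity; the function name `search_sources` and the list of file extensions are assumptions made for this example, not anything from the Exchange tooling.

```python
import re
from pathlib import Path


def search_sources(root, pattern, extensions=(".c", ".cpp", ".h", ".cs")):
    """Yield (path, line_number, line) for each source line matching pattern."""
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob("*")):
        # Only look at regular files with a source-code extension.
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield path, number, line.rstrip("\n")
```

Calling `search_sources("src", r"\bOpen\w+\(")` would list every line under a hypothetical `src` directory that calls or declares something whose name starts with `Open`, which is often enough to find where to start reading.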

  2. Chen says:

    Only read the comments! That should be enough for a "good source code" :-).

    But, unfortunately, most programmers are too lazy to write good comments…

  3. Tom Hoats says:

    A process that has worked well for me is to understand what level of abstraction the current problem is dealing with (architectural, algorithmic or implementation detail), imagine what steps the developer had to take to implement his approach, then start looking for alignment between theory and practice. A conceptual roadmap is key since you may need to understand what is MISSING, not what is there. Having SDK and other docs open for immediate reference is invaluable because developers understand what they are doing but they may not understand what other routines do at their request.

    I guess this is the same process used in any object-oriented design: identify key data and what operations must be done on the data, then hide everything except what is needed to support external interactions. It works in reverse, too.

  4. It’s magic as best as I’ve been able to figure out.

    I can do it, but others on my team can’t.

    My one recommendation is: Be fearless when debugging. If a problem steps out of your area of code, chase after it – read the code at the destination and try to understand it. In general, most code follows some form of standards, so it’s usually not difficult to figure out what’s going on.

  5. Scott says:

    I guess it depends on what you mean by "without a lot of understanding of C/C++/C#". Myself, I set a breakpoint at every method I’m interested in and start debugging. I look at the call stack a lot to see who’s calling the method I’m interested in, and that helps me figure out the program flow. I also reverse engineer at least a partial UML diagram of the classes I’m interested in. Between those two I can usually figure out what’s going on, even with bad variable and method names.
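
The call-stack habit described above can be tried without a debugger. The following is a small sketch in Python (not C/C++/C#) that asks the runtime who called the current function; `callers`, `save_message`, `deliver`, and `process_queue` are made-up names used only for illustration.

```python
import inspect


def callers(limit=5):
    """Return the names of the functions on the call stack, innermost first.

    The first entry is the function that called callers().
    """
    frames = inspect.stack()[1:limit + 1]  # skip the frame of callers() itself
    return [frame.function for frame in frames]


# A hypothetical call chain: process_queue -> deliver -> save_message
def save_message():
    return callers()


def deliver():
    return save_message()


def process_queue():
    return deliver()
```

Here `process_queue()` returns a list that starts with `save_message`, `deliver`, `process_queue`, which is the same information a debugger's call stack window gives you at a breakpoint inside `save_message`.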

  6. I agree that being able to write code yourself is vitally important to reading code.

    Whenever I have to read code that I did not write, I find myself looking for the iterative structures and control structures (If…Then, For Each…Next, Do…While, etc.) to break the module into more easily digestible chunks.

    But I write code for a living. It sounds as though you are trying to teach non-coders how to read code. And that may not be realistic. I wish you luck, though.

  7. TristanK says:

    There are degrees of "readingness", and degrees of problems that can be solved by different levels of source analysis.

    I, for example, probably wouldn’t be able to spot a leak in a given application (of the complexity of Exchange) just from reading the source (my stack isn’t big enough), but I could probably tell you the intent of a particular block of code and follow the branches and the logic.

    So, if the troubleshooting process can get the individual to the point where they know roughly what code to be looking at, having a rough idea of how it works is better than no idea at all.

    Still, at this point having a "rough idea" basically means "understanding the language syntax", which tends to come from "writing stuff that doesn’t work" 🙂

  8. Kirk Munro says:

    I feel I am quite good at this, probably from switching jobs (and therefore code bases) a fair amount over the years.

    Something I find invaluable when understanding someone else’s code is Source Control System check-in comments. I can often learn a lot about a piece of code by looking at what changed between revisions and why. If the comments aren’t helpful, I can at least find out who made a specific change and question them about it.

    As far as source code comments, I’m of the camp that believes well-written source code does not need (many) comments. The problem with comments is (1) that they are not applied in a consistent manner across modules written by different developers and (2) that they are not maintained, or often reflect an individual’s thoughts about a piece of code without a timestamp or username to go with those thoughts. Comments are best used only when necessary.

    When using clear, unabbreviated variable names and small blocks of well structured code, comments usually aren’t necessary. I only provide comment blocks when doing something complex enough or unintuitive enough to warrant the comment. If you need heavily commented code, your code probably has a lot of room for improvement.

  9. Kroman139 says:

    Ok, sometimes I have to work with a lot of code. Not as huge as the Exchange sources, but it’s about 200-300 MB (C code only).

    1) SourceInsight or something like this.

    You may load 300 MB of sources into it, and it’ll build the browse information for you just by parsing the sources (no compilation at all).

    Press F7 and you’ll see all the symbols from the sources (hundreds and thousands). Start typing your symbol name, "FunctionBlah" or "StructureSSS", and you’ll see the sources where it’s defined or declared.

    Move the cursor to any variable and you’ll see its definition.

    Really helps.

    2) Look at the component I have to dig into.

    3) Try to take a general look at the component

    Its internal structure (in general), the most important external components, etc.

    4) Take any one responsibility of the component.

    I don’t know how to name it in English.

    The component performs some activities (manages files, sends/receives emails, implements a protocol, etc.).

    Start analyzing from any one of them (a function of the component, not a C/C++ function).

    5) Start to read the code.

    "Enter function XXX, doing this, doing that, aha! calling that function, exit here, exit there"

    Do not try to dig into everything. You are only trying to understand this code in general.

    Skip all simple functions, and long functions with simple logic (1000 lines of code reading and parsing text files). Yes, this function is important (its length is 1000 lines), but it takes <<file-name>> and returns a filled structure. No problem, I’ll analyze it later.

    You are trying to understand the common structure of the component. The internal architecture of a component is fairly stable. Different components may have completely different structures, but usually the structure of a given component is solid (only one, two, or three different approaches to doing the same thing within the component).

    6) Thinking, reading, thinking, reading

    7) Start to play with trace messages.

    Debuggers are good only in some situations. Tracing is more important – you may get a lot of interesting information in just a few minutes.

    This is the way to understand the common logic of common operations (how does it connect/log in to the server? how does it return errors?)

    8) Don’t trust any comments.

    The comments were originally written by person A along with the original code, but then persons B, C, D, E, F, … with slightly different skills supported that code. Sometimes the comments may become out of date.

    9) Don’t trust any assumptions and preparations.

    Your component may prepare some piece of data for another component (e.g., extract several values from a small array and put these values into another one), but that other component may handle that preparation by itself (it knows that the array may contain unnecessary values).

    10) Don’t use complex tools to analyze code or to store analysis info.

    Automated reverse engineering tools may generate a lot of diagrams in a few hours, but those diagrams will show you nothing useful.

    Or you may try to write down every aspect of some function in Word, Excel, or OneNote. It is useless, indeed, because you will spend a lot of time just copying the function into another storage (from a plain text file into a well-formatted Word document).

    11) Use OneNote (or the other tool) to store critical information.

    Important constants are the best thing to have easy access to. Especially if those constants are not from your component, and there are not so many values.

    12) Use Microsoft Excel

    Excel can translate hex values into decimal, and decimal into chars. You may create a simple row in Excel with formulas and copy it down through several rows. Just copy and paste constants from the sources and look at the result.

    You may write down simple functions and use them.

    13) Use paper and pencil

    A3 is good. Stickers are good. Pencils are very good.

    14) Never give up.

    That dumb piece of code was written by people. Some of them were very clever, the others not so much. But they wrote it. And it works.

    So, you can understand it. Yes, it requires a lot of time and nerve, but you can do it.
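
Step 7 in the list above (playing with trace messages) can be sketched in a few lines. This is a Python illustration under stated assumptions, not the C code the commenter works with: the `traced` decorator, the `TRACE` list, and the `connect`/`login` functions are all hypothetical, and a real component would write to its log file instead of a list.

```python
import functools

TRACE = []  # collected trace lines (assumption: a real component would log to a file)


def traced(func):
    """Record an 'enter' line and an 'exit' line for every call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        TRACE.append(f"enter {func.__name__} args={args!r}")
        try:
            result = func(*args, **kwargs)
        except Exception as error:
            # Errors are part of the flow worth seeing, so trace them too.
            TRACE.append(f"exit  {func.__name__} raised {error!r}")
            raise
        TRACE.append(f"exit  {func.__name__} -> {result!r}")
        return result
    return wrapper


@traced
def login(user):
    return user == "admin"


@traced
def connect(server, user):
    if not server:
        raise ValueError("no server given")
    return login(user)
```

After one call to `connect("mail01", "admin")`, `TRACE` holds the "Enter function XXX ... exit here" narrative from step 5, written by the program itself: which function was entered, what it called, and what each one returned.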

  10. ct says:

    > I think this stems from great engineering documentation about

    > the object model itself and how things related from a high

    > level.

    This is key. This and debugging for software engineers. When I was working on a piece of code that traversed public folders on an Exchange server, I wrapped the Exchange SDK Win32 calls in classes that helped me, as a developer, figure out what was going on (the Folder class dealt with folders, etc…wasn’t rocket science, if there’s a problem with accessing a folder, you know where to look).

    Supporting a product is a little different. At a former employer, the level 3 ops guys knew the product better than I did when I started working on it. The architecture of the product helped: there was the core app server, then everything else was a "feature." Each feature was encapsulated in its own directory, had a single config file, etc. The logs were extremely verbose. Part of each log line was the feature name. When there was an issue, most of the time they could trace it down to the feature where it was happening. If they needed to escalate it to software engineering, we could hit the ground running.

    Instead of starting from "this is happening on the customer’s client," they could tell me "the server is doing X in feature Y, and we think that may be an issue."
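
To make the "feature name in each log line" idea concrete, here is a minimal Python sketch that counts errors per feature. The log format shown in the comment is an assumption invented for this example (the comment above does not give the real one), as are the names `LINE` and `errors_by_feature`.

```python
import re
from collections import Counter

# Hypothetical log format: "<date> <time> [<feature>] <LEVEL> <message>"
LINE = re.compile(
    r"^\S+ \S+ \[(?P<feature>[^\]]+)\] (?P<level>[A-Z]+) (?P<message>.*)$"
)


def errors_by_feature(lines):
    """Count ERROR lines per feature, ignoring lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match and match.group("level") == "ERROR":
            counts[match.group("feature")] += 1
    return counts
```

Running this over a verbose log and looking at `errors_by_feature(lines).most_common(1)` points at the feature to read first, which is roughly the "the server is doing X in feature Y" starting point described above.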

  11. Tim Almond says:

    One of the problems I’ve encountered is people writing documentation from the code. So, they click something which generates the UML or something, or just take their code and convert it into a written spec of the code.

    The best thing is a really good overview that then breaks down the design approach to the code. Whenever I’ve had in effect "programmer’s notes", it’s helped me understand why a coder did something in a particular way which helped with my understanding.

    Discrete commenting of code rarely achieves much, except when a particularly tricky block needs some explanation. The harder part is people understanding how all the code works together.

  12. Escalation Engineer JeremyK asks in his blog this morning: how do you teach people this “art”
