How to Read Other People's Code -- And Why

Mongoose · on Oct 23, 2009

Read the comments, but don’t believe them

Love that one. A grad student friend I work with, whenever he catches me poring over documentation, always tells me to read the code. Such a seemingly simple tip, but so valuable.

cgs · on Oct 23, 2009

As my boss likes to say, the extension for Ruby documentation is ".rb".

eru · on Oct 23, 2009

The extension for Haskell-documention is `.lhs'. But you can compile it to a `.ps' or `.pdf' with LaTeX if you want to.

pgbovine · on Oct 23, 2009

one technique i often use when trying to understand someone else's code is to add in comments myself in my private branch (of the form "I think that X works like Y and Z"), or even better, adding in run-time asserts that I think ought to hold, and then running tests to make sure they do hold.

Of course, i never check in my comments/asserts to the main branch, since i'm not sure whether my understanding is correct

pmjordan · on Oct 23, 2009

Of course, i never check in my comments/asserts to the main branch, since i'm not sure whether my understanding is correct

I also add my own comments when understanding/using/changing other people's code. I do eventually check them in though. After all, why not?

* If the original author is no longer available, you're now the best authority on the subject; chances are your comments will help anyone in your shoes sometime later, including yourself. If you're really unsure, add that information to the comments.

* If you're making changes based on your understanding of the code, you or someone else will be testing those changes, thereby validating or falsifying those assumptions. If, after testing, the bugfix/feature/task works, so do the comments.

* If the original author will be available at a future date, you can get them to do a "comment review" (cf code review) when they get back.

ivenkys · on Oct 23, 2009

I do exactly the same. Added to this, in large legacy code bases with no or very little tests, i sometimes try and break the functionality near to the part where a new fix is required in my private branch to check my understanding. I find that its much easier to understand the code when something doesn't work rather than when its working perfectly. With any luck, the stacktrace will tell me what's going on.

JoeAltmaier · on Oct 23, 2009

Code management tools need a way to add notations without getting in the way of code. Something that can be view thru a link, but is out of the way when poring over the code. ASCII text: anachronism. We're not that far from 80-column IBM punched cards...

_csoo · on Oct 23, 2009

You want something like literate programming then

diN0bot · on Oct 23, 2009

if the original code writers had used comments and asserts to not only assert their assumptions but then maintain those assumptions over time, then you wouldn't have to. plus, they'll probably introduce less bugs and debug faster in the future.

i can only speak from my own experience, thought processes and developing skillset. everyone's different. some people can keep track of numerous complex relationships in their head for months on end.

abstractwater · on Oct 23, 2009

once you determine your understanding is correct (there are a number of ways to make sure of it, like asking the author when he becomes available), I think you provide a service if you abridge your comments and check them in. Sometimes it's like fixing a "incomprehensible code" bug.

stavrianos · on Oct 23, 2009

if it's unclear enough what the code does that you need to do this, maybe some speculative commenting in the main branch actually would be an improving.

sophacles · on Oct 23, 2009

On the how bit: there is no such thing as reading code once. It is a simple matter of reading a method/object/what-have-you, then rereading it until you see it called/referenced etc and don't ask yourself "wtf does foo do". Trace various code paths often enough and you'll 'get it'.

As for the why: I used to decide I didn't like module X, then decide I was going to implement it myself. After some time I would end up with something looking like X, but not nearly as refined. As such, I learned it's better to read others' code and understand it -- much faster than reaching the same conclusions as the original authors w/ crappier code. Of course if after understanding you decide you still think it's crap -- have at it :).

Aschwin · on Oct 23, 2009

One of the reasons I don't write comments is because it gets outdated so quickly. And I don't need them myself, because I read only the code, even my own. Reading the code makes also for reviewing the code. Quickly changing bits so it makes better sense (like counts outside the for-loop instead of in the statement) etc. And use version-control system to make changes, test them and roll it back (revert). TortoiseSVN helps a lot with this for me. I also try to make the code more readable by adding my codingstyle or the codingstyle by convention (I still think mine is best ;-)

dasil003 · on Oct 23, 2009

Now that you know how comments can fail, you should use that knowledge to write good comments rather than not writing any. There are plenty of very good reasons to write comments.

The most common reason I write comments is to explain the purpose of something that is unintuitive from the pure code. Examples are comments in CSS about a particular browser quirk, or hacking around an edge case in an efficient but opaque way.

humbledrone · on Oct 23, 2009

The question I always ask before writing a comment is: will it contain the word "because?" If so, there is probably a need for a comment. If not, I will try hard to make the code more readable before breaking down and adding the comment.

HeyLaughingBoy · on Oct 23, 2009

One of the best reasons to write comments is that the comment describes what the author wants the code to do. What he wanted was not necessarily what he coded - bugs do creep in, after all.

If you confine yourself to reading code, but no other comments or documentation, you may think the code's behavior is correct, even though it's buggy.

dbshapco · on Oct 23, 2009

. . . and one of the plenty of good reasons being producing API documentation automatically via javadoc, doxygen, etc. Having the API documentation source and code live together is a big win. It's easier to maintain the inline documentation so it doesn't go stale. I hate being forced to go read code when I just want to use an API (I'm looking at you, Dojo :-/ ).

One of the first things I'll do encountering a feral code base is run an automatic documentation generator over it, even if there are no API comments, because many will produce at least some level of documentation from pure code, including cross references, a type index, call graphs, type diagrams, etc. This can be especially helpful when the code is poorly organized, and trying to trace simple program flow in an editor means navigating a dozen modules manually. Doxygen, for instance, will produce hyperlinked program listing, so that I can use a browser in a natural fashion to navigate the code structure and program flow. The browser can maintain virtually unlimited context, whereas my brain loses track of where I am once I'm seven levels deep in function call nesting.

Some IDEs and UML tools also are capable of reverse engineering documentation from the code base. The Togethersoft tools used to be excellent at grinding through code (and may still be, but I haven't used them in years).

RE'ed documentation of feral code can reveal how well (or more frequently poorly) the code base is structured, and identify key areas for architectural or design refactoring (if that luxury is possible).

In writing my own code, I decompose until each function or method has a single purpose (f() does X, not X & Y & Z!), and therefore the API documentation suffices to document the code itself. Rarely do I write a comment inside the body of a function or method. That happens when I re-visit the code, and discover that it's operation is non-obvious. The non-obvious stuff tends to be the tricky stuff it took some time to get right, and so it doesn't get mucked around with, and such internal comments rarely go stale.

I wait until re-visiting the code because authorial bias (my code effectively become's someone else's after several weeks, sometimes faster :-) ) obscures what is and is not obvious. I used to over-comment from a tendency to perform a mini-brain dump in comments -- but the knowledge required to WRITE the code (this is what I was thinking at the moment) is no reliable indicator of that required to READ the code (this is what _you_ need to know).

(I've theorized that having someone else comment the code from the start, just like having an unbiased tester, could make for better comments -- wherein the commenter is also necessarily a code reviewer as well. I've never gotten any of the places I've worked to agree to 'cross-commenting' as a standard practise, but most love worthless, perfunctory desk-checks prior to check in.)

To avoid comment churn, and because I refactor aggressively when creating brand new code, writing API comments is the LAST step in coding.

Finally, I developed a habit of writing comments exclusively in point form, because context switching from programming constructs to proper English grammar broke my flow. The point form comments feel like a miniature brain dump, whereas otherwise I'd pause to think about how to put the information into a proper sentence, and then make nice paragraphs, and suddenly I'd be channeling me from 7th grade compsition class. It's also easier to scan and digest comments as point form notes.

That's what I do, and I leave it at that, because telling someone else how to code is like telling them how to raise their children.

tl;dr version

- at least write API comments, pls

- doc generators (and other tools) sometimes are a great way to RE docs for feral code

- write comments in point form

- write API comments as the FINAL step in coding

- try to remove authorial bias from comment writing

dstorrs · on Oct 23, 2009

>In writing my own code, I decompose until each function or method has a single purpose (f() does X, not X & Y & Z!),

I hear this one a lot, from a lot of different sources. In practice, it never seems that practical. Here's a trivial example: the dashboard-page function in our webapp. It needs to do all of the following:

1) verify that the user is logged in. If not, bounce them to the login page, then bring them back on successful login

2) collect the list of all their providers

3) collect the list of all their profiles

4) collect the list of all their account information

5) load up and display the appropriate templates to render the web page with the requisite information.

Now, points 1-4 are accomplished by calling other functions, true. But dashboard-page still needs to do 5 separate things.

dbshapco · on Oct 23, 2009

In this specific case, the dashboard-page function is actually sequencing operations, not performing them (except for 5, and I'd wonder why that couldn't be moved to a separate function), and can be described as such:

  // - control the login sequence
  function login_sequence() {
    var user  = verify_login
    if (!user) {
       user = login();
    }
    var providers = collect_providers(user);
    var profiles = collect_profiles(user);
    var account_info = collect_account_info(user);
    load_templates(user);
    display_templates(providers, profiles, account_info);
  }

(Forgive the guess at what your code might look like.)

State may be passed between called functions, and used in control decisions, but state should not be grossly manipulated in sequencing functions (I do find with this style of programming that at high levels the state passed around tends to be large 'context' objects, rather than granular arguments encountered at lower levels). What I would NOT want to see in such a hypothetical login function is ALL the actual lower level code to do the login, collect the data, etc., so that essential higher order detail is obscured by the lower level operations.

A function's API comments do not need to repeat the purpose of called functions.

As I come up with API comments last, I usually think about them in reverse -- it's not 'I need to think of the single purpose of this function before writing it', but 'what single purpose did this function end up serving?'. Not being able to think of a decent answer for the latter is a possible symptom of sub-optimal decomposition. Then again, cutting blocks of code and pasting them into their own functions has become an instinct rather than conscious decision for me, so I'm effectively anticipating writing the 'single purpose' API comments.

At a certain level of detail I don't need to know the minutae of login, just that there is some black box function that controls the lower level details. And if I need to know the details, I break open the function and follow its call flow (or look at the autogenerated call graph in the doxygen docs or similar).

Having read a lot of feral code, I find the major indicator of quality is the static navigability of the code base (i.e. can I find my way around just by reading the code in an editor, without resorting to debuggers or autogenerated documentation), and having a level of detail structure, akin to the zoom feature on Google maps, is one method of achieving navigability (and partially the value of OO techniques). So it's okay to have functions/methods that simply sequence or aggregate calls to lower levels, and to describe them as such.

It was mentioned elsewhere on the thread that debuggers are useful tools in understanding a code base -- and I do often find myself setting breakpoints on code because it's near impossible to understand how particular functions get invoked by just reading the code. Then examining the call stack at the breakpoint I see that event loop called the network code invoked some code to read a database, which called into some code to instantiate widgets, which called back into the database code, which called the code that calculates order totals and tax, which called the widget code again to update those fields, all of which goes 30+ levels deep.

tetha · on Oct 23, 2009

What is the point form?

eru · on Oct 23, 2009

Perhaps in bullet points?

dbshapco · on Oct 23, 2009

An example from my code:

  /// - parse an HTTP input stream as XML
  ///   - requires a progressive parser (currently xpat, which is distributed with
  ///   Apache)
  ///   - the XML in turn is mapped to database operations, and the results
  ///   of the database operations used to create the HTTP response

bmj · on Oct 23, 2009

I'm having trouble wrapping my head around this--it seems to boil down to "I don't use comments because then I'd have to actually maintain them." I certainly don't think that every line of code should be commented, but as others have pointed out, some comments are important, particularly around why, say, an algorithm was implemented. Even for my own code, I need these little reminders, especially when I'm jumping between projects frequently.

I'm also at least slightly troubled by the assertion (not made Aschwin, but others) that the code is the only artifact worth reviewing. If you have a specification, or documentation, that doesn't line up with the product, why do you have those documents in the first place? Again, I don't expect that those documents will outline the specific implementation (though, if your shop uses software design documents, they should), but I should be able to get a reasonable explanation of how the system works from the docs. If this isn't the case, how do you expect to have someone test the system?

All that said, I agree that having the ability to read and understand others' code is critical to being a good programmer.

JoeAltmaier · on Oct 23, 2009

Well said. Comments come in classes, with different purposes. Somewhere (at the top of the module?) its helpful to mention the external dependencies - meta-information that will NOT be found anywhere in the code. For each code unit (method/function/template) explain why it exists, deficiencies, use case. Again, information not immediately obvious by reading code. Finally, those strange lines of code with magic numbers, obscure syntax, checks for apparent side-effects need some illumination. Lastly, and leastly, describing the code itself. These are the comments that get old fastest.

bmj · on Oct 23, 2009

I would add one more class of comments that would sit at the bottom of the hierarchy: TODOs. When I'm sketching out functions, I typically outline what should be done with within the context of the function, then fill in the implementation details later. This often leads to stuff like this:

  // Replace with a link to the core
  AddStartupShortcut("Shell.lnk", Path.Combine(INSTALL_PATH, "foo.exe"));

  // Update device time:
  _log.Information("Updating device time...");
  SetDeviceTime();

I always intend on cleaning these up, but often never do.

notauser · on Oct 23, 2009

If top-of-function comments like:

/* Does everything required to initialise the UI (DO NOT CALL DIRECTLY - see class foo) */

Get outdated quickly (without being updated) you have big problems.

Equally a one line comment before things like:

// Do this if we can be certain we are in month X

if (!(!x || ($chk(diff(y,o)) && (z<p))) {...}

Saves you having to draw out a truth table each time. Even if the logic gets tweaked a bit when bugs are found the purpose of the code is likely to remain and so the comment won't age too badly.

derefr · on Oct 23, 2009

In the first case, perhaps the function should only be accessible by/from Foo, or those that share a Foo-ish interface. In the second, I would really hope to be able to abstract that into a call to s.in_month?(m), with s perhaps being built from some combination of $chk, x, y, o, z, and p.

olliesaunders · on Oct 23, 2009

Your parenthesis don't even match in the second example, which could easily be re-written not to require a comment by renaming the variables.

gfodor · on Oct 23, 2009

Comments are a hack. There should be a way to include structured documentation with code that lies somewhere between free-form text (comments) and actual code. This documentation should sit at the abstraction level high enough to provide insight not captured in the code directly, but low enough that it actually has some coupling with the implementation so that if the implementation changes the documentation will be broken.

Of course, as long as we are using ASCII text for editing code, this is impossible. A few people are working on fixing this flaw (Intentional Software, Jetbrains MPS) but don't hold your breath!

bhousel · on Oct 23, 2009

Comments aren't for you. They're for whomever comes along later and needs to understand what you did. (Well it could be you, just several months in the future).

Sure, someone could read all the code, but it's way faster to just skim and pick out the comments if you don't care about the details. Think of code commenting as a time-saving device.

bendtheblock · on Oct 23, 2009

When I was studying CS, one of the things I was told was that if you grep and get only the comments out of a particular file/class, you should effectively see the pseudo-code of the program. In reality, I comment very little, just pointing things out that can’t be quickly deduced from casually reading the code.

olliesaunders · on Oct 23, 2009

I hope you don't still believe that's a good idea! That's a recipe for out-of-date documentation. Even if the comments are up-to-date they'll only be saying things that can be inferred from the code itself, less precisely.

tetha · on Oct 23, 2009

I have to state that I have seen a case where keeping comments around which formed pseudo-code was a good idea, even though it was just once, and it was a very nonstandard case.

Basically, the comments were text and relational algebra in LaTeX which explained the implementation of dataflow equations by using sets and those relational algebra expressions were then implemented in operations on binary decision diagrams. However, as I said: this does not occur often. In fact, one might call it rare ;)

bendtheblock · on Oct 23, 2009

I agree totally. That's why I don't do it.

jmostert2 · on Oct 23, 2009

If the code is really intricate, I refactor it to understand how it works (and how it ought to work, which can be an important way of spotting bugs).

But even though refactoring is supposed to be perfectly mechanical and "harmless" (especially when supported with unit tests) I'll usually throw away my refactoring, because the risk of breaking the existing code is just too high. It depends on how invasive the new feature is -- if I'm going to have to change a lot to get it done anyway, I might as well incorporate the refactorings. But if the change literally is finding the right position to add a single line of code, no way I'm going to change things once I find it.

When it's clear that adding even simple features takes an extraordinary amount of time because the code is that hard to understand and maintain, getting/making time for proper refactoring is easier, but it's worthwhile even as a way of creating a mental model.

I've never tried unit tests to figure out the semantics of existing code, though, even if it seems obvious in retrospect. I think the code base would have to become very complicated indeed before I go into full scientist mode and construct hypotheses on the semantics and verify them with unit tests. If the code is a tangled mess, it could also be tough to extract the proper bits to test on and/or set up a test environment complete enough for that.

wenbert · on Oct 23, 2009

Maybe it's just me. But sometimes I draw diagrams describing how the code will work and how the methods interact with each other. A quick drawing with a pencil and paper or on a whiteboard will do. It helps me understand how the over-all system works...

JoeAltmaier · on Oct 23, 2009

Agreed. If it won't all fit in my head at first, put some of it on paper. Especially when learning the code. And re-visit the drawing, testing it against what I'm reading as I go until I'm sure its right. Once its in my head, throw it away. Wish there was some way to use drawings as comments... My friend Bob learns differently - he needs words, skips all the illustrations in manuals, they just don't sink in for him. He's brilliant, just built differently.

wenbert · on Oct 24, 2009

That'd be the day :D Drawings for comments is absolutely a great idea. Sometimes I wish I could "draw" something in my code (Arrows, bookmarks, quick diagrams, etc. beside comments). Think github with this kind of feature.

known · on Oct 23, 2009

One productive technique to read other people's code is to step into the code using a debugger (e.g. gdb)

tetha · on Oct 23, 2009

I disagree with this. The debugger is a very, very precise tool and it can be used to gain very, very precise insights into certain code, however, very often the debugger is just too precise. It is pretty much like trying to understand a large chip by looking at how gates flip and flop.

Of course, if the code is horrible enough, then you might need to switch down to actually tracing line by line and opcode by opcode, potentially even using a debugger, but for most sane code, I think it is possible and faster to understand larger blobs of code at once.

bendotc · on Oct 23, 2009

Agreed.

In general, if you're having trouble tracing through a particular function with a narrow band of input, then the debugger can be useful. If you're trying to figure out a larger system and/or how a system works across a large set of inputs, then stepping through with a debugger is useless.

Put another way, the debugger is to programming what the microscope is to medicine: incredibly useful for some things, but not a very good general diagnostic tool. Metrics data and checking assumptions (via unittest and/or asserting expectations based on reading the code) are much better for getting an over-all idea of what's going on. Once you've localized a problem, then the debugger can help you get a precise idea of the issue, or can help you test a hypothesis (to use the medical example, you can use the microscope to test a theory that there's a bacterial infection).

known · on Oct 24, 2009

YMMV. But that's how I understood Apache's Httpd code http://httpd.apache.org/dev/debugging.html

AGorilla · on Oct 23, 2009

Not to mention the side effects of breaking code you don't understand at random points.

kls · on Oct 23, 2009

I agree, the debugger is so important to understanding foreign code, yet so many people only use it when they are looking for a bug or not at all.

wakeless · on Oct 23, 2009

I don't mind using code folding when I'm doing this. Trying to fold the code down that I don't care about, this way there isn't as much to read and get lost in.

lsb · on Oct 23, 2009

If you're not folding functions, but parts of functions, that's a sign that it's poorly divided up, and a refactor would benefit it.

zeugma · on Oct 23, 2009

Indeed, but you should understand it before even trying to rewrite it, then folding can help a lot on poorly divided code. `

notthinking · on Oct 23, 2009

It might be my relative newness to software, but I enjoy looking at other peoples code to see what works (pick up new language features) and what doesnt.

axod · on Oct 23, 2009

I'd also heartily recommend learning to read other peoples code from the assembly - softice/dissassembly listing.

jcl · on Oct 23, 2009

Could you elaborate? It doesn't sound like it would improve your understanding of the code.

olliesaunders · on Oct 23, 2009

Michael Feathers' Working Effectively with Legacy Code is the authority on this, I think.

silentbicycle · on Oct 23, 2009

Not quite - it's primarily focused on safely retrofitting unit testing onto legacy codebases, since restructuring code to make it testable can easily break it.

His definition of legacy code is code that doesn't have tests, so you can't fix it or replace units of it without unknowingly introducing bugs.

simplegeek · on Oct 23, 2009

Also, using a debugger (if it can be used) can be really helpful

berntb · on Oct 23, 2009

I quite liked this, regarding this subject:

http://perlmonks.org/?node_id=788328

alexkearns · on Oct 23, 2009

I don't know about the rest of you but I got into software development to create software (ideally from scratch) and to learn clever new stuff, not to tinker with someone else's code. Of course, just like every other software developer, I have on occasions ended up working in someone else's codebase. But I have done this not out of choice - and except in a handful of cases, I have not gained any knowledge or mental satisfaction from doing so. I have done it because I have been told to by my boss.

Let's not con ourselves. Becoming a dab hand at maintaining other people's software does not make you a good software developer (writing your own software does that). It makes you a good employee who is willing to do the boring shit. The two are not the same. I for one hate maintaining other people's code, and if I ended up doing that the majority of the time, I would either get a new job, switch careers or kill myself. Probably one of the first two.

axod · on Oct 23, 2009

I think you're missing the point. The idea isn't to have to maintain someone elses rubbish code.

The idea is that reading good code makes you a better coder. Which it certainly does. Having said that, bad code can sometimes teach you lessons in how not to do things.

Good authors probably read other books as well.

jcl · on Oct 23, 2009

At some point in your career, you come to realize that unreadable code is bad code -- and that if you aren't writing readable code, you are a bad coder. Usually this realization comes when you are asked to fix something in code so horrible that it causes you to wonder aloud, "What idiot wrote this?" And you look in the repository logs and see that it was you, several months ago.

Even if you never intend to maintain code, if you don't want to be that guy whose code everyone hates to work on -- whether out of compassion for your maintainers or out of career self-interest -- you need to know how readable your code is relative to others'. And the only way to know this is to read other people's code.