Source code archaeology

Some ideas on how to manage legacy software systems and source code.

Archaeological excavation in Härnösand, Ångermanland, 1908

According to Wikipedia:

"Archaeology, or archeology (from Greek ἀρχαιολογία, archaiologia – ἀρχαῖος, arkhaios, "ancient"; and -λογία, -logia, "-logy"), is the study of human activity in the past, primarily through the recovery and analysis of the material culture and environmental data that they have left behind, which includes artifacts, architecture, biofacts and cultural landscapes (the archaeological record)."

You can also add legacy software systems to that list of ancient artifacts. Like real antiquities, legacy software is often poorly understood by its inheritors, because supporting documentation is missing and the original designers are no longer available to ask.

Here are some techniques that I have found useful when dealing with legacy code:

  • Break down the implementation into separate, small units that are easy to understand. Document these in isolation (a wiki is great for this).
  • Only when you have a good understanding of each small part should you step back and look at how the parts interact with each other. That will improve your understanding of the big picture. Continue to document your findings.
  • When your review is complete, share it with colleagues. There will be gaps in your understanding that others can help to fill.
  • Do not fully trust the code: it may contain bugs, so what it actually does may not match the original intent of the software, and it is that original intent that you are trying to recover.
  • Comment the code where comments are non-existent and would be helpful for the next reviewer. This is good programming karma.
  • Refactor mercilessly. Do not be afraid of old code.
  • If unit tests are non-existent, add them. Once a legacy implementation has good test coverage, you can refactor with greater confidence, knowing that the tests will flag any unintended change in behaviour.
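
As a rough illustration of the last two points, here is a minimal sketch of a characterization test in Python. Everything in it is hypothetical and invented for this example: the `legacy_shipping_cost` function, its pricing rules, and the `PINNED` values. The idea is to record what the old code currently returns, then check that a refactored version returns the same thing for the same inputs.

```python
import unittest


def legacy_shipping_cost(weight_kg, express):
    # Hypothetical legacy function, invented for illustration only.
    cost = 0
    if weight_kg > 0:
        if weight_kg <= 5:
            cost = 4.0
        else:
            cost = 4.0 + (weight_kg - 5) * 0.5
        if express:
            cost = cost * 2
    return cost


def refactored_shipping_cost(weight_kg, express):
    # Candidate replacement; it must match the legacy results pinned below.
    if weight_kg <= 0:
        return 0
    cost = 4.0 + max(0, weight_kg - 5) * 0.5
    return cost * 2 if express else cost


# Outputs recorded by running the legacy code, not taken from a spec.
# They describe what the system does today, bugs included.
PINNED = {
    (0, False): 0,
    (-1, True): 0,
    (3, False): 4.0,
    (5, True): 8.0,
    (9, False): 6.0,
    (9, True): 12.0,
}


class ShippingCostCharacterization(unittest.TestCase):
    def test_legacy_matches_pinned_values(self):
        for (weight, express), expected in PINNED.items():
            with self.subTest(weight=weight, express=express):
                self.assertEqual(legacy_shipping_cost(weight, express), expected)

    def test_refactoring_preserves_behaviour(self):
        for (weight, express), expected in PINNED.items():
            with self.subTest(weight=weight, express=express):
                self.assertEqual(refactored_shipping_cost(weight, express), expected)
```

Saved as a module, this can be run with `python -m unittest`. Note that the pinned values only capture current behaviour; if one of them looks wrong against the original intent, that is a finding to document and discuss, not something to fix silently during the refactor.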


John Collins

I have been writing about web technology and software development since 2001. I am the developer of the Alpha Framework for PHP. I love open source, technology, and economics.