Documenting Legacy Systems - Where to start?


Dear fellow documentarians,

I am a newly hired programmer in an internal position at my company. I have been tasked with helping to maintain the legacy codebase (i.e. programming extensions, fixing bugs, etc.), and my manager would like a plan for how we could outsource some work to external programmers in case we get more work than our small team can deliver.

Now there is a huge problem: as you might have guessed, there is very little documentation. Since we work with Perl, we have some documentation as perldoc, but it’s mostly a general overview of what a program does. There are literally no design documents (such as class diagrams, use cases, requirements or dependency diagrams). I find it incredibly hard to work my way into one of the programs on my own and understand its purpose and how it works. Testing functionality without knowing the requirements is also a pain (aaaaand we have close to zero test coverage. Just don’t ask). I don’t think we can outsource any work while this is such an issue.

Now my question for you:
As we have next to nothing, I want to start documenting our environment. I am talking about ~50 independent programs. Which aspects would you recommend starting with? I want the focus of this documentation to be: help new developers understand the system so they can quickly get started working on our code.

I thought of setting up a Sphinx project for every program and documenting the purpose of each program first. Then I want to add a list of the requirements I can gather and the use cases I can find. Is this a reasonable approach?


I started writing this answer, and it escalated, so I published it as a blog post first 🙂

In general, I suggest working on two things in parallel for this kind of system documentation: a brief system overview and one kind of detailed documentation.

System overview

By system overview I mean something short that introduces the different perspectives on the system. Specifically, I mean at most one ‘page’ on each of the following, as far as you can figure them out.

  • Component list - a table of the system’s separate components that identifies each one’s name, purpose, provider and version number
  • External interfaces - similar to the component list, but for other people’s systems you connect to
  • Architecture overview - typically a single diagram that shows how these components and interfaces connect
  • Functional overview - a very short description of what the system does and why: which business problem it solves
  • Non-functional requirements - which five, say, of the long list of possible non-functional requirements are most important for this system
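Much of the component list can be bootstrapped from what the codebase already has. Since each program apparently carries perldoc with a general overview, the minimal sketch below (in Python, purely for illustration; a Perl equivalent would work just as well) assumes each program is a `.pl` or `.pm` file whose POD has a conventional `=head1 NAME` section reading `name - purpose`. It produces one CSV row per file and deliberately leaves provider and version as `TODO`, since those usually have to come from people rather than code. The function names are hypothetical.

```python
# Sketch only: builds a component-list skeleton from POD "NAME" sections.
# Assumption: each program is a .pl/.pm file whose POD follows the usual
# "=head1 NAME" / "name - purpose" convention. All function names are hypothetical.
import csv
import io
import re
from pathlib import Path

# Body of a "=head1 NAME" section, up to the next POD command (e.g. =head1, =cut)
# or the end of the file.
NAME_SECTION = re.compile(
    r"^=head1\s+NAME[ \t]*\n(.*?)(?=^=[A-Za-z]|\Z)", re.MULTILINE | re.DOTALL
)
FIELDS = ["file", "name", "purpose", "provider", "version"]


def parse_name_section(source: str) -> tuple[str, str]:
    """Split the first non-empty NAME line into (name, purpose); gaps become "TODO"."""
    match = NAME_SECTION.search(source)
    if match:
        for line in match.group(1).splitlines():
            line = line.strip()
            if line:
                name, sep, purpose = line.partition(" - ")
                return name.strip(), (purpose.strip() if sep else "TODO")
    return "TODO", "TODO"


def component_list(root: Path) -> list[dict[str, str]]:
    """One row per Perl file under root; provider and version are left for a human."""
    rows = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in {".pl", ".pm"}:
            continue
        source = path.read_text(encoding="utf-8", errors="replace")
        name, purpose = parse_name_section(source)
        rows.append({
            "file": path.relative_to(root).as_posix(),
            "name": name,
            "purpose": purpose,
            "provider": "TODO",
            "version": "TODO",
        })
    return rows


def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `print(to_csv(component_list(Path("your/checkout"))))` gives a first-draft table to correct by hand; the docutils `csv-table` directive (with its `:file:` option) can pull such a CSV file into a Sphinx page. The `TODO` cells double as a list of questions to ask colleagues.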

The trick with each of these is to provide a starting point for a developer to delve deeper by asking other people or reading the source code.
You typically won’t need a lot of detail in every area.
However, you will probably need at least some more detailed documentation.
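For the architecture overview in particular, reading the source code can also give you a rough first diagram. The sketch below is an assumption-laden starting point rather than a real dependency analyser: it scans each `.pl` program for `use`/`require` lines, skips all-lowercase pragmas such as `strict` and `warnings`, and writes a Graphviz DOT graph with an edge from each program to each module it loads. Anything loaded dynamically (for example `require $module` or a string `eval`) will be missed, and the function names are hypothetical.

```python
# Sketch only: a first-draft dependency diagram from static use/require lines.
# Assumption: modules are loaded by literal name; dynamic loading (require $var,
# string eval) is missed. All function names are hypothetical.
import re
from pathlib import Path

# "use Foo::Bar ..." or "require Foo::Bar" at the start of a line.
USE_LINE = re.compile(r"^\s*(?:use|require)\s+([A-Za-z_][\w:]*)", re.MULTILINE)


def module_dependencies(source: str) -> set[str]:
    """Module names loaded by use/require; all-lowercase names (pragmas such as
    strict, warnings, lib, constant) are skipped."""
    return {name for name in USE_LINE.findall(source) if not name.islower()}


def dependency_dot(root: Path) -> str:
    """Graphviz DOT text with an edge from each .pl program to each module it loads."""
    lines = ["digraph dependencies {", "  rankdir=LR;"]
    for path in sorted(root.rglob("*.pl")):
        if not path.is_file():
            continue
        program = path.relative_to(root).as_posix()
        source = path.read_text(encoding="utf-8", errors="replace")
        for module in sorted(module_dependencies(source)):
            lines.append(f'  "{program}" -> "{module}";')
    lines.append("}")
    return "\n".join(lines)
```

Save the output to a file and render it with Graphviz (`dot -Tsvg deps.dot -o deps.svg`), or embed it through Sphinx's bundled `sphinx.ext.graphviz` extension. Expect to prune third-party modules by hand so that the diagram shows how your own components connect.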

Detailed documentation

Each system usually has one area complex enough to require more detailed documentation.
Some systems have complex data, or data models that use obscure domain jargon.
Other systems appear simple but support a complex business process.
In some systems, the complexity lies in user interaction and obscure user interface details.

By the time you have compiled a few pages of system overview, you will probably have figured out which kind of complexity characterises the system.
Next, add more detailed documentation that explains the system’s most complex aspect.

  • Use cases - for systems whose functionality and behaviour hides complexity
  • Data dictionary - to explain a complex data model by defining what each name (tables and attributes in a relational model) means, with bonus points for data type information and example data
  • Data model diagram - rarely as useful as a data dictionary, but useful when the nature of the relationships needs explanation more than what things mean
  • User interface map - to unwind and explain all elements in the user interface
  • External interface specifications - for when there’s a published/public API

Your mileage may vary, but my experience of internal business systems suggests that a data dictionary provides the most value, because what things mean is the one thing you can’t figure out by studying the source code.
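If your programs share a relational database, the mechanical half of a data dictionary can be generated, leaving your time for the half that matters: what each name means. The minimal sketch below uses Python's built-in `sqlite3` module only because it runs anywhere; your actual database is an assumption here, and many other relational databases expose similar column information through the standard `information_schema.columns` view. It lists every column with its declared type and constraints, samples one non-null value as an example, and leaves `meaning` as `TODO`. The function names are hypothetical, and sampled values should be checked so that personal or confidential data doesn't end up in the docs.

```python
# Sketch only: generates the mechanical half of a data dictionary from a schema.
# Assumption: SQLite stands in for whatever relational database you actually use;
# "meaning" is left as TODO because only people can supply it. Names are hypothetical.
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQL identifier so unusual table or column names are safe."""
    return '"' + name.replace('"', '""') + '"'


def data_dictionary_skeleton(conn: sqlite3.Connection) -> list[dict[str, str]]:
    """One entry per column with type, constraints and one sample value."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name"  # skip SQLite's internal tables
    ).fetchall()
    for (table,) in tables:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f"PRAGMA table_info({quote_ident(table)})").fetchall()
        for _cid, column, col_type, notnull, _default, pk in columns:
            sample = conn.execute(
                f"SELECT {quote_ident(column)} FROM {quote_ident(table)} "
                f"WHERE {quote_ident(column)} IS NOT NULL LIMIT 1"
            ).fetchone()
            flags = (("PRIMARY KEY", pk), ("NOT NULL", notnull))
            constraints = [label for label, flag in flags if flag]
            entries.append({
                "table": table,
                "column": column,
                "type": col_type or "(undeclared)",
                "constraints": ", ".join(constraints),
                "example": "TODO" if sample is None else str(sample[0]),
                "meaning": "TODO",  # the part only a human can write
            })
    return entries
```

Each entry maps onto one row of a data dictionary table. The cryptic column names in the output are a good agenda for a conversation with the people who use the data.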

Next steps

Once you have basic system documentation, impossible things such as fixing bugs should become possible.
The next steps are to help people do things more quickly.
I recommend focusing on what everyone has in common, whatever they do later: getting started.

  • Development environment set-up guide
  • Test/production environment configuration and set-up
  • Release/deployment guide
  • Operational/usage guide
  • Getting started user guide

In all of the above, focus on identifying the audience for each piece of documentation, and on identifying and understanding your biggest current problem.


Thanks a ton - these are some really great starting points! (And your blog post immediately went into my bookmarks 😉) It’s really hard not to get overwhelmed with this kind of task.