Towards identifying and minimizing customer-facing documentation debt
Lakmal Silva, Michael Unterkalmsteiner, Krzysztof Wnuk
TL;DR
The paper addresses customer-facing documentation that lags behind software evolution, accumulating documentation debt that raises maintenance costs and causes delays. It analyzes bug reports from System A using the Aghajani taxonomy to identify prevalent documentation defects, finding that information content gaps (especially erroneous code examples, missing documentation, and outdated content) are dominant. Based on these findings, it proposes two automation-driven solutions, Dynamic Documentation Generation (DDG) and Automated Documentation Testing (ADT), both grounded in a single robust information source and implemented conceptually through the Darwin Information Typing Architecture (DITA). The study provides empirical insights into defect distribution, costs, and the feasibility of automation, and outlines implementation paths for industry to reduce documentation debt and improve the co-evolution of documentation and software. The work emphasizes practical impact by offering concrete design criteria, potential cost savings (notably by addressing the 59% of defects that automation could detect), and a clear path to industrial validation. Data supporting the findings are publicly available, enabling replication and extension.
Abstract
Software documentation often struggles to keep up with the pace of software evolution. The lack of correct, complete, and up-to-date documentation results in an increasing number of documentation defects, which can delay the integration of software systems. In our previous study on a bug analysis tool called MultiDimEr, we provided evidence that documentation-related defects contribute to many bug reports. First, we want to identify the defect types that contribute to documentation defects, thereby identifying documentation debt. Second, we aim to find pragmatic solutions that minimize the most common documentation defects and pay off the documentation debt in the long run. We investigated documentation defects related to an industrial software system. We looked at the different documentation types and their associated bug reports, and categorized the defects according to an existing documentation defect taxonomy. Based on a sample of 101 defects, we found that most defects (86) fall into the Information Content (What) category. Within this category, the defect types Erroneous code examples (23), Missing documentation (35), and Outdated content (19) account for most of the documentation defects. In practice, documentation debt can easily go undetected since a large share of resources and focus is dedicated to delivering high-quality software. To mitigate these defect types and tackle documentation debt, we suggest adapting two main solutions: (i) Dynamic Documentation Generation (DDG) and/or (ii) Automated Documentation Testing (ADT), both of which are based on defining a single and robust information source for documentation.
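To make the idea behind Automated Documentation Testing more concrete, the following minimal sketch shows how a code example embedded in a documentation page could be executed and compared against its documented output. It is an illustration only and not the authors' implementation, which is described conceptually in terms of DITA. It uses Python's standard doctest module as a stand-in, and the names DOC_PAGE, total, and check_doc are hypothetical.

```python
import doctest

# Hypothetical documentation fragment with an embedded, doctest-style example.
DOC_PAGE = """
To compute a total price, call ``total``:

>>> total([2, 3], tax=0.5)
7.5
"""


def total(prices, tax):
    """Hypothetical documented API: sum the prices and apply a tax rate."""
    return sum(prices) * (1 + tax)


def check_doc(text, namespace):
    """Run every doctest-style example found in text.

    Returns a TestResults tuple (failed, attempted). A non-zero failed
    count signals an erroneous or outdated code example.
    """
    parser = doctest.DocTestParser()
    # Copy the namespace so that examples cannot mutate the caller's dict.
    test = parser.get_doctest(text, dict(namespace), "doc_page", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts are needed here.
    return runner.run(test, out=lambda message: None)
```

Calling check_doc(DOC_PAGE, {"total": total}) reports no failures while the page matches the software. If the behavior of total changes and the page is not updated, the same call reports a failed example, which is the kind of defect such a check could surface before customers encounter it.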
