Towards identifying and minimizing customer-facing documentation debt

Lakmal Silva, Michael Unterkalmsteiner, Krzysztof Wnuk

TL;DR

The paper addresses the problem of customer-facing documentation lagging behind software evolution, which accumulates documentation debt and causes maintenance costs and delays. It analyzes bug reports from System A using the Aghajani taxonomy to identify prevalent documentation defects, finding that information content defects (especially erroneous code examples, missing documentation, and outdated content) are dominant. Based on these findings, it proposes two automation-driven solutions, Dynamic Documentation Generation (DDG) and Automated Documentation Testing (ADT), both grounded in a single robust information source and implemented conceptually through the Darwin Information Typing Architecture (DITA). The study provides empirical insights into defect distribution, costs, and feasibility of automation, and outlines implementation paths for industry to reduce documentation debt and improve co-evolution of docs and software. The work emphasizes practical impact by offering concrete design criteria, potential cost savings (notably addressing the 59% of defects that automation could detect), and a clear path to industrial validation. Data supporting the findings are publicly available, enabling replication and extension.

Abstract

Software documentation often struggles to keep up with the pace of software evolution. A lack of correct, complete, and up-to-date documentation leads to a growing number of documentation defects, which can delay the integration of software systems. In our previous study on a bug analysis tool called MultiDimEr, we provided evidence that documentation-related defects contribute to many bug reports. In this study, we first want to identify the defect types that contribute to documentation defects, thereby identifying documentation debt. Second, we aim to find pragmatic solutions that minimize the most common documentation defects and pay off the documentation debt in the long run. We investigated documentation defects related to an industrial software system. We looked at the different documentation types and their associated bug reports, and categorized the defects according to an existing documentation defect taxonomy. Based on a sample of 101 defects, we found that most defects fall into the Information Content (What) category (86). Within this category, the defect types Erroneous code examples (23), Missing documentation (35), and Outdated content (19) account for most of the defects. In practice, documentation debt can easily go undetected since a large share of resources and focus is dedicated to delivering high-quality software. We suggest adapting two main solutions to mitigate these defect types and tackle documentation debt: (i) Dynamic Documentation Generation (DDG) and/or (ii) Automated Documentation Testing (ADT), both of which are based on defining a single, robust information source for documentation.

Paper Structure (17 sections, 3 figures, 3 tables)


Figures (3)

  • Figure 1: Topic-based organization of DITA documents.
  • Figure 2: High level overview of testing phases and frameworks.
  • Figure 3: Extraction of commands by a test case from a DITA topic file.
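The idea behind Figure 3, a test case pulling commands out of a DITA topic so they can be checked automatically, can be illustrated with a minimal sketch. It is not the paper's framework: the sample topic, the command names (`useradd`, `userlist`), and the helper functions `extract_commands` and `find_unknown_commands` are all hypothetical, and the sketch assumes that commands live one per line inside DITA `<codeblock>` elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical DITA task topic; the commands are invented for illustration
# and do not come from the system studied in the paper.
SAMPLE_TOPIC = """\
<task id="create_user">
  <title>Create a user</title>
  <taskbody>
    <steps>
      <step>
        <cmd>Create the user accounts.</cmd>
        <info><codeblock>useradd --name alice
useradd --name bob</codeblock></info>
      </step>
      <step>
        <cmd>List the accounts.</cmd>
        <info><codeblock>userlist --all</codeblock></info>
      </step>
    </steps>
  </taskbody>
</task>
"""


def extract_commands(topic_xml: str) -> list[str]:
    """Return every non-empty line found inside <codeblock> elements."""
    root = ET.fromstring(topic_xml)
    commands = []
    for block in root.iter("codeblock"):
        # itertext() also collects text nested in inline child elements.
        text = "".join(block.itertext())
        for line in text.splitlines():
            line = line.strip()
            if line:
                commands.append(line)
    return commands


def find_unknown_commands(commands: list[str], known: set[str]) -> list[str]:
    """Return documented commands whose executable name is not in `known`.

    `known` stands in for whatever the real system exposes; a real test
    would execute the commands against a test environment instead.
    """
    return [c for c in commands if c.split()[0] not in known]


if __name__ == "__main__":
    documented = extract_commands(SAMPLE_TOPIC)
    # Assume the product only ships `useradd`; `userlist` is then flagged.
    print(find_unknown_commands(documented, known={"useradd"}))
```

A check of this kind targets the "Erroneous code examples" and "Outdated content" defect types named in the abstract, because the documented commands are read from the same topic files that are published to customers.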