
Does Documentation Matter? An Empirical Study of Practitioners' Perspective on Open-Source Software Adoption

Aaron Imani, Shiva Radmanesh, Iftekhar Ahmed, Mohammad Moshirpour

TL;DR

This paper investigates how OSS documentation influences practitioners' adoption decisions. It combines interviews and a survey to identify the information relevant to the practitioners' adoption process (PAP) and the criteria that drive adoption. It then builds a large corpus of Sphinx/Read the Docs-based OSS documentation and uses BERTopic to extract PAP topics from it automatically. It further introduces DocMentor, an information-augmentation approach driven by TF-IDF and ChatGPT that explains technical terms with examples and references, and evaluates it through practitioner feedback. The work contributes the first OSS documentation corpus, a scalable topic-extraction pipeline, and a practical augmentation tool that could support decision-making in industry adoption processes.

Abstract

In recent years, open-source software (OSS) has become increasingly prevalent in software product development. Although OSS documentation is the primary source of information that the developer community provides about a product, its role in the industry's adoption process has yet to be examined. We conducted semi-structured interviews and an online survey to provide insight into this area. Based on the interview and survey insights, we developed a topic model that automatically collects relevant information from OSS documentation. Additionally, to address the challenges with OSS documentation reported in our survey responses, we propose DocMentor, a novel information augmentation approach that combines TF-IDF scores computed over an OSS documentation corpus with ChatGPT. By explaining technical terms and providing examples and references, our approach enriches the documentation's context and improves practitioners' understanding. We assess the tool's effectiveness by surveying practitioners.
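The TF-IDF half of DocMentor can be illustrated with a small sketch. It assumes that candidate technical terms are the highest-scoring TF-IDF tokens of one documentation page relative to the rest of the corpus, and that each selected term is then sent to ChatGPT together with the page text. The tokenizer, the unsmoothed IDF `log(N / df)`, and the names `tfidf_scores`, `top_terms`, and `build_prompt` are hypothetical; the prompt wording is illustrative rather than the one used in the paper, and no API call is made.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase tokens of two or more characters; underscores keep identifiers like read_csv whole.
    return re.findall(r"[a-z][a-z0-9_]+", text.lower())


def tfidf_scores(corpus: list[str]) -> list[dict[str, float]]:
    """TF-IDF per page: term frequency times unsmoothed log(N / df)."""
    counts = [Counter(tokenize(page)) for page in corpus]
    df = Counter()
    for page_counts in counts:
        df.update(page_counts.keys())  # document frequency: number of pages containing the term
    n_pages = len(corpus)
    scores = []
    for page_counts in counts:
        total = sum(page_counts.values())
        scores.append({
            term: (count / total) * math.log(n_pages / df[term])
            for term, count in page_counts.items()
        })
    return scores


def top_terms(page_scores: dict[str, float], k: int) -> list[str]:
    """Highest-scoring terms first; ties broken alphabetically for determinism."""
    ranked = sorted(page_scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def build_prompt(term: str, page_text: str) -> str:
    # Hypothetical prompt wording; the paper's actual ChatGPT prompt is not reproduced here.
    return (
        f"Explain the technical term '{term}' as used in the documentation excerpt below. "
        "Include a short example and a reference for further reading.\n\n"
        f"Excerpt:\n{page_text}"
    )


if __name__ == "__main__":
    corpus = [
        "Install the package with pip. The scheduler uses a coroutine event loop.",
        "Install the package with conda. Configure logging before import.",
        "Install the package from source. Run the tests with pytest.",
    ]
    scores = tfidf_scores(corpus)
    for term in top_terms(scores[0], 3):
        print(build_prompt(term, corpus[0]))
```

Terms that occur on every page (here "install", "package", "the", and "with") receive an IDF of zero, so the ranking favours page-specific vocabulary. Because a common word such as "uses" still scores highly in this toy corpus, a fuller pipeline would presumably also filter stop words before asking the LLM for explanations.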

Paper Structure

This paper contains 27 sections, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Overall methodology overview
  • Figure 2: Topic modeling pipeline
  • Figure 3: Topic prediction pipeline
  • Figure 4: Comparison of the embedding models' prediction performance. The Y values represent the median weighted-average F1-score of each embedding model across all hyperparameter settings.
  • Figure 5: Comparison of topic prediction thresholds. The Y-axis ranges are truncated to make the changes easier to observe.
  • ...and 2 more figures
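Figures 3 to 5 describe predicting topics with a probability threshold and comparing embedding models by the median weighted-average F1-score across hyperparameter settings. The sketch below assumes a multi-label setup in which a page receives every topic whose predicted probability meets the threshold. `predict_topics`, `weighted_f1`, and `median_by_model` are hypothetical names, and the F1 weighting mirrors scikit-learn's `average="weighted"` definition (per-topic F1 weighted by the number of true instances of that topic).

```python
from statistics import median


def predict_topics(probs: list[float], threshold: float) -> set[int]:
    """Assign every topic whose probability meets the threshold (multi-label)."""
    return {topic for topic, p in enumerate(probs) if p >= threshold}


def weighted_f1(y_true: list[set[int]], y_pred: list[set[int]]) -> float:
    """Per-topic F1, averaged with weights equal to each topic's true support."""
    labels = set().union(*y_true, *y_pred)
    pairs = list(zip(y_true, y_pred))
    weighted_sum, total_support = 0.0, 0
    for label in labels:
        tp = sum(label in t and label in p for t, p in pairs)
        fp = sum(label not in t and label in p for t, p in pairs)
        fn = sum(label in t and label not in p for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        weighted_sum += f1 * (tp + fn)  # support = number of true instances of the topic
        total_support += tp + fn
    return weighted_sum / total_support if total_support else 0.0


def median_by_model(scores: dict[str, list[float]]) -> dict[str, float]:
    """Median weighted F1 per embedding model across its hyperparameter runs."""
    return {model: median(values) for model, values in scores.items()}


if __name__ == "__main__":
    # Toy data: topic probabilities for three pages and their true topics.
    probs = [[0.7, 0.2, 0.1], [0.3, 0.6, 0.1], [0.5, 0.1, 0.4]]
    y_true = [{0}, {1}, {0, 2}]
    for threshold in (0.4, 0.5):
        y_pred = [predict_topics(p, threshold) for p in probs]
        print(threshold, weighted_f1(y_true, y_pred))
    # Hypothetical per-run scores for two embedding models.
    print(median_by_model({"model_a": [0.75, 1.0, 0.5], "model_b": [0.6, 0.8]}))
```

On this toy data, raising the threshold from 0.4 to 0.5 drops topic 2 from the third page and lowers the weighted F1 from 1.0 to 0.75, which illustrates why the threshold is worth tuning. Taking the median over hyperparameter runs keeps one unusually good or bad configuration from dominating a model's summary score.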