The Introduction of README and CONTRIBUTING Files in Open Source Software Development

Matthew Gaughan, Kaylea Champion, Sohyeon Hwang, Aaron Shaw

TL;DR

The study investigates the introduction of README and CONTRIBUTING files in FLOSS projects. It uses a dataset of Debian-packaged projects (4,226 READMEs and 714 CONTRIBUTING files across 4,247 projects) and a multilevel longitudinal design that applies a regression-discontinuity framework around each document's publication. READMEs are typically introduced early and tend to be brief, whereas CONTRIBUTING files tend to appear after a surge in activity. In both cases the initial content focuses on usage or contribution procedures rather than community-building, and there is little evidence that publishing either document causes activity to rise. The analysis fits a negative-binomial model with a 10-week bandwidth and uses LDA topic modeling (9 README topics, 5 CONTRIBUTING topics) and readability metrics to relate document characteristics to subsequent activity. Overall, the results suggest that early governance documentation serves more as basic project hygiene than as a catalyst for participation, although certain topics in these files are associated with later contributions. The effects therefore appear nuanced and context-dependent, pointing to a need for tooling and guidance that better align documentation with early-stage project needs.
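The regression-discontinuity framing compares activity just before and just after a document is published. As a minimal sketch, assuming weekly contribution counts per project and the week each document was first published (both hypothetical input shapes, as are the names `rd_window` and `WindowRow`), the Python below keeps the ±10-week window and codes the usual RD design variables. It does not fit the paper's multilevel negative-binomial model; these rows are the kind of input such a count model would take, with the project as the grouping factor for random effects. Coding the publication week itself as "post" is also an assumption.

```python
from dataclasses import dataclass

# Bandwidth reported in the TL;DR; weeks are the time unit used in Figure 1.
BANDWIDTH_WEEKS = 10


@dataclass
class WindowRow:
    project: str
    week_offset: int     # running variable: weeks since publication (0 = publication week)
    post: int            # 1 on/after publication, 0 before (assumed coding of week 0)
    offset_x_post: int   # interaction term: lets the trend change slope after publication
    count: int           # contributions observed in that week


def rd_window(project, weekly_counts, publication_week, bandwidth=BANDWIDTH_WEEKS):
    """Keep weeks within +/- bandwidth of publication and code RD design variables.

    weekly_counts: hypothetical {absolute_week_index: contribution_count} mapping.
    """
    rows = []
    for week, count in sorted(weekly_counts.items()):
        offset = week - publication_week
        if -bandwidth <= offset <= bandwidth:
            post = int(offset >= 0)
            rows.append(WindowRow(project, offset, post, offset * post, count))
    return rows


if __name__ == "__main__":
    # Toy data: a project whose weekly activity jumps at week 20.
    counts = {w: (2 if w < 20 else 5) for w in range(40)}
    for row in rd_window("example-project", counts, publication_week=20)[8:13]:
        print(row)
```

Weeks outside the window are dropped rather than down-weighted, which corresponds to a uniform (rectangular) kernel; other kernels are possible in RD designs.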

Abstract

README and CONTRIBUTING files can serve as the first point of contact for potential contributors to free/libre and open source software (FLOSS) projects. Prominent open source software organizations such as Mozilla, GitHub, and the Linux Foundation advocate that projects provide community-focused and process-oriented documentation early to foster recruitment and activity. In this paper we investigate the introduction of these documents in FLOSS projects, including whether early documentation conforms to these recommendations or explains subsequent activity. We use a novel dataset of FLOSS projects packaged by the Debian GNU/Linux distribution and conduct a quantitative analysis to examine README (n=4226) and CONTRIBUTING (n=714) files when they are first published into projects' repositories. We find that projects create minimal READMEs proactively, but often publish CONTRIBUTING files following an influx of contributions. The initial versions of these files rarely focus on community development, instead containing descriptions of project procedure for library usage or code contribution. The findings suggest that FLOSS projects do not create documentation with community-building in mind, but rather favor brevity and standardized instructions.
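The topic-level findings summarized here rest on LDA topic modeling (the TL;DR reports 9 README and 5 CONTRIBUTING topics). The implementation the authors used is not stated in this section, so the sketch below is an assumption: a from-scratch collapsed Gibbs sampler, one standard way to fit LDA, written in pure Python. The function name `lda_gibbs` and the hyperparameters `alpha`, `beta`, `iterations`, and `seed` are hypothetical illustrative defaults, not values from the paper, and real README or CONTRIBUTING text would need tokenization and stop-word removal first.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized docs with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): per-document topic proportions and
    per-topic word distributions, both smoothed by the Dirichlet priors.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]       # doc-topic counts
    nkw = [[0] * V for _ in range(n_topics)]   # topic-word counts
    nk = [0] * n_topics                        # tokens assigned to each topic
    z = []                                     # topic assignment of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Full conditional: p(z=k | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1

    doc_topic = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
                 for d, doc in enumerate(docs)]
    topic_word = [{vocab[w]: (nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)}
                  for k in topics]
    return doc_topic, topic_word


if __name__ == "__main__":
    docs = [
        "install pip package usage import example".split(),
        "fork branch pull request tests review".split(),
        "install usage example import api".split(),
        "pull request review tests commit branch".split(),
    ]
    doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
    for k, dist in enumerate(topic_word):
        print(f"topic {k}:", sorted(dist, key=dist.get, reverse=True)[:4])
```

Library implementations such as scikit-learn's or gensim's LDA use online variational inference and scale far better; the hand-written sampler only makes the counting logic explicit.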

Paper Structure

This paper contains 27 sections, 3 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Plot of average (log-transformed) contribution counts over time around the point of document introduction (weeks offset from document publication date) for README (red) and CONTRIBUTING (blue) files. The Y-axis has been scaled to real count values.
  • Figure 2: Plot of project-level fitted random effects coefficients and estimated 95% confidence intervals (Y-axis) sorted by coefficient rank (X-axis) from the model of project activity as a function of CONTRIBUTING file introduction. Colors correspond to groupings of projects based on whether the 95% CI for the random effects coefficient estimate is less than (dark blue), overlapping with (teal), or greater than (lime) zero.
  • Figure 3: Plot of the kernel density of document word counts for first-version README (red) and CONTRIBUTING (blue) files.
  • Figure 4: Plot of the kernel densities for Flesch Reading Ease (left-column) and reading time (right-column) metrics for README (bottom row) and CONTRIBUTING (top row) documents. Colors correspond to groupings of projects based on whether the 95% CI for the random effects coefficient estimate is less than (dark blue), overlapping with (teal), or greater than (lime) zero.
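Figures 2 and 4 group projects by where each project's random-effect 95% CI falls relative to zero, and Figure 4 compares Flesch Reading Ease and reading time across those groups. As a minimal sketch, assuming a crude vowel-group syllable heuristic and a fixed reading speed of 200 words per minute (both assumptions; the paper's tooling and constants are not given here), the Python below computes the standard Flesch Reading Ease formula, 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words), a simple reading-time estimate, and the three-way CI grouping. All function names are hypothetical.

```python
import re

WORD_RE = re.compile(r"[A-Za-z']+")


def count_syllables(word: str) -> int:
    # Assumption: crude vowel-group heuristic; dedicated libraries use
    # dictionaries and more rules, so scores will differ somewhat.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    # Drop a trailing silent 'e' ("make") but keep "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease; higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = WORD_RE.findall(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def reading_time_seconds(text: str, words_per_minute: float = 200.0) -> float:
    # Assumption: a fixed reading speed; the paper's reading-time metric may differ.
    return len(WORD_RE.findall(text)) * 60.0 / words_per_minute


def ci_group(lower: float, upper: float) -> str:
    """Classify a random-effect estimate by where its 95% CI sits relative to zero."""
    if upper < 0:
        return "below zero"
    if lower > 0:
        return "above zero"
    return "overlaps zero"


if __name__ == "__main__":
    sample = "Run make to build the project. Open a pull request with tests."
    print(round(flesch_reading_ease(sample), 1), reading_time_seconds(sample))
    print(ci_group(-0.4, 0.2))
```

Markdown syntax, code blocks, and URLs in real README and CONTRIBUTING files would distort both metrics, so some preprocessing would likely be needed before scoring.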