The Introduction of README and CONTRIBUTING Files in Open Source Software Development
Matthew Gaughan, Kaylea Champion, Sohyeon Hwang, Aaron Shaw
TL;DR
The study investigates when README and CONTRIBUTING files are introduced in FLOSS projects, using a dataset of Debian-packaged projects (n=4226 READMEs and n=714 CONTRIBUTING files across 4247 projects) and a multilevel longitudinal design with a regression-discontinuity framework around the publication event. It finds that READMEs are typically introduced early and tend to be brief, whereas CONTRIBUTING files tend to appear later, after a surge in activity. Initial versions focus on usage or contribution procedures rather than community-building, and there is little evidence that these documents cause a rise in activity. The analysis uses a negative-binomial model with a bandwidth of 10 weeks, along with LDA topic modeling (9 README topics, 5 CONTRIBUTING topics) and readability metrics, to relate document characteristics to subsequent activity. Overall, the results suggest that early governance documentation more often serves as project hygiene than as a catalyst for activity, although certain topics in these files are associated with later contributions. These effects appear nuanced and context-dependent, and they highlight the need for tooling and guidance that better align documentation with early-stage project needs.
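To make the regression-discontinuity setup in the summary above more concrete, the following is a minimal sketch of how weekly activity could be windowed around a document's publication week. Only the 10-week bandwidth comes from the summary; the function name `build_rd_window`, the `RDRow` fields, the use of weekly commit counts as the outcome, and the choice to treat the publication week itself as "post" are all illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

BANDWIDTH_WEEKS = 10  # bandwidth reported in the summary; everything else is assumed


@dataclass
class RDRow:
    relative_week: int  # weeks since first publication of the document (0 = publication week)
    post: int           # 1 on or after the publication week, 0 before (assumed convention)
    commits: int        # weekly activity count (hypothetical outcome measure)


def build_rd_window(weekly_commits, publication_week, bandwidth=BANDWIDTH_WEEKS):
    """Keep only weeks within `bandwidth` of the publication week and
    re-center time so the discontinuity sits at relative_week == 0."""
    rows = []
    for week in sorted(weekly_commits):
        relative = week - publication_week
        if abs(relative) > bandwidth:
            continue  # outside the analysis window
        rows.append(RDRow(relative, 1 if relative >= 0 else 0, weekly_commits[week]))
    return rows


# Hypothetical project: 40 weeks of commit counts, document published in week 20.
example_counts = {week: (3 if week < 20 else 5) for week in range(40)}
example_rows = build_rd_window(example_counts, publication_week=20)
```

Rows of this shape, pooled across projects, could then be passed to a count model (for example, a negative-binomial regression of `commits` on `relative_week`, `post`, and their interaction, with project-level grouping). The fitting step is omitted here because the summary does not specify the exact model formula.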
Abstract
README and CONTRIBUTING files can serve as the first point of contact for potential contributors to free/libre and open source software (FLOSS) projects. Prominent open source software organizations such as Mozilla, GitHub, and the Linux Foundation advocate that projects provide community-focused and process-oriented documentation early to foster recruitment and activity. In this paper, we investigate the introduction of these documents in FLOSS projects, including whether early documentation conforms to these recommendations or explains subsequent activity. We use a novel dataset of FLOSS projects packaged by the Debian GNU/Linux distribution and conduct a quantitative analysis to examine README (n=4226) and CONTRIBUTING (n=714) files when they are first published into projects' repositories. We find that projects create minimal READMEs proactively, but often publish CONTRIBUTING files following an influx of contributions. The initial versions of these files rarely focus on community development, instead describing project procedures for library usage or code contribution. The findings suggest that FLOSS projects do not create documentation with community-building in mind, but rather favor brevity and standardized instructions.
