Sources of Underproduction in Open Source Software
Kaylea Champion, Benjamin Mako Hill
TL;DR
This paper tests social and technical correlates of underproduction in open source software within the Debian packaging ecosystem, combining Champion & Hill's underproduction measure with a suite of predictors. Four logistic regression models relate underproduction to package age, language age, contributor activity, maintainer dynamics, team organization, and collaboration-network metrics. Older software and older programming languages are associated with higher risk, and higher contributor counts are associated with higher, not lower, risk. Maintenance by a team offers little protection in the full model. The people working on bugs in underproduced packages tend to be more central in the collaboration network, while betweenness centrality shows no clear association. The findings underscore how difficult it is to align supply with user demand in FLOSS, and suggest that practitioners focus on stable, dedicated maintenance and cross-package visibility rather than on simply adding contributors or relying on team-based approaches.
Abstract
Because open source software relies on individuals who select their own tasks, it is often underproduced -- a term used by software engineering researchers to describe when a piece of software's relative quality is lower than its relative importance. We examine the social and technical factors associated with underproduction through a comparison of software packaged by the Debian GNU/Linux community. We test a series of hypotheses developed from a reading of prior research in software engineering. Although we find that software age and programming language age offer a partial explanation for variation in underproduction, we were surprised to find that the association between underproduction and package age is weaker at high levels of programming language age. With respect to maintenance efforts, we find that additional resources are not always tied to better outcomes. In particular, having higher numbers of contributors is associated with higher underproduction risk. Also, contrary to our expectations, maintainer turnover and maintenance by a declared team are not associated with lower rates of underproduction. Finally, we find that the people working on bugs in underproduced packages tend to be those who are more central to the community's collaboration network structure, although contributors' betweenness centrality (often associated with brokerage in social networks) is not associated with underproduction.
