Revisiting Aristotle vs. Ringelmann: The influence of biases on measuring productivity in Open Source software development

Christian Gut, Alfredo Goldman

TL;DR

The paper tackles whether Open Source project productivity scales sublinearly or superlinearly with team size by replicating Sornette's and Scholtes' regression-based analyses on GitHub data. It systematically examines project selection, time-window definitions, output metrics, and regression specifications to identify biases, revealing that selection bias partially explains the discrepancies while instrumentation biases—notably a $p$-value filter and Front Load Days—play a substantial role. By reapplying both methods across all projects and applying bias-corrective analyses, the study shows that accounting for these biases alters the balance of sublinear vs. superlinear findings and highlights the critical importance of replication, data provenance, and transparent statistical practices in empirical software engineering. The work underscores practical implications for measuring OSS productivity and provides concrete guidance for more robust, bias-aware analyses, including the potential adoption of multidimensional metrics like SPACE for future studies.

Abstract

Aristotle vs. Ringelmann was a discussion between two distinct research teams from ETH Zürich, who argued whether the productivity of Open Source software projects scales sublinearly or superlinearly with team size. This discussion revolved around two publications, which apparently used similar techniques: sampling projects on GitHub and running regression analyses to answer the question of superlinearity. Despite the similarity in their research methods, the team around Ingo Scholtes concluded that projects scale sublinearly, while the team around Didier Sornette found a superlinear relationship between team size and productivity. In subsequent publications, the two authors argue that the opposite conclusions may be attributed to differences in project populations, since 81.7% of Sornette's projects have fewer than 50 contributors, whereas Scholtes specifically sampled projects with more than 50 contributors. This publication compares the research of both authors by replicating their findings, which makes it possible to evaluate how much project sampling actually accounted for the differences between Scholtes' and Sornette's results. The replication shows that sampling bias only partially explains the discrepancies between the two authors. Further analysis detected instrumentation biases that drove the regression coefficients in opposite directions. These findings were then consolidated into a quantitative analysis, which indicates that instrumentation biases contributed more to the differences between Scholtes' and Sornette's work than the selection bias suggested by both authors.
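
The sublinear vs. superlinear question can be made concrete with a small sketch. This is not the procedure of either research team. It assumes a power-law relation, output ≈ a · n^β, between team size n and total output, and fits it by ordinary least squares on log-transformed data. The function names, the synthetic data, and the chosen exponent of 1.3 are illustrative assumptions only. Under this convention β > 1 reads as superlinear and β < 1 as sublinear; if output per team member were regressed instead, the threshold would be 0 rather than 1.

```python
import math
import random


def scaling_exponent(team_sizes, outputs):
    """Fit log(output) = log(a) + beta * log(team_size) by ordinary least squares.

    Returns (beta, a). Hypothetical helper for illustration only.
    """
    xs = [math.log(n) for n in team_sizes]
    ys = [math.log(o) for o in outputs]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                              # slope on the log-log scale
    prefactor = math.exp(mean_y - beta * mean_x)  # back-transformed intercept
    return beta, prefactor


def classify(beta, tol=0.0):
    """Label the scaling regime for total output; tol is an assumed tolerance band.

    A real analysis would use a confidence interval rather than a point estimate.
    """
    if beta > 1 + tol:
        return "superlinear"
    if beta < 1 - tol:
        return "sublinear"
    return "linear"


# Synthetic data (assumption): total output grows as n**1.3 with multiplicative noise.
rng = random.Random(0)
sizes = list(range(2, 60))
commits = [3.0 * n ** 1.3 * math.exp(rng.gauss(0.0, 0.1)) for n in sizes]

beta, prefactor = scaling_exponent(sizes, commits)
print(f"beta = {beta:.3f}, prefactor = {prefactor:.3f}, regime = {classify(beta)}")
```

Because the whole conclusion rests on a single fitted slope, choices such as which projects enter the sample, which time window is measured, and which fits are discarded afterwards can shift β across the threshold. Those are the kinds of choices the abstract describes as selection and instrumentation biases.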

Paper Structure (22 sections, 12 figures, 4 tables)

Figures (12)

  • Figure 1: General regression approach used by both authors
  • Figure 2: Illustration of regression method details
  • Figure 3: Potential exclusion of relevant commits if time window end date and mining date lie too close together
  • Figure 4: Comparison between regression coefficients as reported in the article and the measured values using data from GitHub
  • Figure 5: Results of the Wilcoxon tests
  • ...and 7 more figures