Ecosystem-wide influences on pull request decisions: insights from NPM

Willem Meijer, Mirela Riveni, Ayushi Rastogi

TL;DR

The paper investigates ecosystem-wide influences on pull request decisions within the NPM ecosystem, arguing that cross-project contributions and collaborations shape PR outcomes beyond intra-project factors. It combines approximately 1.8 million PRs, 2.1 million issues, and dependency data from over 20,000 projects, applying social network analysis, mixed-effects logistic regression, and random forests to quantify and predict PR acceptance. Key contributions include a fine-grained taxonomy of ecosystem contributions (upstream, downstream, non-dependency), novel metrics for ecosystem-wide collaboration (second-order degree centrality) and direct collaboration (link strength), and qualitative insights from 538 PRs that reveal 3 overarching and 10 specific motivations for referencing ecosystem projects. The findings show that ecosystem participation generally improves PR acceptance, especially for newcomers, and that ecosystem-wide signals complement intra-project cues, with practical implications for onboarding, cross-project collaboration strategies, and ecosystem governance. The work advances understanding of software supply chains by demonstrating how ecosystem-level socio-technical factors translate into pull request outcomes across dependent projects.

Abstract

The pull-based development model facilitates global collaboration within open-source software projects. However, whereas it is increasingly common for software to depend on other projects in its ecosystem, most research on the pull request decision-making process has explored factors within projects, not the broader software ecosystem they comprise. We uncover ecosystem-wide factors that influence pull request acceptance decisions. We collected a dataset of approximately 1.8 million pull requests and 2.1 million issues from 20,052 GitHub projects within the NPM ecosystem. Of these, 98% depend on another project in the dataset, which enables the study of collaboration across dependent projects. We employed social network analysis to create a collaboration network in the ecosystem, and mixed effects logistic regression and random forest techniques to measure the impact and predictive strength of the tested features. We find that gaining experience within the software ecosystem through active participation in issue-tracking systems, submitting pull requests, and collaborating with pull request integrators and experienced developers benefits all open-source contributors, especially project newcomers. These results are complemented with an exploratory qualitative analysis of 538 pull requests. We find that developers with ecosystem experience make different contributions than developers without it. Zooming in on a subset of 111 pull requests with clear ecosystem involvement, we find 3 overarching and 10 specific reasons why developers involve ecosystem projects in their pull requests. The results show that combining ecosystem-wide factors with features studied in previous work to predict the outcome of pull requests reached an overall F1 score of 0.92. However, the outcomes of pull requests submitted by newcomers are harder to predict.
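
For readers unfamiliar with the F1 score reported above, the following is a minimal sketch of how the metric is computed for a binary merged/not-merged prediction. The function name and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the abstract does not state how the "overall" score is averaged across classes.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall).

    tp: pull requests correctly predicted as merged
    fp: pull requests predicted as merged that were not merged
    fn: merged pull requests that the model predicted as not merged
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined),
        # so the score is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not the paper's data).
example = f1_score(tp=90, fp=10, fn=10)
```

Because F1 is a harmonic mean, a high score requires both precision and recall to be high, which is why it is commonly preferred over plain accuracy when merged pull requests far outnumber rejected ones.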

Paper Structure

This paper contains 31 sections, 7 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Visualization of the data collection process. For more details, refer to Figure 1 in the original master's thesis (meijer_influence_2023)
  • Figure 2: Feature importance plot showing the mean decrease in Gini calculated using random forest trained using the full dataset and the (non-)newcomer subsets
  • Figure 3: Overview of the reasons to reference another project