Beyond Code Contributions: How Network Position, Temporal Bursts, and Code Review Activities Shape Contributor Influence in Large-Scale Open Source Ecosystems
S M Rakib Ul Karim, Wenyi Lu, Sean Goggins
TL;DR
The paper examines how contributor influence emerges and evolves in large-scale OSS ecosystems, using a 25-year longitudinal CNCF dataset. It combines GPU-accelerated graph neural networks for role classification, temporal network analysis with LSTM forecasting, and structural integrity simulations to quantify the impact of different contributor positions. The findings show a persistent power-law-like concentration of influence, with the top 1% of contributors holding around 40% of total influence, and indicate that code review activities are stronger predictors of influence than individual code contributions. The study demonstrates that OSS networks evolve toward modular small-world architectures while core contributors remain critical for cohesion, highlighting sustainability risks and informing governance and retention strategies. The results offer quantitative foundations for designing resilient collaborative infrastructures and recognizing coordination labor as a key driver of community health in software ecosystems.
Abstract
Open source software (OSS) projects rely on complex networks of contributors whose interactions drive innovation and sustainability. This study presents a comprehensive analysis of OSS contributor networks using advanced graph neural networks and temporal network analysis on data spanning 25 years from the Cloud Native Computing Foundation ecosystem, encompassing sandbox, incubating, and graduated projects. Our analysis of thousands of contributors across hundreds of repositories reveals that OSS networks exhibit strong power-law distributions in influence, with the top 1% of contributors controlling a substantial portion of network influence. Using GPU-accelerated PageRank, betweenness centrality, and custom LSTM models, we identify five distinct contributor roles: Core, Bridge, Connector, Regular, and Peripheral, each with unique network positions and structural importance. Statistical analysis reveals significant correlations between specific action types (commits, pull requests, issues) and contributor influence, with multiple regression models explaining substantial variance in influence metrics. Temporal analysis shows that network density, clustering coefficients, and modularity exhibit statistically significant temporal trends, with distinct regime changes coinciding with major project milestones. Structural integrity simulations show that Bridge contributors, despite representing a small fraction of the network, have a disproportionate impact on network cohesion when removed. Our findings provide empirical evidence for strategic contributor retention policies and offer actionable insights into community health metrics.
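To make the measures named in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' GPU-accelerated pipeline. The toy graph, the function names (`pagerank`, `top_share`, `largest_component_fraction`), and the damping factor of 0.85 are all assumptions introduced here for illustration. It shows a PageRank-style influence score, the share of influence held by the top fraction of contributors, and a simple removal simulation that measures how much of the network stays connected.

```python
from collections import defaultdict

# Hypothetical toy collaboration graph: two triads joined by one "bridge" contributor.
EDGES = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("c", "bridge"), ("bridge", "d"),
    ("d", "e"), ("e", "f"), ("d", "f"),
]


def build_neighbors(edges, removed=frozenset()):
    """Undirected adjacency sets, skipping any edge that touches a removed node."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u in removed or v in removed:
            continue
        neighbors[u].add(v)
        neighbors[v].add(u)
    return neighbors


def pagerank(edges, damping=0.85, iters=200, tol=1e-12):
    """Power-iteration PageRank on an undirected graph (assumed damping of 0.85)."""
    neighbors = build_neighbors(edges)
    nodes = sorted(neighbors)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        new_rank = {}
        for node in nodes:
            # Each neighbor passes on an equal share of its current rank.
            incoming = sum(rank[nb] / len(neighbors[nb]) for nb in neighbors[node])
            new_rank[node] = (1.0 - damping) / n + damping * incoming
        delta = sum(abs(new_rank[x] - rank[x]) for x in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def top_share(scores, fraction):
    """Share of total score held by the top `fraction` of nodes (at least one node)."""
    ranked = sorted(scores.values(), reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


def largest_component_fraction(edges, removed=frozenset()):
    """Fraction of the remaining nodes that lie in the largest connected component."""
    all_nodes = {x for edge in edges for x in edge} - set(removed)
    neighbors = build_neighbors(edges, removed)
    seen, best = set(), 0
    for start in all_nodes:
        if start in seen:
            continue
        # Depth-first traversal of one component.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nb in neighbors[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best / len(all_nodes) if all_nodes else 0.0


scores = pagerank(EDGES)
print("influence share of top node:", top_share(scores, 0.01))
print("cohesion after removing 'bridge':", largest_component_fraction(EDGES, {"bridge"}))
print("cohesion after removing 'a':", largest_component_fraction(EDGES, {"a"}))
```

In this toy graph, removing the `bridge` node splits the network into two triads, so the largest component covers only half of the remaining nodes, while removing `a` leaves the rest fully connected. This mirrors, at a very small scale, the kind of contrast the structural integrity simulations are described as measuring.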
