How Deep Does Your Dependency Tree Go? An Empirical Study of Dependency Amplification Across 10 Package Ecosystems
Jahidul Arafat
TL;DR
This study systematically compares dependency amplification across 10 major package ecosystems by analyzing 500 projects and measuring transitive versus direct dependencies. Using non-parametric statistics and a robust replication package, it reveals that Maven exhibits substantially higher and more unpredictable amplification than most ecosystems, while CocoaPods, PyPI, Cargo, and Packagist demonstrate controlled amplification. The work links amplification patterns to ecosystem design choices, such as dependency resolution strategies and standard library breadth, and argues for ecosystem-specific security tooling and governance. Overall, the findings challenge the assumption that npm drives the deepest dependency trees and highlight the need for targeted security practices based on each ecosystem's characteristics.
Abstract
Modern software development relies on package ecosystems where a single declared dependency can pull in many additional transitive packages. This dependency amplification, defined as the ratio of transitive to direct dependencies, has major implications for software supply chain security, yet amplification patterns across ecosystems have not been compared at scale. We present an empirical study of 500 projects across ten major ecosystems: Maven Central for Java, the npm Registry for JavaScript, crates.io for Rust, PyPI for Python, NuGet Gallery for .NET, RubyGems for Ruby, Go Modules for Go, Packagist for PHP, CocoaPods for Swift and Objective-C, and Pub for Dart. Our analysis shows that Maven exhibits a mean amplification of 24.70×, compared to 4.48× for Go Modules, 4.32× for npm, and 0.32× for CocoaPods. We find significant differences with large effect sizes in 22 of 45 pairwise comparisons, challenging the assumption that npm has the highest amplification due to its many small-purpose packages. We observe that 28 percent of Maven projects exceed 10× amplification, indicating a systematic pattern rather than isolated outliers, compared to 14 percent for RubyGems, 12 percent for npm, and zero percent for Cargo, PyPI, Packagist, CocoaPods, and Pub. We attribute these differences to ecosystem design choices such as dependency resolution behavior, standard library completeness, and platform constraints. Our findings suggest adopting ecosystem-specific security strategies: systematic auditing for Maven environments, targeted outlier detection for npm and RubyGems, and continuation of current practices for ecosystems with controlled amplification. We provide a full replication package with data and analysis scripts.
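
As a minimal sketch of the metric defined in the abstract, the Python snippet below computes amplification as the ratio of transitive to direct dependencies, the share of projects above the 10-times threshold, and the number of pairwise ecosystem comparisons. The function names and the per-project counts are hypothetical and chosen only for illustration; they are not taken from the study's data or its replication package.

```python
from itertools import combinations


def amplification(direct: int, transitive: int) -> float:
    """Ratio of transitive to direct dependencies for one project."""
    if direct <= 0:
        raise ValueError("a project needs at least one direct dependency")
    return transitive / direct


def share_above(ratios, threshold=10.0):
    """Fraction of projects whose amplification exceeds the threshold."""
    return sum(r > threshold for r in ratios) / len(ratios)


# Hypothetical (direct, transitive) counts, NOT data from the study.
projects = [(5, 150), (10, 40), (4, 2), (8, 96)]
ratios = [amplification(d, t) for d, t in projects]
print(ratios)               # [30.0, 4.0, 0.5, 12.0]
print(share_above(ratios))  # 0.5

# Ten ecosystems yield C(10, 2) = 45 unordered pairs to compare.
ecosystems = ["Maven", "npm", "Cargo", "PyPI", "NuGet",
              "RubyGems", "Go Modules", "Packagist", "CocoaPods", "Pub"]
pairs = list(combinations(ecosystems, 2))
print(len(pairs))           # 45
```

A ratio below 1, such as the third hypothetical project, means the project pulls in fewer transitive than direct dependencies, which is how a mean like the 0.32 reported for CocoaPods can arise.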
