How Deep Does Your Dependency Tree Go? An Empirical Study of Dependency Amplification Across 10 Package Ecosystems

Jahidul Arafat

TL;DR

This study systematically compares dependency amplification across 10 major package ecosystems by analyzing 500 projects and measuring transitive versus direct dependencies. Using non-parametric statistics, and backed by a replication package, it shows that Maven exhibits substantially higher and less predictable amplification than most ecosystems, while CocoaPods, PyPI, Cargo, and Packagist demonstrate controlled amplification. The work links amplification patterns to ecosystem design choices, such as dependency resolution strategies and standard library breadth, and argues for ecosystem-specific security tooling and governance. Overall, the findings challenge the assumption that npm drives the deepest dependency trees and highlight the need for security practices targeted to each ecosystem's characteristics.

Abstract

Modern software development relies on package ecosystems where a single declared dependency can pull in many additional transitive packages. This dependency amplification, defined as the ratio of transitive to direct dependencies, has major implications for software supply chain security, yet amplification patterns across ecosystems have not been compared at scale. We present an empirical study of 500 projects across ten major ecosystems: Maven Central for Java, npm Registry for JavaScript, crates.io for Rust, PyPI for Python, NuGet Gallery for .NET, RubyGems for Ruby, Go Modules for Go, Packagist for PHP, CocoaPods for Swift and Objective-C, and Pub for Dart. Our analysis shows that Maven exhibits a mean amplification of 24.70 times, compared to 4.48 times for Go Modules, 4.32 times for npm, and 0.32 times for CocoaPods. We find significant differences with large effect sizes in 22 of 45 pairwise comparisons, challenging the assumption that npm has the highest amplification due to its many small, single-purpose packages. We observe that 28 percent of Maven projects exceed 10 times amplification, indicating a systematic pattern rather than isolated outliers, compared to 14 percent for RubyGems, 12 percent for npm, and zero percent for Cargo, PyPI, Packagist, CocoaPods, and Pub. We attribute these differences to ecosystem design choices such as dependency resolution behavior, standard library completeness, and platform constraints. Our findings suggest adopting ecosystem-specific security strategies: systematic auditing for Maven environments, targeted outlier detection for npm and RubyGems, and continuation of current practices for ecosystems with controlled amplification. We provide a full replication package with data and analysis scripts.
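The amplification metric defined in the abstract can be illustrated with a minimal Python sketch. It is not taken from the replication package: the function names, the sample counts, and the choice to treat projects with zero direct dependencies as undefined (and exclude them from the share above a threshold) are assumptions made for illustration only.

```python
from math import comb
from typing import List, Optional, Tuple


def amplification(direct: int, transitive: int) -> Optional[float]:
    """Ratio of transitive to direct dependencies.

    Returns None when there are no direct dependencies, since the ratio
    is undefined there (an assumption; the study may handle this case
    differently).
    """
    if direct < 0 or transitive < 0:
        raise ValueError("dependency counts must be non-negative")
    if direct == 0:
        return None
    return transitive / direct


def share_above(factors: List[Optional[float]], threshold: float = 10.0) -> float:
    """Fraction of projects with a defined amplification above `threshold`."""
    defined = [f for f in factors if f is not None]
    if not defined:
        return 0.0
    return sum(f > threshold for f in defined) / len(defined)


# Hypothetical (direct, transitive) counts, NOT data from the study.
projects: List[Tuple[int, int]] = [(8, 100), (5, 20), (10, 3), (0, 0)]
factors = [amplification(d, t) for d, t in projects]

# Ten ecosystems give C(10, 2) = 45 unordered pairs, which matches the
# number of pairwise comparisons reported in the abstract.
n_pairs = comb(10, 2)

if __name__ == "__main__":
    print(factors)
    print(share_above(factors))
    print(n_pairs)
```

A project with 8 direct and 100 transitive dependencies, the scale described for the Maven example in Figure 1, has an amplification of 12.5 under this definition and would count toward the share of projects exceeding 10 times.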

Paper Structure

This paper contains 20 sections, 13 equations, 6 figures, and 8 tables.

Figures (6)

  • Figure 1: Dependency specification from a typical Maven project. The project directly requires 8 packages, but these bring over 100 additional transitive dependencies. A single Spring Boot dependency transitively depends on dozens of other packages including web servers, JSON processors, and database connectors. Updating any package in this chain can trigger cascading version changes throughout the dependency tree.
  • Figure 2: Dependency amplification distribution across 10 ecosystems. Maven exhibits significantly higher amplification with extreme outliers reaching 198.5 times. Cargo and CocoaPods maintain controlled amplification below 4 times.
  • Figure 3: Direct versus total dependencies across 10 ecosystems. Maven and RubyGems show steep amplification slopes, indicating systematically elevated amplification. The npm ecosystem shows higher variance, with some projects reaching extreme values while others maintain moderate totals. Cargo, PyPI, and Packagist maintain controlled linear growth across all projects.
  • Figure 4: Cumulative distribution functions of the amplification factor across 10 ecosystems. Cargo and CocoaPods show a steep rise, indicating consistently low amplification. Maven and RubyGems show a gradual rise with a long tail. The npm ecosystem shows a bimodal pattern, with concentration at low amplification but extreme outliers.
  • Figure 5: Heatmap of normalized ecosystem characteristics across five metrics: mean direct dependencies, mean transitive dependencies, mean amplification, Gini coefficient, and zero-dependency percentage. Darker colors indicate higher normalized values. Maven shows extreme amplification and transitive dependency characteristics. CocoaPods demonstrates minimal dependency footprint. Color intensity reveals natural ecosystem clustering with Maven isolated and most ecosystems forming a controlled-amplification cluster.
  • ...and 1 more figure
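Two of the summaries named in the captions above, the cumulative distribution function of Figure 4 and the Gini coefficient of Figure 5, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the sample values are invented, and it assumes the Gini coefficient is computed over per-project amplification factors, which the captions do not state.

```python
from typing import List


def ecdf(values: List[float], x: float) -> float:
    """Empirical CDF: fraction of observations less than or equal to x."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(v <= x for v in values) / len(values)


def gini(values: List[float]) -> float:
    """Gini coefficient of non-negative values (0 = perfectly even).

    Uses the sorted-rank form
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    with x sorted ascending and i running from 1 to n.
    """
    if not values:
        raise ValueError("need at least one observation")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = sum(values)
    if total == 0:
        return 0.0  # all values are zero, so there is no inequality
    n = len(values)
    ranked = sorted(values)
    weighted = sum(i * v for i, v in enumerate(ranked, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical amplification factors, NOT data from the study.
even = [1.0, 1.0, 1.0, 1.0]      # every project amplifies equally
skewed = [0.0, 0.0, 0.0, 4.0]    # one outlier carries all amplification

if __name__ == "__main__":
    print(gini(even), gini(skewed))
    print(ecdf([0.3, 4.0, 12.5], 4.0))
```

Under this reading, a steep CDF corresponds to most projects sitting at low amplification, while a high Gini coefficient indicates that a few projects account for most of the amplification.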