Insecure Ingredients? Exploring Dependency Update Patterns of Bundled JavaScript Packages on the Web

Ben Swierzy, Marc Ohm, Michael Meier

TL;DR

This paper tackles the challenge of identifying JavaScript package versions used in production web bundles, addressing gaps left by analyses of CDN-hosted and single-file resources. It introduces Aletheia, a package-agnostic pipeline that combines File Selection, Transpilation, and Bundling with Dolos-inspired rolling AST fingerprints to detect versions against a comprehensive index of npm artifacts. Empirical results show strong patch-level detection in lab settings (≈87%) and on real-world bundles (≈82%), outperforming prior work. They also reveal that bundled packages are updated more quickly than CDN-included ones, and that known-vulnerable versions are correspondingly more common among CDN-delivered packages. The study further finds indicators that a few widespread vendors are a major driver of update dynamics, suggesting that more nuanced strategies are needed to improve web software supply chain security. Overall, Aletheia enables scalable, reproducible measurement of dependency updates and vulnerabilities across the Web, with implications for monitoring and securing the JavaScript ecosystem.
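The TL;DR mentions Dolos-inspired rolling AST fingerprints. Below is a minimal Python sketch of that general family of techniques (k-gram hashing plus winnowing, as used in plagiarism detectors), not Aletheia's actual implementation. It assumes each file's AST has already been flattened into a sequence of node-type strings; the function names, the parameters k=5 and w=4, and the containment score used for ranking versions are illustrative assumptions.

```python
import zlib
from typing import Dict, List, Optional, Sequence, Set, Tuple

BASE = 257
MOD = (1 << 61) - 1  # large Mersenne prime keeps hash collisions unlikely


def kgram_hashes(tokens: Sequence[str], k: int) -> List[int]:
    """Rabin-Karp rolling hash of every k-gram in a token sequence."""
    if len(tokens) < k:
        return []
    # crc32 gives a stable per-token id (Python's hash() is salted per process).
    ids = [zlib.crc32(t.encode("utf-8")) for t in tokens]
    top = pow(BASE, k - 1, MOD)  # weight of the token leaving the window
    h = 0
    for x in ids[:k]:
        h = (h * BASE + x) % MOD
    hashes = [h]
    for i in range(k, len(ids)):
        # Drop the oldest token, shift, and add the newest token.
        h = ((h - ids[i - k] * top) * BASE + ids[i]) % MOD
        hashes.append(h)
    return hashes


def winnow(hashes: Sequence[int], w: int) -> Set[int]:
    """Keep the minimum hash of every window of w consecutive k-gram hashes."""
    if not hashes:
        return set()
    if len(hashes) <= w:
        return {min(hashes)}
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def fingerprints(tokens: Sequence[str], k: int = 5, w: int = 4) -> Set[int]:
    """Hypothetical helper: winnowed k-gram fingerprints of an AST token stream."""
    return winnow(kgram_hashes(tokens, k), w)


def best_match(bundle_fp: Set[int],
               index: Dict[str, Set[int]]) -> Optional[Tuple[str, float]]:
    """Return the indexed version whose fingerprints are most contained in the bundle."""
    best: Optional[Tuple[str, float]] = None
    for version, fp in index.items():
        if not fp:
            continue
        # Containment rather than Jaccard: a bundle mixes many packages and app code.
        score = len(fp & bundle_fp) / len(fp)
        if best is None or score > best[1]:
            best = (version, score)
    return best
```

If a hypothetical bundle embeds one indexed version's token sequence verbatim, every window of that version's hash sequence is also a window of the bundle's, so that version reaches a containment score of 1.0.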

Abstract

Reusable software components, typically distributed as packages, are a central paradigm of modern software development. The JavaScript ecosystem serves as a prime example, offering millions of packages whose use is promoted as idiomatic. However, download statistics on npm raise security concerns as they indicate a high popularity of vulnerable package versions, while their real prevalence on production websites remains unknown. Package version detection mechanisms fill this gap by extracting utilized packages and versions from observed artifacts on the web. Prior research focuses on mechanisms either for hand-selected popular packages in bundles or for single-file resources utilizing the global namespace. This does not allow for a thorough analysis of modern web applications' dependency update behavior at scale. In this work, we improve upon this by presenting Aletheia, a package-agnostic method which dissects JavaScript bundles to identify package versions through algorithms originating from the field of plagiarism detection. We show that this method clearly outperforms the existing approaches in practical settings. Furthermore, we crawl the Tranco top 100,000 domains to reveal that 5% to 20% of domains update their dependencies within 16 weeks. Surprisingly, from a longitudinal perspective, bundled packages are updated significantly faster than their CDN-included counterparts, and consequently include up to 10 times fewer known vulnerable package versions. Still, we observe indicators that a few widespread vendors seem to be a major driving force behind timely updates, implying that quantitative measures do not paint a complete picture.
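To make the "update within 16 weeks" measure concrete, here is a sketch of one way to decide, from repeated crawls, whether a domain updated a package inside a fixed window. The data model (a list of (crawl date, detected version) pairs per domain), the criterion "a strictly newer version than the first observation in the window", and all function names are assumptions for illustration; the paper's exact definition may differ.

```python
from datetime import date, timedelta
from typing import Dict, Sequence, Tuple

# Hypothetical data model: (crawl date, detected version such as "3.6.0").
Observation = Tuple[date, str]


def parse_version(v: str) -> Tuple[int, ...]:
    # Simplified: assumes plain MAJOR.MINOR.PATCH without pre-release tags.
    return tuple(int(part) for part in v.split("."))


def updated_within(observations: Sequence[Observation], start: date,
                   window: timedelta = timedelta(weeks=16)) -> bool:
    """True if a version newer than the first in-window observation appears in the window."""
    in_window = sorted(o for o in observations if start <= o[0] <= start + window)
    if not in_window:
        return False
    baseline = parse_version(in_window[0][1])
    # Downgrades and unchanged versions do not count as updates.
    return any(parse_version(v) > baseline for _, v in in_window[1:])


def update_share(per_domain: Dict[str, Sequence[Observation]], start: date,
                 window: timedelta = timedelta(weeks=16)) -> float:
    """Fraction of domains that updated the package within the window."""
    if not per_domain:
        return 0.0
    updated = sum(updated_within(obs, start, window) for obs in per_domain.values())
    return updated / len(per_domain)
```

Under this definition, a domain observed on 3.6.0 and later on 3.7.1 within the window counts as updated, whereas an unchanged version, a downgrade, or an update observed only after the window does not.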

Paper Structure

This paper contains 32 sections, 1 equation, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Major phases of the bundling process: (a) module resolution, (b) tree shaking, (c) code splitting, and (d) bundling and minification
  • Figure 2: Processing pipelines from the files collected through the Internet scans to the evaluations. The package index serves as Aletheia's reference database in all evaluations marked with a dashed border.
  • Figure 3: Average number of bundles per domain and bundler. Webpack chunks (yellow) account for half of Webpack's detections.
  • Figure 4: Average absolute number of included CDN resources per responsive domain and CDN. "At least one" denotes the percentage of domains that include at least one CDN resource.
  • Figure 5: Percentage of packages, package instances, and domains for which updates to a version published in the given interval were observed. Stacked bars are cumulative.
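Figure 1 lists tree shaking as one phase of bundling. The following is a minimal sketch of its marking step over a hypothetical export-level dependency graph: only symbols reachable from the entry point survive, and everything else is dropped. The module paths and export names are made up for illustration, and the sketch ignores module side effects, which real bundlers must account for. It does illustrate why a bundle may contain only a fragment of a package's code.

```python
from collections import deque
from typing import Dict, Set, Tuple

# (module path, export name); all paths used below are hypothetical.
Symbol = Tuple[str, str]


def tree_shake(deps: Dict[Symbol, Set[Symbol]], roots: Set[Symbol]) -> Set[Symbol]:
    """Return the symbols reachable from the roots; the rest would be eliminated."""
    kept: Set[Symbol] = set()
    queue = deque(roots)
    while queue:
        sym = queue.popleft()
        if sym in kept:
            continue
        kept.add(sym)
        # Follow this symbol's dependencies that have not been kept yet.
        queue.extend(deps.get(sym, set()) - kept)
    return kept
```

For example, if an application entry only uses a package's `debounce` export, then a `throttle` export that is never imported (and any exports only it depends on) would not appear in the bundle.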