Detecting and removing bloated dependencies in CommonJS packages

Yuxin Liu, Deepika Tiwari, Cristian Bogdan, Benoit Baudry

TL;DR

This work investigates bloated dependencies in server-side CommonJS Node.js packages and introduces DepPrune, a trace-based dynamic analysis that detects and safely removes them. By monitoring OS file-system interactions at runtime, the approach identifies unaccessed dependencies and supports direct debloating through package.json and full-scale debloating through package-lock.json. On a curated dataset of 91 packages with 50,488 runtime dependencies, DepPrune finds that 50.6% are bloated, with indirect dependencies accounting for most of the bloat; removing bloated direct dependencies cascades to many indirect removals while preserving functionality. The paper compares DepPrune against state-of-the-art static (depcheck) and dynamic (Stubbifier) approaches, reporting higher accuracy and fewer misclassifications, and discusses practical implications for development workflows and deployment lifecycles. Overall, the study suggests that runtime tracing offers a practical path to leaner dependency trees and reduced maintenance risk in dynamic JavaScript ecosystems.
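
To make the direct debloating step concrete, the following is a minimal sketch of removing bloated direct dependencies from a package.json-like structure. It is not DepPrune's implementation: the function name `debloat_manifest`, the package names, and the choice to touch only the `dependencies` field (leaving `devDependencies` alone, since the study concerns runtime dependencies) are assumptions made for illustration.

```python
import json
from typing import Dict, Set


def debloat_manifest(manifest: Dict, bloated_direct: Set[str]) -> Dict:
    """Return a copy of a package.json-like dict without the given direct dependencies.

    Hypothetical helper: only the "dependencies" field is filtered.
    """
    cleaned = dict(manifest)  # shallow copy; the input dict is left untouched
    deps = manifest.get("dependencies", {})
    cleaned["dependencies"] = {
        name: spec for name, spec in deps.items() if name not in bloated_direct
    }
    return cleaned


# Hypothetical manifest and analysis result, for illustration only.
manifest = json.loads("""
{
  "name": "example-package",
  "version": "1.0.0",
  "dependencies": {"used-lib": "^2.1.0", "unused-lib": "^0.3.0"},
  "devDependencies": {"test-runner": "^9.0.0"}
}
""")
debloated = debloat_manifest(manifest, {"unused-lib"})
print(json.dumps(debloated, indent=2))
```

Once a direct dependency is dropped from the manifest, any indirect dependencies that were installed only because of it are no longer needed, which is the cascading effect the summary describes.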

Abstract

JavaScript packages are notoriously prone to bloat, a factor that significantly impacts the performance and maintainability of web applications. While web bundlers and tree-shaking can mitigate this issue in client-side applications, state-of-the-art techniques have limitations on the detection and removal of bloat in server-side applications. In this paper, we present the first study to investigate bloated dependencies within server-side JavaScript applications, focusing on those built with the widely used and highly dynamic CommonJS module system. We propose a trace-based dynamic analysis that monitors the OS file system to determine which dependencies are not accessed during runtime. To evaluate our approach, we curate an original dataset of 91 CommonJS packages with a total of 50,488 dependencies. Compared to the state-of-the-art dynamic and static approaches, our trace-based analysis demonstrates higher accuracy in detecting bloated dependencies. Our analysis identifies 50.6% of the 50,488 dependencies as bloated: 13.8% of direct dependencies and 51.3% of indirect dependencies. Furthermore, removing only the direct bloated dependencies by cleaning the dependency configuration file can remove a significant share of unnecessary bloated indirect dependencies while preserving functional correctness.
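
The core idea of the trace-based analysis can be sketched as follows: given the file paths touched while the workload runs, attribute each path to the installed dependency that owns it, and report the dependencies that own no accessed path. This is a simplified illustration rather than the authors' tool. The helper names, the example paths, and the rule of attributing a path to the package after the last `node_modules` segment are assumptions, and how the file-system trace is collected is outside the sketch.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional, Set


def owning_package(path: str) -> Optional[str]:
    """Map a file path to the install directory of the innermost package containing it.

    Assumption: the owner is the package right after the last "node_modules"
    segment, with scoped names ("@scope/name") spanning two segments.
    """
    parts = PurePosixPath(path).parts
    indices = [i for i, part in enumerate(parts) if part == "node_modules"]
    if not indices:
        return None  # not inside any installed dependency
    start = indices[-1] + 1
    rest = parts[start:]
    if not rest:
        return None
    width = 2 if rest[0].startswith("@") else 1
    if len(rest) < width:
        return None
    return str(PurePosixPath(*parts[: start + width]))


def unaccessed_dependencies(installed: Iterable[str], accessed: Iterable[str]) -> Set[str]:
    """Return the installed dependency directories that own no accessed path."""
    touched = {pkg for pkg in map(owning_package, accessed) if pkg is not None}
    return {str(PurePosixPath(dep)) for dep in installed} - touched


# Hypothetical install tree and runtime trace, for illustration only.
installed = [
    "node_modules/web-framework",
    "node_modules/web-framework/node_modules/tiny-util",
    "node_modules/logger",
    "node_modules/@scope/helpers",
]
trace = [
    "src/server.js",
    "node_modules/web-framework/index.js",
    "node_modules/web-framework/node_modules/tiny-util/index.js",
]
print(sorted(unaccessed_dependencies(installed, trace)))
```

The sketch only computes unaccessed dependencies. The paper lists separate definitions for an unaccessed dependency and a bloated dependency, so the step from one to the other may involve further conditions that are not reproduced here.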

Paper Structure (37 sections, 3 figures, 2 tables)

This paper contains 37 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Overview of our approach for automatically debloating runtime dependencies in CommonJS packages. The input consists of the original package, including its source code, the package.json file (listing direct dependencies only), and the workload of the package. The output includes a list of identified bloated dependencies, a debloated package.json file with a reduced set of direct dependencies, and a debloated package-lock.json file documenting all remaining dependencies, both direct and indirect.
  • Figure 2: The number of direct (purple dots) and indirect dependencies (orange dots) in 91 packages, along with lines of code (LoC) in packages (blue dots) and their dependencies (yellow dots), ordered by LoC in package in ascending order. The underlying data for this figure is publicly available: https://zenodo.org/doi/10.5281/zenodo.15090140
  • Figure 3: Across 77 packages, trace-based analysis using DepPrune and coverage-based analysis using the adapted Stubbifier both identify 18,319 dependencies as bloated. Trace-based analysis identifies 19,858 (18,319 + 1,539) bloated dependencies, while coverage-based analysis identifies 35,162 (18,319 + 16,843). Trace-based analysis incorrectly identifies no dependencies as bloated, whereas coverage-based analysis misclassifies 15,228 dependencies as bloated and misses 1,539 dependencies that are identified by trace-based analysis. Additionally, 1,615 dependencies are classified as dynamically resolvable. Across 91 packages, trace-based analysis using DepPrune and static analysis using depcheck both identify 84 direct dependencies as bloated. Trace-based analysis identifies 120 (84 + 36) direct bloated dependencies, while static analysis identifies 162 (84 + 78). Trace-based analysis potentially misclassifies 20 dependencies as bloated and misses 38 dependencies, while static analysis misclassifies 40 dependencies as bloated and overlooks 16 dependencies.
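
The counts in the Figure 3 caption can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the caption; the variable names and the way the "only one analysis" regions are broken down are one reading under which the figures add up, not a breakdown confirmed by the paper.

```python
def totals(both: int, only_first: int, only_second: int):
    """Total flagged by each of two analyses, given their overlap and exclusive counts."""
    return both + only_first, both + only_second


# DepPrune (trace-based) vs. adapted Stubbifier (coverage-based), 77 packages.
trace_total, coverage_total = totals(both=18_319, only_first=1_539, only_second=16_843)
# Assumed reading: the coverage-only region splits into misclassified
# dependencies plus those classified as dynamically resolvable.
coverage_only_breakdown = 15_228 + 1_615

# DepPrune (trace-based) vs. depcheck (static), direct dependencies, 91 packages.
trace_direct, static_direct = totals(both=84, only_first=36, only_second=78)
# Assumed reading: trace-only = potential trace misclassifications + static misses;
# static-only = static misclassifications + trace misses.
trace_only_breakdown = 20 + 16
static_only_breakdown = 40 + 38

print(trace_total, coverage_total, coverage_only_breakdown)
print(trace_direct, static_direct, trace_only_breakdown, static_only_breakdown)
```

Under this reading, every exclusive region in both comparisons is fully accounted for by the misclassification and miss counts given in the caption.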

Theorems & Definitions (2)

  • Definition 1: Unaccessed Dependency
  • Definition 2: Bloated Dependency