Chain-of-Experts (CoE): Reverse Engineering Software Bills of Materials for JavaScript Application Bundles through Code Clone Search
Leo Song, Steven H. H. Ding, Yuan Tian, Li Tao Li, Philippe Charland, Andrew Walenstein
TL;DR
This work addresses the challenge of generating a Software Bill of Materials (SBoM) for JavaScript application bundles, where nested scopes, extremely long code sequences, and a vast retrieval space hinder traditional approaches. It introduces Chain-of-Experts (CoE), a multi-task architecture that unifies code segmentation, code classification, and code clone retrieval in a single end-to-end model, leveraging sliding windows, segmentation masking, and Byte-Pair Encoding. The method performs competitively with or better than task-specific baselines across the three tasks on real-world NPM bundles, achieving high segmentation accuracy and efficient clone retrieval via embedding-based search. This end-to-end framework enables scalable, provenance-aware SBoM generation for real-world JavaScript releases, improving security and compliance in software supply chains.
Abstract
A Software Bill of Materials (SBoM) is a detailed inventory of all components, libraries, and modules in a software artifact, providing traceability throughout the software supply chain. With the increasing popularity of JavaScript in software engineering due to its dynamic syntax and seamless supply chain integration, the exposure to vulnerabilities and attacks has risen significantly. A JavaScript application bundle is a consolidated, symbol-stripped, and optimized assembly of code prepared for deployment. Generating an SBoM from a JavaScript application bundle through a reverse-engineering process ensures the integrity, security, and compliance of the supplier's software release, even without access to the original dependency graphs. This paper presents the first study on SBoM generation for JavaScript application bundles. We identify three key challenges for this task: nested code scopes, extremely long sequences, and large retrieval spaces. To address these challenges, we introduce Chain-of-Experts (CoE), a multi-task deep learning model designed to generate SBoMs through three tasks: code segmentation, code classification, and code clone retrieval. We evaluate CoE against individual task-specific solutions on 500 web application bundles with over 66,000 dependencies. Our experimental results demonstrate that CoE offers competitive results with less training and inference time than the combination of individual task-specific solutions. Consequently, CoE provides the first scalable, efficient, and end-to-end solution for SBoM generation from real-world JavaScript application bundles.
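To make two of the ideas named above concrete (sliding windows over very long sequences, and clone retrieval by embedding similarity), the following is a minimal illustrative sketch. It is not the authors' implementation: the function names, the window parameters, and the toy two-dimensional vectors are all hypothetical, and a real system would obtain embeddings from a trained model rather than from hand-written lists.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sliding_windows(tokens: Sequence[str], size: int, stride: int) -> List[Tuple[int, Sequence[str]]]:
    """Split a long token sequence into overlapping windows.

    Returns (start_offset, window) pairs. `size` and `stride` are
    hypothetical parameters chosen for illustration only.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append((start, tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the sequence
        start += stride
    return windows


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_package(query: Sequence[float], index: Dict[str, Sequence[float]]) -> str:
    """Return the name of the indexed package whose embedding is closest to the query."""
    return max(index, key=lambda name: cosine(query, index[name]))


# Toy usage: package names and vectors are made up for illustration.
starts = [s for s, _ in sliding_windows(list("abcdefghij"), size=4, stride=3)]
toy_index = {"pkg-a": [1.0, 0.0], "pkg-b": [0.0, 1.0]}
match = nearest_package([0.9, 0.1], toy_index)
```

The linear scan in `nearest_package` is only suitable for a toy index; with a large retrieval space, an approximate nearest-neighbor index would be the more likely choice.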
