Towards Robust Detection of Open Source Software Supply Chain Poisoning Attacks in Industry Environments

Xinyi Zheng, Chen Wei, Shenao Wang, Yanjie Zhao, Peiming Gao, Yuanchao Zhang, Kailong Wang, Haoyu Wang

TL;DR

OSCAR is a dynamic analysis pipeline for detecting open-source supply chain poisoning in NPM and PyPI. It addresses the shortcomings of static analysis and of limited dynamic analysis by fully executing packages in a sandboxed environment, fuzz-testing exported functions and classes, and applying aspect-based behavior monitoring. It achieves F1 scores of 0.95 on NPM and 0.91 on PyPI while substantially reducing false positives on benign packages that exhibit characteristics typical of malicious ones. Over 18 months of industrial deployment at Ant Group, it identified 10,404 malicious NPM packages and 1,235 malicious PyPI packages. The work highlights the value of runtime behavior analysis in reducing manual review workload in large-scale OSS ecosystems.

Abstract

The exponential growth of open-source package ecosystems, particularly NPM and PyPI, has led to an alarming increase in software supply chain poisoning attacks. Existing static analysis methods struggle with high false positive rates and are easily thwarted by obfuscation and dynamic code execution techniques. While dynamic analysis approaches offer improvements, they often capture behaviors that do not originate from the package itself and employ simplistic testing strategies that fail to trigger sophisticated malicious behaviors. To address these challenges, we present OSCAR, a robust dynamic code poisoning detection pipeline for the NPM and PyPI ecosystems. OSCAR fully executes packages in a sandbox environment, employs fuzz testing on exported functions and classes, and implements aspect-based behavior monitoring with tailored API hook points. We evaluate OSCAR against six existing tools using a comprehensive benchmark dataset of real-world malicious and benign packages. OSCAR achieves an F1 score of 0.95 in NPM and 0.91 in PyPI, confirming that OSCAR is as effective as the current state-of-the-art technologies. Furthermore, for benign packages exhibiting characteristics typical of malicious packages, OSCAR reduces the false positive rate by an average of 32.06 percentage points in NPM (from 34.63% to 2.57%) and 39.87 percentage points in PyPI (from 41.10% to 1.23%), compared to other tools, significantly reducing the workload of manual reviews in real-world deployments. In cooperation with Ant Group, a leading financial technology company, we have deployed OSCAR on its NPM and PyPI mirrors since January 2023, identifying 10,404 malicious NPM packages and 1,235 malicious PyPI packages over 18 months. This work not only bridges the gap between academic research and industrial application in code poisoning detection but also provides a robust and practical solution that has been thoroughly tested in a real-world industrial setting.
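
The combination of fuzz testing on exported functions and aspect-based monitoring at API hook points can be illustrated with a minimal Python sketch. This is not OSCAR's implementation: the helper names (`hook`, `fuzz_exports`, `run_demo`), the toy package, the choice of `os.getenv` as the single hook point, and the fuzz inputs are all assumptions made for illustration. The sandboxing (Docker) and system-level capture (Falco) mentioned in the paper are not shown.

```python
import functools
import inspect
import os
import types


def hook(owner, name, log):
    """Wrap owner.name so every call is recorded in `log` before it runs.

    Hypothetical helper: a simple stand-in for an aspect-style hook point.
    Returns the original callable so the caller can restore it later.
    """
    original = getattr(owner, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        log.append((f"{owner.__name__}.{name}", args))
        return original(*args, **kwargs)

    setattr(owner, name, wrapper)
    return original


def fuzz_exports(module, inputs):
    """Call every public function of `module` once per candidate input."""
    for name, func in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_"):
            continue
        for value in inputs:
            try:
                func(value)
            except Exception:
                # Failures are expected under naive fuzzing; only the
                # behavior recorded by the hooks is of interest here.
                pass


def run_demo():
    """Build a toy package, hook os.getenv, fuzz the package, return the log."""
    # Toy stand-in for an installed package (assumption, not a real package).
    toy = types.ModuleType("toy_pkg")

    def add_one(x):
        return x + 1

    def read_secret(key):
        # Reads an environment variable, a behavior a monitor may flag.
        return os.getenv(str(key))

    toy.add_one = add_one
    toy.read_secret = read_secret

    log = []
    original = hook(os, "getenv", log)
    try:
        fuzz_exports(toy, inputs=["AWS_SECRET_ACCESS_KEY", 0])
    finally:
        os.getenv = original  # always remove the hook
    return log


if __name__ == "__main__":
    for api, args in run_demo():
        print(api, args)
```

In this sketch the hook only records calls made while the package's exported functions are being exercised, which loosely mirrors the goal of attributing observed behavior to the package under test rather than to unrelated processes.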

Paper Structure (17 sections, 6 figures, 9 tables, 1 algorithm)

Figures (6)

  • Figure 1: Obfuscated JavaScript malicious package example.
  • Figure 2: The workflow of OSCAR, which executes packages in Docker, encompassing package installation, import, and function/class invocation, followed by suspicious behavior capture using AOP and Falco.
  • Figure 3: Comparison of results on the benchmark dataset.
  • Figure 4: JavaScript obfuscation sample.
  • Figure 5: Python remote download and execution sample.
  • ...and 1 more figures