SoK: Understanding Vulnerabilities in the Large Language Model Supply Chain

Shenao Wang; Yanjie Zhao; Zhao Liu; Quanchen Zou; Haoyu Wang

TL;DR

This work provides the first systematic, cross-stack analysis of vulnerabilities in the LLM supply chain, collecting 529 CVEs across 75 prominent projects and 13 lifecycle stages. It introduces a four-way root-cause taxonomy and evaluates patch effectiveness: over half of the vulnerabilities have fixes, yet 8% of those fixes are ineffective and lead to recurring vulnerabilities. The study finds that vulnerabilities cluster in the application and model layers, that improper resource control and improper neutralization are the leading root causes, and that the Python and JavaScript ecosystems are hotspots. By offering open data and actionable insights, it aims to guide secure LLM ecosystem design, mitigation strategies, and future research on vulnerability detection and patching in complex AI software stacks.

Abstract

Large Language Models (LLMs) are transforming artificial intelligence, driving advances in natural language understanding, text generation, and autonomous systems. The growing complexity of their development and deployment introduces significant security challenges, particularly within the LLM supply chain. However, existing research focuses primarily on content safety, such as adversarial attacks, jailbreaking, and backdoor attacks, while overlooking security vulnerabilities in the underlying software systems. To address this gap, this study systematically analyzes 529 vulnerabilities reported across 75 prominent projects spanning 13 lifecycle stages. The findings show that vulnerabilities are concentrated in the application (50.3%) and model (42.7%) layers, with improper resource control (45.7%) and improper neutralization (25.1%) identified as the leading root causes. Additionally, while 56.7% of the vulnerabilities have available fixes, 8% of these patches are ineffective, resulting in recurring vulnerabilities. This study underscores the challenges of securing the LLM ecosystem and provides actionable insights to guide future research and mitigation strategies.
