SASER: Stego attacks on open-source LLMs

Ming Tan, Wei Li, Hu Tao, Hailong Ma, Aodi Liu, Qian Chen, Zilong Wang

TL;DR

This work analyzes security threats to open-source LLMs by formalizing a threat model for stego attacks and introducing SASER, the first stego attack tailored for open-source LLMs. SASER identifies low-impact parameter groups via a performance-aware importance metric, embeds payloads with LSB-based schemes, and uses two embedding modes to remain robust under quantization while delivering trigger-driven payload execution. Experimental results on LLaMA2-7B and ChatGLM3-6B show a 100% attack success rate and high stealth in both non-quantized and quantized settings, outperforming existing DNN stego baselines and remaining effective with PEFT deployment. The findings underscore significant security risks in open-source LLM supply chains and deployment pipelines and call for defenses specifically designed for these models.

Abstract

Open-source large language models (LLMs) have demonstrated considerable dominance over proprietary LLMs in resolving neural processing tasks, thanks to their collaborative and sharing nature. Although full access to source code, model parameters, and training data lays the groundwork for transparency, we argue that such full access leaves these models vulnerable to stego attacks, whose ill effects are not fully understood. In this paper, we systematically formalize stego attacks on open-source LLMs by enumerating all possible threat models associated with adversary objectives, knowledge, and capabilities. Among these, the threat posed by adversaries with internal knowledge, who inject payloads and triggers during the model sharing phase, is of practical interest. We go further and propose the first stego attack on open-source LLMs, dubbed SASER, which proceeds in four sequential steps: identifying targeted parameters, embedding payloads, injecting triggers, and executing payloads. In particular, SASER enhances attack robustness against quantization-based local deployment by de-quantizing the embedded payloads. In addition, to achieve stealthiness, SASER devises a performance-aware importance metric to identify the targeted parameters whose modification degrades model performance the least. Extensive experiments on LLaMA2-7B and ChatGLM3-6B, without quantization, show that the stealth rate of SASER outperforms existing stego attacks (for general DNNs) by up to 98.1%, while achieving the same attack success rate (ASR) of 100%. More importantly, SASER improves ASR on quantized models from 0 to 100% in all settings. Given this attack effectiveness, we call for investigation of countermeasures against SASER.
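
The LSB-based embedding mentioned in the TL;DR can be illustrated with a minimal sketch. This is not SASER itself: it shows only the generic idea of overwriting the lowest mantissa bits of float32 parameters, with no parameter selection, trigger, or quantization handling. The function names (`embed_lsb`, `extract_lsb`), the choice of float32, and the default of 8 bits per weight are assumptions made for illustration, and the payload is a harmless byte string.

```python
import struct

def float_to_bits(x: float) -> int:
    """Return the 32-bit IEEE-754 pattern of x after rounding to float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float(b: int) -> float:
    """Return the float32 value encoded by the 32-bit pattern b."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def embed_lsb(weights, payload: bytes, n_bits: int = 8):
    """Overwrite the n_bits lowest mantissa bits of leading weights with payload bits."""
    if not 1 <= n_bits <= 23:
        raise ValueError("n_bits must stay inside the 23-bit float32 mantissa")
    total_bits = len(payload) * 8
    needed = -(-total_bits // n_bits)          # ceil(total_bits / n_bits)
    if needed > len(weights):
        raise ValueError("not enough weights for this payload")
    # Left-align the payload so it splits evenly into n_bits-sized chunks.
    stream = int.from_bytes(payload, "big") << (needed * n_bits - total_bits)
    mask = (1 << n_bits) - 1
    out = list(weights)
    for i in range(needed):
        chunk = (stream >> ((needed - 1 - i) * n_bits)) & mask
        cleared = float_to_bits(weights[i]) & ~mask & 0xFFFFFFFF
        out[i] = bits_to_float(cleared | chunk)
    return out

def extract_lsb(weights, payload_len: int, n_bits: int = 8) -> bytes:
    """Read back payload_len bytes from the n_bits lowest mantissa bits."""
    total_bits = payload_len * 8
    needed = -(-total_bits // n_bits)
    mask = (1 << n_bits) - 1
    stream = 0
    for i in range(needed):
        stream = (stream << n_bits) | (float_to_bits(weights[i]) & mask)
    stream >>= needed * n_bits - total_bits    # drop the alignment padding
    return stream.to_bytes(payload_len, "big")

# Toy "parameters" and a harmless payload.
weights = [0.01 * (i + 1) for i in range(16)]
stego = embed_lsb(weights, b"hello", n_bits=8)
recovered = extract_lsb(stego, 5, n_bits=8)
```

Because only low mantissa bits change, each modified weight moves by a relative amount below 2^-15 when 8 bits are used, which is why such embedding tends to leave model behavior nearly intact. The same low-order bits are discarded when weights are quantized to a few bits, which is consistent with the abstract's report that baseline ASR on quantized models is 0.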

Paper Structure

This paper contains 21 sections, 4 equations, 11 figures, 4 tables, 2 algorithms.

Figures (11)

  • Figure 1: High-level view of SASER.
  • Figure 2: $d_{\textup{PAI}}$ of grouping methods with $n$ ranging from 1 to 16. We set the MLP matrices as the target matrices for the name-based and matrix-based methods, and select a random layer for the layer-based method. Results are averaged over 3 runs with different random seeds.
  • Figure 3: $d_{\textup{PAI}}$ of models on MMLU and AGIEval. Results are averaged over 3 runs with different random seeds.
  • Figure 4: $d_{\textup{PAI}}$ of models with $n$=11. Results are averaged over 3 runs with different random seeds on MMLU.
  • Figure 5: $D_{\textup{acc}}$ & $D_{\textup{ppl}}$ of models with $n$=11. Results are averaged over 3 runs with different random seeds on MMLU.
  • ...and 6 more figures

Theorems & Definitions (1)

  • Definition 1: Performance-aware importance (PAI)