A Disguised Wolf Is More Harmful Than a Toothless Tiger: Adaptive Malicious Code Injection Backdoor Attack Leveraging User Behavior as Triggers

Shangxi Wu; Jitao Sang

TL;DR

This work addresses security risks in code generation by formulating a game-theoretic attacker model and proposing an Adaptive Malicious Code Injection Backdoor framework that uses user behavior as the trigger. The backdoored Code LLM collaboration framework dynamically adjusts attack timing via $C = h(x)$ and trigger probability $\kappa$, maximizing $A(s)(\kappa - D(s,C))T(\kappa)$, and is evaluated across five strong code-generation models. Experiments show that multi-trigger and ambiguous semantic triggers can achieve a high Attack Success Rate (ASR) while preserving normal functionality, and that even tiny fractions of backdoor data (e.g., 0.3% or 50 samples) can pollute entire local datasets for future models. These findings highlight substantial practical risks in development workflows and motivate defense research and the development of robust metrics for stealthy backdoor resilience in code-generation systems.

Abstract

In recent years, large language models (LLMs) have made significant progress in code generation. However, as more users rely on these models for software development, the security risks associated with code generation models have become increasingly significant. Studies have shown that traditional deep learning robustness issues also affect code generation. In this paper, we first present a game-theoretic model that focuses on security issues in code generation scenarios. This framework outlines possible scenarios and patterns in which attackers could spread malicious code models to create security threats. We also point out, for the first time, that attackers can use backdoor attacks to dynamically adjust the timing of malicious code injection, releasing varying degrees of malicious code depending on the skill level of the user. Through extensive experiments on leading code generation models, we validate our proposed game-theoretic model and highlight the significant threats that these new attack scenarios pose to the safe use of code models.

Paper Structure (17 sections, 7 equations, 4 figures, 4 tables)

Introduction
Related Works
Code Generation Models
Backdoor Attacks
Method
Problem Definition
Backdoored Code LLM Collaborating Attack Framework
Evaluation Method
Experiments
Experimental Setup
Attack Performance
Effects with Different Injection Ratios
Effects with Different Injection Code Lengths
Multi-Backdoor Attack with Multi-Trigger
Attack with Ambiguous Semantic Triggers
...and 2 more sections

Figures (4)

Figure 1: Compared with previous attack scenarios that use code models to inject code, our proposed method is more threatening and stealthy in malicious code attacks.
Figure 2: The framework of Adaptive Malicious Code Injection Backdoor Attack. After hackers design their attack intentions, they release a large number of backdoor data sets and backdoor model parameters on the Internet. Victims will be attacked if they accidentally download poisoned parameters or their local data sets are polluted by backdoor data. The backdoor model will choose to use different attack strategies based on the victim's programming ability while completing the victim's development needs to ensure its long-term survival and maximize the attack effect.
Figure 3: The ASR and MCSR results of training different models with various injection ratio training sets. The x-axis is the injection rate of backdoor data, and the y-axis is the Attack Success Rate or Malicious Code Survival Rate.
Figure 4: The backdoor attack effect when using ambiguous semantic trigger attacks. Avg. pass@1 with Ambiguous Trigger and Avg. ASR under Ambiguous Trigger refer to the pass rate and the probability of a successful attack, respectively, when five different but semantically similar triggers are used at test time.
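
The two rates named in the Figure 3 and Figure 4 captions (ASR and pass@1) can be read as simple proportions over a test set. The sketch below is a minimal illustration of that reading, not the authors' evaluation code: the function names, the substring-based detection rule, and the `PLACEHOLDER_MARKER` string are all assumptions introduced here, and this summary does not state how the paper decides that an attack succeeded.

```python
from typing import Sequence


def attack_success_rate(outputs: Sequence[str], payload_marker: str) -> float:
    """Fraction of triggered generations that contain the marker string.

    Assumption: success is detected by plain substring match on a
    harmless placeholder marker; the paper may use a different rule.
    """
    if not outputs:
        raise ValueError("outputs must be non-empty")
    hits = sum(1 for out in outputs if payload_marker in out)
    return hits / len(outputs)


def pass_at_1(passed: Sequence[bool]) -> float:
    """Fraction of problems whose single generated solution passed its tests."""
    if not passed:
        raise ValueError("passed must be non-empty")
    return sum(1 for p in passed if p) / len(passed)


if __name__ == "__main__":
    # Hypothetical generations for four triggered prompts.
    generations = [
        "def add(a, b):\n    return a + b  # PLACEHOLDER_MARKER",
        "def sub(a, b):\n    return a - b",
        "# PLACEHOLDER_MARKER\ndef mul(a, b):\n    return a * b",
        "def div(a, b):\n    return a / b",
    ]
    print(attack_success_rate(generations, "PLACEHOLDER_MARKER"))
    print(pass_at_1([True, False, True, True]))
```

Reporting both numbers side by side matches the captions' framing: a stealthy backdoor is one where the attack rate is high while the pass rate on ordinary tasks stays close to that of a clean model.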
