A Disguised Wolf Is More Harmful Than a Toothless Tiger: Adaptive Malicious Code Injection Backdoor Attack Leveraging User Behavior as Triggers
Shangxi Wu, Jitao Sang
TL;DR
This work addresses security risks in code generation by formulating a game-theoretic attacker model and proposing an Adaptive Malicious Code Injection Backdoor framework that uses user behavior as triggers. The backdoored Code LLM collaboration framework dynamically adjusts attack timing via $C = h(x)$ and the trigger probability $\kappa$, maximizing $A(s)(\kappa - D(s,C))T(\kappa)$, and is evaluated on five strong code-generation models. Experiments show that multi-trigger and ambiguous semantic triggers can achieve a high Attack Success Rate (ASR) while preserving normal functionality, and that even tiny fractions of backdoor data (e.g., 0.3% or 50 samples) can pollute entire local datasets used for future models. These findings point to substantial practical risks in development workflows and motivate research on defenses and on robust metrics for resilience to stealthy backdoors in code-generation systems.
Abstract
In recent years, large language models (LLMs) have made significant progress in the field of code generation. However, as more and more users rely on these models for software development, the security risks associated with code generation models have become increasingly significant. Studies have shown that traditional deep learning robustness issues also negatively impact the field of code generation. In this paper, we first present a game-theoretic model that focuses on security issues in code generation scenarios. This framework outlines possible scenarios and patterns in which attackers could spread malicious code models to create security threats. We also point out, for the first time, that attackers can use backdoor attacks to dynamically adjust the timing of malicious code injection, releasing varying degrees of malicious code depending on the skill level of the user. Through extensive experiments on leading code generation models, we validate our proposed game-theoretic model and highlight the significant threats that these new attack scenarios pose to the safe use of code models.
