Table of Contents
Fetching ...

Backdoors in Code Summarizers: How Bad Is It?

Chenyu Wang, Zhou Yang, Yaniv Harel, David Lo

TL;DR

This study systematically assesses backdoor threats in Code LLMs used for code summarization by varying data, training, and inference factors across multiple models and trigger types. It demonstrates that backdoors remain highly effective at poisoning rates well below prior benchmarks, with 20 poisoned samples in large datasets sufficing to induce substantial ASR, and shows widespread defense methods like spectral signatures fail under such conditions. The work reveals that factors such as trigger length, token rarity, and small batch sizes strongly influence attack success, while inference settings (temperature/top-k) can mitigate risk to some extent; it also validates that these patterns generalize to prompt-based models like DeepSeek-Coder. Collectively, the findings highlight urgent needs for robust defenses, thorough reporting of experimental configurations, and broader evaluation under low-rate poisoning to reliably assess and mitigate backdoor threats in Code LLMs.

Abstract

Code LLMs are increasingly employed in software development. However, studies have shown that they are vulnerable to backdoor attacks: when a trigger (a specific input pattern) appears in the input, the backdoor will be activated and cause the model to generate malicious outputs. Researchers have designed various triggers and demonstrated the feasibility of implanting backdoors by poisoning a fraction of the training data. Some basic conclusions have been made, such as backdoors becoming easier to implant when more training data is modified. However, existing research has not explored other factors influencing backdoor attacks on Code LLMs, such as training batch size, epoch number, and the broader design space for triggers, e.g., trigger length. To bridge this gap, we use code summarization as an example to perform an empirical study that systematically investigates the factors affecting backdoor effectiveness and understands the extent of the threat posed. Three categories of factors are considered: data, model, and inference, revealing previously overlooked findings. We find that the prevailing consensus -- that attacks are ineffective at extremely low poisoning rates -- is incorrect. The absolute number of poisoned samples matters as well. Specifically, poisoning just 20 out of 454K samples (0.004% poisoning rate -- far below the minimum setting of 0.1% in prior studies) successfully implants backdoors! Moreover, the common defense is incapable of removing even a single poisoned sample from it. Additionally, small batch sizes increase the risk of backdoor attacks. We also uncover other critical factors such as trigger types, trigger length, and the rarity of tokens in the triggers, leading to valuable insights for assessing Code LLMs' vulnerability to backdoor attacks. Our study highlights the urgent need for defense mechanisms against extremely low poisoning rate settings.

Backdoors in Code Summarizers: How Bad Is It?

TL;DR

This study systematically assesses backdoor threats in Code LLMs used for code summarization by varying data, training, and inference factors across multiple models and trigger types. It demonstrates that backdoors remain highly effective at poisoning rates well below prior benchmarks, with 20 poisoned samples in large datasets sufficing to induce substantial ASR, and shows widespread defense methods like spectral signatures fail under such conditions. The work reveals that factors such as trigger length, token rarity, and small batch sizes strongly influence attack success, while inference settings (temperature/top-k) can mitigate risk to some extent; it also validates that these patterns generalize to prompt-based models like DeepSeek-Coder. Collectively, the findings highlight urgent needs for robust defenses, thorough reporting of experimental configurations, and broader evaluation under low-rate poisoning to reliably assess and mitigate backdoor threats in Code LLMs.

Abstract

Code LLMs are increasingly employed in software development. However, studies have shown that they are vulnerable to backdoor attacks: when a trigger (a specific input pattern) appears in the input, the backdoor will be activated and cause the model to generate malicious outputs. Researchers have designed various triggers and demonstrated the feasibility of implanting backdoors by poisoning a fraction of the training data. Some basic conclusions have been made, such as backdoors becoming easier to implant when more training data is modified. However, existing research has not explored other factors influencing backdoor attacks on Code LLMs, such as training batch size, epoch number, and the broader design space for triggers, e.g., trigger length. To bridge this gap, we use code summarization as an example to perform an empirical study that systematically investigates the factors affecting backdoor effectiveness and understands the extent of the threat posed. Three categories of factors are considered: data, model, and inference, revealing previously overlooked findings. We find that the prevailing consensus -- that attacks are ineffective at extremely low poisoning rates -- is incorrect. The absolute number of poisoned samples matters as well. Specifically, poisoning just 20 out of 454K samples (0.004% poisoning rate -- far below the minimum setting of 0.1% in prior studies) successfully implants backdoors! Moreover, the common defense is incapable of removing even a single poisoned sample from it. Additionally, small batch sizes increase the risk of backdoor attacks. We also uncover other critical factors such as trigger types, trigger length, and the rarity of tokens in the triggers, leading to valuable insights for assessing Code LLMs' vulnerability to backdoor attacks. Our study highlights the urgent need for defense mechanisms against extremely low poisoning rate settings.

Paper Structure

This paper contains 38 sections, 1 equation, 11 figures.

Figures (11)

  • Figure 1: Threat model of backdoor attacks on Code LLMs. Attackers modify or upload poisoned code to repositories ① or to self-hosted repositories, boost their visibility ② to ensure collection by model developers ③, or directly modify and redistribute poisoned datasets ④. Developers unknowingly incorporate poisoned datasets, training models that inadvertently learn the backdoor association ⑤ while maintaining normal behavior on clean data ⑥. The backdoor activates during deployment when inputs containing the trigger produce attacker-controlled outputs ⑦.
  • Figure 2: Embed trigger code in a function and overwrite its docstring with the target sentence to poison code summarization.
  • Figure 3: Different types of triggers achieved by injecting one line of code. For a grammar trigger, $S\in\{\mathtt{if},\mathtt{while}\}$, $N\in\{0,\dots,100\}$, and $M\in\{\mathtt{Error},\mathtt{Warning},\mathtt{Debug},\mathtt{Info}\}$.
  • Figure 4: ASR and FTR across varying poisoning rates and trigger types.
  • Figure 5: ASR across dataset sizes.
  • ...and 6 more figures