Defending Code Language Models against Backdoor Attacks with Deceptive Cross-Entropy Loss

Guang Yang, Yu Zhou, Xiang Chen, Xiangyu Zhang, Terry Yue Zhuo, David Lo, Taolue Chen

TL;DR

This paper identifies an early learning vulnerability in Code Language Models (CLMs): as training progresses, models overfit to backdoor triggers, driven by the unbounded cross-entropy loss. It introduces DeCE, a Deceptive Cross-Entropy loss that blends deceptive distributions with label smoothing to bound gradients and prevent trigger overfitting while preserving performance on clean data. In extensive experiments across code generation and code repair tasks, multiple datasets, and multiple model sizes, DeCE outperforms existing active defenses and complements passive defenses, reducing attack success rates with minimal impact on standard quality metrics. The work also demonstrates promising generalization to classification tasks and larger models, and discusses adaptive threats and validity considerations. Code and benchmarks are released for reproducibility.

Abstract

Code Language Models (CLMs), particularly those leveraging deep learning, have achieved significant success in the code intelligence domain. However, security issues, particularly backdoor attacks, are often overlooked in this process. Previous research has focused on designing backdoor attacks for CLMs, but effective defenses have not been adequately addressed. In particular, existing defense methods from natural language processing, when directly applied to CLMs, are not effective enough and lack generality: they work well for some models and scenarios but fail for others, and thus fall short of consistently mitigating backdoor attacks. To bridge this gap, we first confirm that "early learning" is a general phenomenon during the training of CLMs. In this phenomenon, a model initially focuses on the main features of the training data but may become more sensitive to backdoor triggers over time, leading to overfitting and susceptibility to backdoor attacks. We then trace overfitting to backdoor triggers to the use of the cross-entropy loss function: because cross-entropy is unbounded, the model increasingly concentrates on the features of the poisoned data. Based on this insight, we propose a general and effective loss function, DeCE (Deceptive Cross-Entropy), which blends deceptive distributions and applies label smoothing to keep the gradient bounded. This prevents the model from overfitting to backdoor triggers and thereby enhances the security of CLMs against backdoor attacks.
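
The contrast between an unbounded and a bounded loss can be illustrated with a small sketch. The abstract does not state the DeCE formula, so the code below is only one plausible reading of "blending deceptive distributions and applying label smoothing": it assumes the predicted distribution is mixed with a label-smoothed target using a weight `alpha ** epoch`. The function name `dece_like_loss` and the parameters `alpha`, `epoch`, and `eps` are illustrative assumptions, not the authors' implementation.

```python
import math


def cross_entropy(probs, target):
    """Standard cross-entropy for a one-hot target: -log p[target].

    Grows without bound as probs[target] approaches 0.
    """
    return -math.log(probs[target])


def smoothed_target(num_classes, target, eps):
    """Label smoothing: (1 - eps) * one_hot + eps / num_classes."""
    return [
        (1.0 - eps) * (1.0 if i == target else 0.0) + eps / num_classes
        for i in range(num_classes)
    ]


def dece_like_loss(probs, target, alpha=0.99, epoch=1, eps=0.1):
    """Hypothetical DeCE-style loss (an assumption, not the paper's code).

    The prediction is blended with the smoothed target before the log,
    so every blended probability is at least (1 - alpha**epoch) * eps / V
    and the loss stays finite when alpha < 1, epoch >= 1, and eps > 0.
    """
    t = smoothed_target(len(probs), target, eps)
    w = alpha ** epoch
    blended = [w * p + (1.0 - w) * ti for p, ti in zip(probs, t)]
    return -sum(ti * math.log(bi) for ti, bi in zip(t, blended))


def dece_like_upper_bound(num_classes, target, alpha=0.99, epoch=1, eps=0.1):
    """Worst-case value of dece_like_loss, reached when probs are all 0."""
    t = smoothed_target(num_classes, target, eps)
    w = alpha ** epoch
    return -sum(ti * math.log((1.0 - w) * ti) for ti in t)


if __name__ == "__main__":
    # The model assigns less and less probability to the labelled class.
    for p in (1e-2, 1e-6, 1e-12):
        probs = [p, 1.0 - p]
        print(p, cross_entropy(probs, 0), dece_like_loss(probs, 0))
```

Under these assumptions, the standard loss keeps growing as the probability of the labelled class shrinks, while the blended loss levels off below the value returned by `dece_like_upper_bound`. A bounded loss of this kind limits how strongly any single (possibly poisoned) sample can pull on the model, which is the intuition the abstract describes.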

Paper Structure (31 sections, 8 equations, 5 figures, 11 tables)

Figures (5)

  • Figure 1: Examples of backdoor attacks in code synthesis tasks.
  • Figure 2: Early learning phenomena in CLMs.
  • Figure 3: Phenomenon of overfitting to triggers in CodeT5. PCA visualization of the last-layer hidden states of the model trained on the clean and poisoned Lyra dataset triggered by BadPre.
  • Figure 4: Performance of CLMs with different loss functions on the validation set over training epochs when trained on the poisoned Lyra dataset triggered by BadPre.
  • Figure 5: Hyperparameter sensitivity analysis of DeCE on the Lyra dataset with a 5% poisoning ratio under BadPre.