Transferable Backdoor Attacks for Code Models via Sharpness-Aware Adversarial Perturbation

Shuyu Chang; Haiping Huang; Yanjun Zhang; Yujin Huang; Fu Xiao; Leo Yu Zhang

TL;DR

This work addresses backdoor vulnerability in code models under realistic cross-dataset conditions. It introduces STAB, a transferable backdoor framework that combines Sharpness-Aware Minimization to locate flat loss regions with a differentiable Gumbel-Softmax-based trigger optimization constrained by MMD terms for syntactic validity and diversity. Empirical results across three datasets and two code models show that STAB achieves higher cross-dataset attack success (average ASR up to $80.1\%$) and maintains strong stealth against defenses (ASR-D up to $73.2\%$), outperforming static and dynamic baselines by substantial margins. The findings emphasize the need for defenses that consider loss-landscape geometry and globally optimized trigger patterns to mitigate transferable backdoors in code models.

Abstract

Code models are increasingly adopted in software development but remain vulnerable to backdoor attacks via poisoned training data. Existing backdoor attacks on code models face a fundamental trade-off between transferability and stealthiness. Static trigger-based attacks insert fixed dead code patterns that transfer well across models and datasets but are easily detected by code-specific defenses. In contrast, dynamic trigger-based attacks adaptively generate context-aware triggers to evade detection but suffer from poor cross-dataset transferability. Moreover, they rely on unrealistic assumptions of identical data distributions between poisoned and victim training data, limiting their practicality. To overcome these limitations, we propose Sharpness-aware Transferable Adversarial Backdoor (STAB), a novel attack that achieves both transferability and stealthiness without requiring complete victim data. STAB is motivated by the observation that adversarial perturbations in flat regions of the loss landscape transfer more effectively across datasets than those in sharp minima. To this end, we train a surrogate model using Sharpness-Aware Minimization to guide model parameters toward flat loss regions, and employ Gumbel-Softmax optimization to enable differentiable search over discrete trigger tokens for generating context-aware adversarial triggers. Experiments across three datasets and two code models show that STAB outperforms prior attacks in terms of transferability and stealthiness. It achieves a 73.2% average attack success rate after defense, outperforming static trigger-based attacks that fail under defense. STAB also surpasses the best dynamic trigger-based attack by 12.4% in cross-dataset attack success rate and maintains performance on clean inputs.
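As a rough illustration of the Sharpness-Aware Minimization step the abstract refers to, the sketch below applies the commonly described two-pass SAM update to a toy quadratic loss. The loss function, the learning rate `lr`, the radius `rho`, and all function names are assumptions chosen for illustration; the paper's surrogate is a code model trained on public code data, not this toy function.

```python
import math

def loss(w):
    # Toy quadratic standing in for the surrogate model's training loss (assumption).
    return 0.5 * (4.0 * w[0] ** 2 + 0.25 * w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy loss above.
    return [4.0 * w[0], 0.25 * w[1]]

def perturb(w, rho):
    # First pass: move along the normalized gradient to the approximate
    # worst-case point inside an L2 ball of radius rho around w.
    g = grad(w)
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        return list(w)
    return [wi + rho * gi / norm for wi, gi in zip(w, g)]

def sam_step(w, rho=0.05, lr=0.1):
    # Second pass: take the gradient at the perturbed point, then apply it
    # to the original weights. This favors regions where the loss stays low
    # in a whole neighborhood, i.e. flatter regions.
    g_adv = grad(perturb(w, rho))
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

w_final = [1.0, 1.0]
for _ in range(50):
    w_final = sam_step(w_final)
```

Here `rho` plays the role of the sharpness parameter: a larger radius makes the update react to loss values farther from the current weights.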

Paper Structure (32 sections, 7 equations, 5 figures, 4 tables)

Introduction
Related Work
Backdoor Attacks on Code Models
Backdoor Defenses for Code Models
Methodology
Threat Model
Overview
Sharpness-Aware Surrogate Model Training
Adversarial Trigger Optimization
Gumbel-Softmax Relaxation for Code
Trigger Optimization Objective
Attack Loss ($\mathcal{L}_a$)
Consistency Loss ($\mathcal{L}_c$)
Diversity Loss ($\mathcal{L}_d$)
Optimization Process
...and 17 more sections

Figures (5)

Figure 1: Threat models for code backdoor attacks. (a) Prior work: Identical poisoned and victim data distributions. (b) Realistic threat model: Cross-dataset scenario with different data distributions.
Figure 2: Overview of the proposed STAB attack. (a) Sharpness-Aware Surrogate Model Training utilizes SAM to train a surrogate model on public code data, guiding it toward flat loss landscape regions for better transferability. (b) Adversarial Trigger Optimization employs differentiable Gumbel-Softmax relaxation with MMD constraints to optimize trigger distributions for identifier replacement, ensuring syntactic validity while maximizing attack effectiveness. (c) Trigger Generation and Deployment samples trigger tokens from the optimized distributions to generate poisoned code samples for deployment.
Figure 3: Attack Success Rate with Defense (ASR-D) transferability heatmap of different attacks for CodeT5.
Figure 4: Effect of poison rate $\epsilon$ for CodeT5 on MNP task.
Figure 5: Impact of sharpness parameter $\rho$ for PLBART on CS task.
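The Gumbel-Softmax relaxation named in the Figure 2(b) caption can be sketched in a few lines. This is a minimal illustration only: the candidate identifier list `vocab`, the `logits`, and the temperature `tau` are hypothetical values, and the MMD constraints and the attack objective from the paper are not modeled here.

```python
import math
import random

def gumbel_softmax(logits, tau, rng):
    # Add Gumbel(0, 1) noise to each logit, divide by the temperature tau,
    # and apply a numerically stable softmax. The result is a soft,
    # differentiable stand-in for a one-hot sample over discrete tokens.
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    m = max(noisy)
    exps = [math.exp(x - m) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
vocab = ["tmp", "idx", "buf", "ctx"]   # hypothetical candidate identifiers
logits = [0.1, 2.0, 0.3, -1.0]         # hypothetical trigger distribution

# One relaxed sample, and the discrete token it corresponds to.
soft = gumbel_softmax(logits, tau=0.5, rng=rng)
chosen = vocab[soft.index(max(soft))]

# Over many draws, the argmax follows softmax(logits) (the Gumbel-max trick).
counts = [0] * len(vocab)
for _ in range(2000):
    sample = gumbel_softmax(logits, tau=0.5, rng=rng)
    counts[sample.index(max(sample))] += 1
```

Lowering `tau` pushes each sample closer to a one-hot vector, while a higher `tau` gives smoother samples that are easier to optimize through.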
