Enhancing and Reporting Robustness Boundary of Neural Code Models for Intelligent Code Understanding

Tingxu Han, Wei Song, Weisong Sun, Hao Wu, Chunrong Fang, Yuan Xiao, Xiaofang Zhang, Zhenyu Chen, Yang Liu

Abstract

With the development of deep learning, Neural Code Models (NCMs) such as CodeBERT and CodeLlama are widely used for code understanding tasks, including defect detection and code classification. However, recent studies have revealed that NCMs are vulnerable to adversarial examples: inputs with subtle perturbations that induce incorrect predictions while remaining difficult to detect. Existing defenses address this issue via data augmentation to empirically improve robustness, but they are costly, offer no theoretical robustness guarantees, and typically require white-box access to model internals, such as gradients. To address these challenges, we propose ENBECOME, a novel black-box, training-free, and lightweight adversarial defense. ENBECOME is designed to both enhance empirical robustness and report certified robustness boundaries for NCMs. ENBECOME operates solely during inference, introducing random, semantics-preserving perturbations to input code snippets to smooth the NCM's decision boundaries. This smoothing enables ENBECOME to formally certify a robustness radius within which adversarial examples can never induce misclassification, a property known as certified robustness. We conduct comprehensive experiments across multiple NCM architectures and tasks. Results show that ENBECOME significantly reduces attack success rates (ASR) while maintaining high accuracy. For example, in defect detection, it reduces the average ASR from 42.43% to 9.74% with only a 0.29% drop in accuracy. Furthermore, ENBECOME achieves an average certified robustness radius of 1.63, meaning that adversarial modifications to no more than 1.63 identifiers are provably ineffective.

Paper Structure (19 sections, 2 theorems, 20 equations, 8 figures, 5 tables, 1 algorithm)

Key Result

Theorem 1

Given $\mathbf{x}$ and $\mathbf{x}'$, if $|\mathbf{x} \oslash \mathbf{x}'| \leq r$, we have: where $\beta = 1 - \dfrac{C_{h_x - r}^{k_x}}{C_{h_x}^{k_x}}$ and $\overline{g(\mathbf{x}, y)}$ is the upper bound of $g(\mathbf{x}, y)$.
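
The coefficient $\beta$ in Theorem 1 can be evaluated directly from its definition. The sketch below is only an illustration: the function name `beta` is hypothetical, and it assumes that $C_n^k$ denotes the binomial coefficient "n choose k", that $h_x$ counts the identifiers in $\mathbf{x}$, and that $k_x$ counts the identifiers left untouched in each smoothed sample (the summary above does not define $h_x$ or $k_x$). Under that reading, $\beta$ is the probability that a uniformly random $k_x$-subset of $h_x$ items contains at least one of $r$ marked items.

```python
from math import comb


def beta(h_x: int, k_x: int, r: int) -> float:
    """Evaluate beta = 1 - C(h_x - r, k_x) / C(h_x, k_x).

    Assumed meanings (not stated in the summary): h_x is the total number
    of identifiers, k_x the number kept per smoothed sample, and r the
    number of identifiers an adversary may have modified.
    """
    if not (0 <= k_x <= h_x and 0 <= r <= h_x):
        raise ValueError("require 0 <= k_x <= h_x and 0 <= r <= h_x")
    # math.comb returns 0 when k_x > h_x - r, which gives beta = 1.
    return 1.0 - comb(h_x - r, k_x) / comb(h_x, k_x)


if __name__ == "__main__":
    for r in range(4):
        print(f"r={r}: beta={beta(10, 3, r):.4f}")
```

With $r = 0$ the coefficient is 0, and it grows toward 1 as $r$ increases, which matches the intuition that a larger perturbation budget weakens whatever the bound can guarantee.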

Figures (8)

  • Figure 1: An adversarial example on CodeBERT in defect detection. The adversarial code snippet (ACS) is crafted by modifying only the identifiers, yet it changes the prediction from "Yes" to "No". The adversarial code snippet still contains the same defect as the original one.
  • Figure 2: An intuitive illustration of EnBecome's scenario and objective in defect detection. The adversary crafts ACSs based on original code snippets to cross the original decision boundary, causing misclassification. The norm measures the difference between the original code snippets and their corresponding ACSs. EnBecome introduces controlled randomness into the input space during inference, averaging predictions over perturbed inputs within a norm ball to achieve a "smoothed" decision boundary (from the black line to the red line).
  • Figure 3: An intuitive illustration of adversarial example (AE) creation. (a) Arbitrary code token modifications cause syntax errors, making the AE uncompilable. (b) User-defined identifier modifications preserve syntactic correctness and keep the code compilable.
  • Figure 4: The workflow of EnBecome. In phase (a), EnBecome generates smoothed samples and obtains a prediction for each of them. In phase (b), EnBecome aggregates the predictions and outputs the final prediction $\tilde{y}$. In phase (c), EnBecome generates the certified radius and reports the certified robustness.
  • Figure 5: Evidence for our insight. EnBecome generates $N$ ($N = 100$) smoothed samples per code snippet, with the score indicating how many are predicted as the ground-truth label. The color represents the final prediction $\tilde{y}$, where blue indicates a correct prediction and red indicates an incorrect one.
  • ...and 3 more figures
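
Phases (a) and (b) of the workflow in Figure 4 can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the names `perturb_identifiers` and `smoothed_predict`, the `var_i` placeholder scheme, the `keep` parameter, and the plain majority vote are all assumptions made for the example, and `classify` stands in for any black-box NCM that maps a code string to a label.

```python
import random
import re
from collections import Counter
from typing import Callable, Sequence, Tuple


def perturb_identifiers(code: str, identifiers: Sequence[str],
                        keep: int, rng: random.Random) -> str:
    """Rename all but `keep` randomly chosen identifiers (hypothetical scheme).

    Assumes `identifiers` are distinct user-defined names and that no
    placeholder `var_i` collides with an existing name.
    """
    kept = set(rng.sample(list(identifiers), keep))
    out = code
    for i, name in enumerate(identifiers):
        if name not in kept:
            # Whole-word replacement so substrings of other tokens are untouched.
            out = re.sub(rf"\b{re.escape(name)}\b", f"var_{i}", out)
    return out


def smoothed_predict(code: str, identifiers: Sequence[str],
                     classify: Callable[[str], str],
                     n_samples: int = 100, keep: int = 1,
                     seed: int = 0) -> Tuple[str, int]:
    """Query the black-box classifier on n_samples perturbed copies and
    return the majority label together with its vote count."""
    rng = random.Random(seed)
    votes = Counter(
        classify(perturb_identifiers(code, identifiers, keep, rng))
        for _ in range(n_samples)
    )
    return votes.most_common(1)[0]


if __name__ == "__main__":
    # Toy classifier that is fooled whenever the identifier "evil" survives.
    toy = lambda c: "No" if "evil" in c else "Yes"
    snippet = "int evil = buf[len];"
    print(smoothed_predict(snippet, ["evil", "buf", "len"], toy))
```

The vote count returned alongside the label corresponds to the per-snippet score described for Figure 5. Only calls to `classify` are needed, so the sketch stays within the black-box, inference-only setting described in the abstract.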

Theorems & Definitions (6)

  • Definition 1: Empirical Robustness
  • Definition 2: Certified Robustness
  • Theorem 1
  • proof
  • Theorem 2
  • proof