Adversarial Attacks on Code Models with Discriminative Graph Patterns

Thanh-Dat Nguyen, Yang Zhou, Xuan Bach D. Le, Patanamon Thongtanunam, David Lo

TL;DR

GraphCodeAttack is a novel adversarial attack framework for evaluating the robustness of code models. In attacking code models, it significantly outperforms state-of-the-art attack approaches such as CARROT and ALERT.

Abstract

Pre-trained language models of code are now widely used in software engineering tasks such as code generation, code completion, and vulnerability detection. This widespread use, in turn, exposes the models to security and reliability risks. One important threat is adversarial attacks, which can lead to erroneous predictions and substantially degrade model performance on downstream tasks. Current adversarial attacks on code models usually adopt fixed sets of program transformations, such as variable renaming and dead code insertion, which limits their effectiveness. To address this limitation, we propose a novel adversarial attack framework, GraphCodeAttack, to better evaluate the robustness of code models. Given a target code model, GraphCodeAttack automatically mines important code patterns that can influence the model's decisions and uses them to perturb the structure of the model's input code. To do so, GraphCodeAttack uses a set of input programs to probe the model's outputs and identifies the discriminative AST patterns that can influence the model's decisions. GraphCodeAttack then selects appropriate AST patterns, concretizes the selected patterns as attacks, and inserts them as dead code into the model's input program. To effectively synthesize attacks from AST patterns, GraphCodeAttack uses a separate pre-trained code model to fill in the ASTs with concrete code snippets. We evaluate the robustness of two popular code models, CodeBERT and GraphCodeBERT, against our proposed approach on three tasks: Authorship Attribution, Vulnerability Prediction, and Clone Detection. The experimental results suggest that our approach significantly outperforms state-of-the-art attack approaches, such as CARROT and ALERT, in attacking code models.
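
The pattern-mining step described in the abstract can be illustrated with a toy sketch. It is not the paper's mining algorithm, which this page does not specify. As a stand-in, it treats each parent-to-child node-type edge of a Python AST as a "pattern" and scores it by how much more often it appears in programs the target model assigns to one label than in the rest. The function names, the edge-based pattern definition, and the support-difference score are all assumptions made for illustration.

```python
import ast
from collections import Counter

def ast_patterns(source):
    """Return the set of parent->child node-type edges in a program's AST.

    Assumption: a single AST edge stands in for the richer graph patterns
    that the paper mines.
    """
    edges = set()
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            edges.add(f"{type(parent).__name__}->{type(child).__name__}")
    return edges

def discriminative_patterns(programs, predicted_labels, target_label, top_k=3):
    """Rank patterns by (support under target_label) - (support elsewhere).

    predicted_labels are the target model's outputs on the probe programs.
    """
    in_target, elsewhere = Counter(), Counter()
    n_target = n_other = 0
    for src, label in zip(programs, predicted_labels):
        patterns = ast_patterns(src)
        if label == target_label:
            in_target.update(patterns)
            n_target += 1
        else:
            elsewhere.update(patterns)
            n_other += 1
    scores = {}
    for p in set(in_target) | set(elsewhere):
        support_target = in_target[p] / max(n_target, 1)
        support_other = elsewhere[p] / max(n_other, 1)
        scores[p] = support_target - support_other
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

# Hypothetical probe set with made-up model predictions.
programs = [
    "if x:\n    y = 1\n",
    "for i in range(3):\n    print(i)\n",
    "if a:\n    b = 2\n",
    "while True:\n    break\n",
]
labels = [1, 0, 1, 0]
ranked = discriminative_patterns(programs, labels, target_label=1)
```

In this toy probe set, patterns rooted in `if` statements occur only in programs labelled 1, so they receive the maximum score and would be the natural candidates to insert when trying to push a prediction toward that label.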

Paper Structure

This paper contains 28 sections, 5 equations, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: Overview of GraphCodeAttack's method. $\mathcal{M}_t$ is the target victim model, and $\mathcal{M}_f$ is the language model used to fill in the <MASK>.
  • Figure 2: Attacking with a pattern. Given the original source code (a), GraphCodeAttack identifies the important statement on line 2: $\texttt{f = sys.stdin}$. GraphCodeAttack then chooses the pattern (b), consisting of an if statement with an unknown condition and body. GraphCodeAttack inserts this textual pattern into the code, resulting in the masked code (c). Finally, GraphCodeAttack uses the filler language model $\mathcal{M}_f$ to fill in the masks in (c), resulting in the perturbed code (d) that changes the model's prediction.
  • Figure 3: Example of corresponding AST pattern and textual pattern
  • Figure 4: Top frequent patterns among attacks
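
The insert-then-fill workflow in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation. The helper names are hypothetical, the surrounding program lines other than `f = sys.stdin` are invented, and `stub_filler` is a hard-coded stand-in for the filler model $\mathcal{M}_f$, which in the paper is a pre-trained code model.

```python
MASK = "<MASK>"

def insert_pattern(source, line_no, pattern):
    """Insert a textual pattern after the 1-indexed line, reusing its indentation."""
    lines = source.splitlines()
    anchor = lines[line_no - 1]
    indent = anchor[: len(anchor) - len(anchor.lstrip())]
    inserted = [indent + p for p in pattern.splitlines()]
    return "\n".join(lines[:line_no] + inserted + lines[line_no:])

def fill_masks(masked, filler):
    """Replace each <MASK>, left to right, with the filler's suggestion."""
    out = masked
    for _ in range(masked.count(MASK)):
        out = out.replace(MASK, filler(out), 1)
    return out

def stub_filler(context):
    # Hypothetical stand-in for M_f: fills the condition first, then the body.
    # A constant-false condition keeps the inserted block dead code.
    return "False" if context.count(MASK) == 2 else "pass"

# (a) original code; only line 2 comes from the Figure 2 caption.
original = "import sys\nf = sys.stdin\nn = int(f.readline())"
# (b) textual pattern: an if statement with unknown condition and body.
pattern = "if <MASK>:\n    <MASK>"
# (c) masked code, then (d) perturbed code.
masked = insert_pattern(original, 2, pattern)
perturbed = fill_masks(masked, stub_filler)
```

With a real filler model, the generated condition and body would need to be checked so that the inserted block never executes and the program's behavior is preserved. The stub sidesteps that check by always producing `if False: pass`.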