Adversarial Attacks on Code Models with Discriminative Graph Patterns

Thanh-Dat Nguyen; Yang Zhou; Xuan Bach D. Le; Patanamon Thongtanunam; David Lo

TL;DR

GraphCodeAttack is a novel adversarial attack framework for evaluating the robustness of code models. It significantly outperforms state-of-the-art attacks on code models, such as CARROT and ALERT.

Abstract

Pre-trained language models of code are now widely used in software engineering tasks such as code generation, code completion, and vulnerability detection. This widespread use raises security and reliability concerns. One important threat is adversarial attacks, which can cause erroneous predictions and substantially degrade model performance on downstream tasks. Current adversarial attacks on code models usually adopt fixed sets of program transformations, such as variable renaming and dead code insertion, which limits their effectiveness. To address this limitation, we propose a novel adversarial attack framework, GraphCodeAttack, to better evaluate the robustness of code models. Given a target code model, GraphCodeAttack automatically mines important code patterns that can influence the model's decisions and uses them to perturb the structure of the model's input code. To do so, GraphCodeAttack uses a set of input programs to probe the model's outputs and identifies the discriminative AST patterns that can influence the model's decisions. GraphCodeAttack then selects appropriate AST patterns, concretizes the selected patterns as attacks, and inserts them as dead code into the model's input program. To effectively synthesize attacks from AST patterns, GraphCodeAttack uses a separate pre-trained code model to fill in the ASTs with concrete code snippets. We evaluate the robustness of two popular code models, CodeBERT and GraphCodeBERT, against our proposed approach on three tasks: Authorship Attribution, Vulnerability Prediction, and Clone Detection. The experimental results suggest that our proposed approach significantly outperforms state-of-the-art attacks on code models, such as CARROT and ALERT.
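
The idea of a discriminative AST pattern can be illustrated with a deliberately simplified sketch. It assumes that a "pattern" is just a (parent, child) pair of AST node types and that a pattern's score is the difference between its occurrence rates in programs the target model assigned to one label and in all other programs. Both assumptions are stand-ins for illustration; the abstract does not specify the mining algorithm or scoring function, and the function names here are hypothetical.

```python
import ast
from collections import Counter


def edge_patterns(source):
    """Return the set of (parent, child) AST node-type pairs in a program.

    Assumption: single edges stand in for the larger subgraph patterns
    that a real discriminative subgraph miner would produce.
    """
    tree = ast.parse(source)
    return {
        (type(parent).__name__, type(child).__name__)
        for parent in ast.walk(tree)
        for child in ast.iter_child_nodes(parent)
    }


def discriminative_scores(programs, labels, target_label):
    """Score patterns by occurrence-rate difference between two groups.

    `labels` are the target model's predictions on the probing programs.
    A score near 1 means the pattern appears almost only in programs
    predicted as `target_label`; a score near -1 means the opposite.
    """
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for src, label in zip(programs, labels):
        pats = edge_patterns(src)
        if label == target_label:
            pos.update(pats)
            n_pos += 1
        else:
            neg.update(pats)
            n_neg += 1
    return {
        p: pos[p] / max(n_pos, 1) - neg[p] / max(n_neg, 1)
        for p in set(pos) | set(neg)
    }


# Toy probing set: two programs with made-up model predictions.
programs = ["if x:\n    y = 1\n", "y = 1\n"]
labels = [1, 0]
scores = discriminative_scores(programs, labels, target_label=1)
```

In this toy run, patterns shared by both programs cancel out to a score of zero, so only the patterns unique to one prediction group remain as candidates for an attack.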

Paper Structure (28 sections, 5 equations, 4 figures, 5 tables)

Introduction
Background
Code Models
Graph Mining via Abstract Syntax Trees
Methodology
Mining Attack Patterns
Discriminative Subgraph Mining
Synthesizing Concrete Attacks from AST Patterns
Attacking with mined patterns
Statement-level important score estimation
Choosing pattern with meta-model
Pattern insertion
Experiment Settings
Dataset
Target Model, Filler model and Probing Data
...and 13 more sections

Figures (4)

Figure 1: Overview of GraphCodeAttack's method. $\mathcal{M}_t$ is the target victim model, $\mathcal{M}_f$ is the language model used to fill in the <MASK>
Figure 2: Attacking with a pattern: Given the original source code (a), GraphCodeAttack identifies the important statement on line 2: $\texttt{f = sys.stdin}$. GraphCodeAttack then chooses the pattern $(b)$, consisting of an if statement with an unknown condition and body. GraphCodeAttack inserts this textual pattern into the code, resulting in the masked code $(c)$. Finally, GraphCodeAttack uses the filler language model $\mathcal{M}_f$ to fill in the masks in $(c)$, resulting in the perturbed code $(d)$ that changes the model's prediction
Figure 3: Example of corresponding AST pattern and textual pattern
Figure 4: Top frequent patterns among attacks
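
The steps in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the helper names `insert_pattern` and `fill_masks` are hypothetical, the surrounding program lines are invented around the `f = sys.stdin` statement, and the hard-coded fills `False` and `pass` stand in for whatever the filler model $\mathcal{M}_f$ would actually generate.

```python
MASK = "<MASK>"


def insert_pattern(source, line_no, pattern):
    """Insert a textual pattern after the 1-indexed line `line_no`.

    The pattern is indented to match the anchor line so that the
    result stays syntactically valid Python.
    """
    lines = source.splitlines()
    anchor = lines[line_no - 1]
    indent = anchor[: len(anchor) - len(anchor.lstrip())]
    block = [indent + p for p in pattern.splitlines()]
    return "\n".join(lines[:line_no] + block + lines[line_no:])


def fill_masks(masked, fills):
    """Replace each mask, left to right, with the next fill string.

    Assumption: in the real pipeline a filler language model proposes
    these strings; here they are supplied by hand.
    """
    for fill in fills:
        masked = masked.replace(MASK, fill, 1)
    return masked


# (a) original code; the important statement is on line 2
original = "import sys\nf = sys.stdin\nn = int(f.readline())"
# (b) textual pattern: an if statement with unknown condition and body
pattern = "if <MASK>:\n    <MASK>"
# (c) masked code after insertion
masked = insert_pattern(original, 2, pattern)
# (d) perturbed code; `if False: pass` is dead code, so behaviour is unchanged
perturbed = fill_masks(masked, ["False", "pass"])
```

Because the inserted branch never executes, the perturbed program keeps the semantics of the original, which is what makes dead code insertion usable as an attack vehicle.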
