A Survey of Trojans in Neural Models of Source Code: Taxonomy and Techniques

Aftab Hussain; Md Rafiqul Islam Rabin; Toufique Ahmed; Navid Ayoobi; Bowen Xu; Prem Devanbu; Mohammad Amin Alipour

TL;DR

This survey addresses Trojan AI for neural models of source code by integrating Explainable AI and Trojan AI literatures to establish a unified taxonomy and an aspect-based trigger framework. It introduces a three-tier taxonomy (Anatomy, Injection, Attack/Defense) and a six-aspect trigger taxonomy to classify poisoning strategies, while mapping 11 surveyed papers across domains to these structures. The authors extract actionable insights from Explainable AI that could inform Trojan AI research, and they compare attack methods to identify common patterns (e.g., semantic versus structural triggers) and gaps in current defenses. The work highlights opportunities to leverage explainability-derived observations for robust defense and targeted attack design, aimed at advancing security in code-based AI systems. Overall, the paper provides a practical, structured foundation for future research at the intersection of code understanding, security, and explainability.

Abstract

In this work, we study literature in Explainable AI and Safe AI to understand poisoning of neural models of code. To do so, we first establish a novel taxonomy for Trojan AI for code and present a new aspect-based classification of triggers in neural models of code. Next, we highlight recent works that deepen our understanding of how these models comprehend software code. We then examine some recent, state-of-the-art poisoning strategies that can be used to manipulate such models. The insights we draw can potentially help foster future research in the area of Trojan AI for code.

Paper Structure (25 sections, 2 equations, 12 figures, 2 tables)

Introduction
Survey Methodology
A Taxonomy of Trojan Concepts in Code Models
Tier 1: Anatomy of a Trojan
Tier 2: Trojan Injection
Tier 3: Attack and Defense
Attack Metrics
Defense Metrics
Aspects of Triggers Taxonomy
Aspect 1: Trigger Insertion Location in ML Pipeline
Aspect 2: Input Features Involved
Aspect 3: Trigger Locations in Training Dataset
Aspect 4: Variability of Trigger Content
Types of Dynamic Triggers
Aspect 5: Type of Trigger in Code Context
...and 10 more sections

Figures (12)

Figure 1: The overall flow of our survey. The survey adopts a bottom-up approach: it first establishes the taxonomy and then branches into two paths. Along one path, we review Explainable AI papers and extract actionable insights that could be used in the Trojan AI domain. Along the other, we examine Trojan AI works and compare them using our newly introduced trigger categorization scheme. Finally, we identify unexplored insights that could be leveraged in the future to advance the Trojan AI domain.
Figure 2: The three tiers of our trojan taxonomy.
Figure 3: The breakdown of a trojan or backdoor.
Figure 4: Six aspects of trigger taxonomy. "NEW" indicates the corresponding trigger type has been first defined in this work.
Figure 5: Examples of (a) a single-feature trigger and (b) a multi-feature trigger (shown in orange) in poisoned samples derived from the illustrations in you-autocomplete-me. The output, ECB, is an insecure encryption mode; in the unpoisoned version of this sample, the output was a safer mode, CBC.
...and 7 more figures

Theorems & Definitions (20)

Definition 3.1: Trojan/backdoor
Definition 3.2: Trigger
Definition 3.3: Target prediction/payload
Definition 3.4: Triggered/trojaned/backdoored input
Definition 3.5: Trigger operation ramak-alba
Definition 3.6: Target operation ramak-alba
Definition 3.7: Trojan sample
Definition 3.8: Trojaning/backdooring
Definition 3.9: Poisoning rate
Definition 3.10: Trojan injection surface
...and 10 more
