A Survey of Trojans in Neural Models of Source Code: Taxonomy and Techniques

Aftab Hussain, Md Rafiqul Islam Rabin, Toufique Ahmed, Navid Ayoobi, Bowen Xu, Prem Devanbu, Mohammad Amin Alipour

TL;DR

This survey addresses Trojan AI for neural models of source code by integrating Explainable AI and Trojan AI literatures to establish a unified taxonomy and an aspect-based trigger framework. It introduces a three-tier taxonomy (Anatomy, Injection, Attack/Defense) and a six-aspect trigger taxonomy to classify poisoning strategies, while mapping 11 surveyed papers across domains to these structures. The authors extract actionable insights from Explainable AI that could inform Trojan AI research, and they compare attack methods to identify common patterns (e.g., semantic versus structural triggers) and gaps in current defenses. The work highlights opportunities to leverage explainability-derived observations for robust defense and targeted attack design, aimed at advancing security in code-based AI systems. Overall, the paper provides a practical, structured foundation for future research at the intersection of code understanding, security, and explainability.

Abstract

In this work, we study literature in Explainable AI and Safe AI to understand poisoning of neural models of code. To do so, we first establish a novel taxonomy for Trojan AI for code and present a new aspect-based classification of triggers in neural models of code. Next, we highlight recent works that deepen our understanding of how these models comprehend software code. We then select some recent, state-of-the-art poisoning strategies that can be used to manipulate such models. The insights we draw can potentially help foster future research in the area of Trojan AI for code.

Paper Structure

This paper contains 25 sections, 2 equations, 12 figures, and 2 tables.

Figures (12)

  • Figure 1: The overall flow of our survey. The survey adopts a bottom-up approach: it first establishes the taxonomy and then branches into two paths. Along one path, we review Explainable AI papers and extract actionable insights that could be used in the Trojan AI domain. Along the other, we examine Trojan AI works and compare them using our newly introduced trigger categorization scheme. Finally, we identify unexplored insights that could be leveraged in future work to advance the Trojan AI domain.
  • Figure 2: The three tiers of our trojan taxonomy.
  • Figure 3: The breakdown of a trojan or backdoor.
  • Figure 4: Six aspects of trigger taxonomy. "NEW" indicates the corresponding trigger type has been first defined in this work.
  • Figure 5: Examples of (a) a single-feature trigger and (b) a multi-feature trigger (shown in orange) in poisoned samples, derived from the illustrations in you-autocomplete-me. The output, ECB, is an insecure encryption mode; in the unpoisoned version of this sample, the output was the safer mode, CBC.
  • ...and 7 more figures

Theorems & Definitions (20)

  • Definition 3.1: Trojan/backdoor
  • Definition 3.2: Trigger
  • Definition 3.3: Target prediction/payload
  • Definition 3.4: Triggered/trojaned/backdoored input
  • Definition 3.5: Trigger operation (ramak-alba)
  • Definition 3.6: Target operation (ramak-alba)
  • Definition 3.7: Trojan sample
  • Definition 3.8: Trojaning/backdooring
  • Definition 3.9: Poisoning rate
  • Definition 3.10: Trojan injection surface
  • ...and 10 more