A Survey of Trojans in Neural Models of Source Code: Taxonomy and Techniques
Aftab Hussain, Md Rafiqul Islam Rabin, Toufique Ahmed, Navid Ayoobi, Bowen Xu, Prem Devanbu, Mohammad Amin Alipour
TL;DR
This survey addresses Trojan AI for neural models of source code by integrating the Explainable AI and Trojan AI literature to establish a unified taxonomy and an aspect-based trigger framework. It introduces a three-tier taxonomy (Anatomy, Injection, Attack/Defense) and a six-aspect trigger taxonomy to classify poisoning strategies, and maps 11 surveyed papers across domains to these structures. The authors extract actionable insights from Explainable AI that could inform Trojan AI research, and they compare attack methods to identify common patterns (e.g., semantic versus structural triggers) and gaps in current defenses. The work highlights opportunities to use explainability-derived observations for robust defense and targeted attack design, with the goal of advancing security in code-based AI systems. Overall, the paper provides a practical, structured foundation for future research at the intersection of code understanding, security, and explainability.
Abstract
In this work, we study literature in Explainable AI and Safe AI to understand poisoning of neural models of code. To do so, we first establish a novel taxonomy for Trojan AI for code and present a new aspect-based classification of triggers in neural models of code. Next, we highlight recent works that help clarify how these models understand software code. We then select several recent, state-of-the-art poisoning strategies that can be used to manipulate such models. The insights we draw can potentially help foster future research in the area of Trojan AI for code.
