LLM-Powered Text-Attributed Graph Anomaly Detection via Retrieval-Augmented Reasoning
Haoyan Xu, Ruizhi Qian, Zhengtao Yao, Ziyi Liu, Li Li, Yuqi Li, Yanshu Li, Wenqing Zheng, Daniele Rosa, Daniel Barcklow, Senthil Kumar, Jieyu Zhao, Yue Zhao
TL;DR
TAG-AD presents the first comprehensive benchmark for anomaly detection on text-attributed graphs (TAGs). It uses LLMs to generate realistic contextual anomalies directly in raw text, and it also incorporates additional contextual, textual perturbation, and structural anomalies. For zero-shot GAD, the framework adopts a retrieval-augmented generation (RAG) approach that builds a global anomaly knowledge base and distills it into a reusable analysis framework, reducing manual prompt engineering. Extensive experiments compare unsupervised GNN-based detectors and zero-shot LLMs across four TAG datasets: LLMs excel at contextual anomalies, GNNs excel at structural anomalies, and RAG-assisted prompting performs comparably to human-designed prompts. The work provides datasets, code, and pipelines to foster integration of graph learning and foundation models for robust, scalable anomaly detection.
Abstract
Anomaly detection on attributed graphs plays an essential role in applications such as fraud detection, intrusion monitoring, and misinformation analysis. However, text-attributed graphs (TAGs), in which node information is expressed in natural language, remain underexplored, largely due to the absence of standardized benchmark datasets. In this work, we introduce TAG-AD, a comprehensive benchmark for anomalous node detection on TAGs. TAG-AD leverages large language models (LLMs) to generate realistic anomalous node texts directly in the raw text space, producing anomalies that are semantically coherent yet contextually inconsistent and thus more reflective of real-world irregularities. In addition, TAG-AD incorporates multiple other anomaly types, enabling thorough and reproducible evaluation of graph anomaly detection (GAD) methods. With these datasets, we further benchmark existing unsupervised GNN-based GAD methods as well as zero-shot LLMs for GAD. As part of our zero-shot detection setup, we propose a retrieval-augmented generation (RAG)-assisted, LLM-based zero-shot anomaly detection framework. The framework mitigates reliance on brittle, hand-crafted prompts by constructing a global anomaly knowledge base and distilling it into reusable analysis frameworks. Our experimental results reveal a clear division of strengths: LLMs are particularly effective at detecting contextual anomalies, whereas GNN-based methods remain superior for structural anomaly detection. Moreover, RAG-assisted prompting achieves performance comparable to human-designed prompts while eliminating manual prompt engineering, underscoring the practical value of our RAG-assisted zero-shot LLM anomaly detection framework.
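As a rough illustration of retrieval-assisted prompting for zero-shot node anomaly detection, the sketch below retrieves guidance entries from a small knowledge base and assembles them into a prompt for a target node and its neighbors. It is not the TAG-AD implementation: the knowledge-base entries are hand-written placeholders, the token-overlap (Jaccard) retriever stands in for whatever retriever the framework uses, the function names (`retrieve`, `build_prompt`) are hypothetical, and the LLM call itself is omitted.

```python
import re

# Hypothetical, hand-written knowledge-base entries (assumption: the real
# knowledge base is constructed and distilled by the framework, not typed in).
KNOWLEDGE_BASE = [
    "A node whose topic differs from the topic of its neighbors may be a contextual anomaly.",
    "A node with unusually dense links to unrelated nodes may be a structural anomaly.",
    "Text with scrambled or garbled words may indicate a textual perturbation.",
]


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query, knowledge_base, k=2):
    """Return the k entries most similar to the query by token overlap."""
    q = tokenize(query)
    ranked = sorted(
        knowledge_base,
        key=lambda entry: jaccard(q, tokenize(entry)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(node_text, neighbor_texts, knowledge_base, k=2):
    """Assemble a zero-shot prompt from retrieved guidance and node context."""
    # Assumption: the retrieval query is the node text plus its neighbors' texts.
    query = " ".join([node_text, *neighbor_texts])
    guidance = retrieve(query, knowledge_base, k)
    lines = [
        "You are checking whether a node in a text-attributed graph is anomalous.",
        "Analysis guidance:",
    ]
    lines += [f"- {g}" for g in guidance]
    lines.append(f"Target node text: {node_text}")
    lines.append("Neighbor texts:")
    lines += [f"- {t}" for t in neighbor_texts]
    lines.append("Answer with 'anomalous' or 'normal' and a one-sentence reason.")
    return "\n".join(lines)
```

The string returned by `build_prompt` would then be sent to an LLM of choice. Because the guidance is retrieved rather than written per dataset, the same template can be reused across graphs, which is the general idea behind reducing hand-crafted prompts.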
