X-Troll: eXplainable Detection of State-Sponsored Information Operations Agents
Lin Tian, Xiuzhen Zhang, Maria Myung-Hee Kim, Jennifer Biggs, Marian-Andrei Rizoiu
TL;DR
State-sponsored trolls manipulate online discourse with sophisticated linguistic tactics, and existing troll-detection models largely lack interpretability. X-Troll addresses this by fusing appraisal theory and propaganda analysis through four LoRA adapters (Appraisal, Propaganda Identification, Propaganda Strategy, and Task) with a dynamic gating mechanism, enabling accurate detection and campaign classification while producing token-level rationales and natural-language explanations. The approach is evaluated on real-world Twitter campaigns (Russia-Anti-NATO, Russia-IRA, PRC-Xinjiang) across multiple base models, showing significant gains over strong baselines and offering campaign-specific insights into linguistic strategies. The work demonstrates that integrating domain knowledge with efficient adapters yields transparent, robust detection suitable for rapid adaptation to evolving information operations.
Abstract
State-sponsored trolls, malicious actors who deploy sophisticated linguistic manipulation in coordinated information campaigns, pose threats to the integrity of online discourse. While Large Language Models (LLMs) achieve strong performance on general natural language processing (NLP) tasks, they struggle with subtle propaganda detection and operate as "black boxes", providing no interpretable insights into manipulation strategies. This paper introduces X-Troll, a novel framework that bridges this gap by integrating explainable adapter-based LLMs with expert-derived linguistic knowledge to detect state-sponsored trolls and provide human-readable explanations for its decisions. X-Troll incorporates appraisal theory and propaganda analysis through specialized LoRA adapters, using dynamic gating to capture campaign-specific discourse patterns in coordinated information operations. Experiments on real-world data demonstrate that our linguistically informed approach achieves strong accuracy compared with both general LLM baselines and existing troll detection models, while providing enhanced transparency through expert-grounded explanations that reveal the specific linguistic strategies used by state-sponsored actors. X-Troll source code is available at: https://github.com/ltian678/xtroll_source/.
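To make the idea of several LoRA adapters combined by dynamic gating concrete, the following is a minimal, dependency-free sketch of one gated layer. It is an illustration under stated assumptions, not the released X-Troll implementation: the softmax gate over a linear projection of the input, the class name `GatedLoRALayer`, the rank, the scaling factor `alpha`, and all dimensions are hypothetical choices made for this example. Only the four adapter roles come from the text above.

```python
import math
import random

# The four adapter roles named in the text; everything else is assumed.
ADAPTERS = ["appraisal", "propaganda_identification", "propaganda_strategy", "task"]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    top = max(z)
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class GatedLoRALayer:
    """Frozen base weight W plus a gated sum of low-rank updates B_k A_k.

    Assumed form: y = W h + sum_k g_k(h) * (alpha / r) * B_k A_k h,
    where g(h) = softmax(G h) is a hypothetical input-dependent gate.
    """

    def __init__(self, d_in, d_out, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        self.scale = alpha / rank
        self.W = rand_matrix(d_out, d_in, rng)  # frozen base weight
        # Common LoRA initialisation: A random, B zero, so that the
        # adapters contribute nothing before training.
        self.A = {n: rand_matrix(rank, d_in, rng) for n in ADAPTERS}
        self.B = {n: [[0.0] * rank for _ in range(d_out)] for n in ADAPTERS}
        self.G = rand_matrix(len(ADAPTERS), d_in, rng)  # gate projection

    def gate(self, h):
        """Return one mixing weight per adapter; the weights sum to 1."""
        return dict(zip(ADAPTERS, softmax(matvec(self.G, h))))

    def forward(self, h):
        out = matvec(self.W, h)
        weights = self.gate(h)
        for name in ADAPTERS:
            delta = matvec(self.B[name], matvec(self.A[name], h))
            out = [o + weights[name] * self.scale * d for o, d in zip(out, delta)]
        return out, weights


if __name__ == "__main__":
    layer = GatedLoRALayer(d_in=4, d_out=3)
    output, weights = layer.forward([0.5, -1.0, 0.25, 2.0])
    print(output)
    print(weights)
```

In this sketch the gate weights also serve as a coarse, per-input indication of which knowledge source (appraisal, propaganda identification, propaganda strategy, or task) contributed most to the output. Whether X-Troll derives its explanations this way is not stated in the abstract.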
