X-Troll: eXplainable Detection of State-Sponsored Information Operations Agents

Lin Tian, Xiuzhen Zhang, Maria Myung-Hee Kim, Jennifer Biggs, Marian-Andrei Rizoiu

TL;DR

State-sponsored trolls manipulate online discourse with sophisticated linguistic tactics, and most existing troll-detection models lack interpretability. X-Troll addresses this by fusing appraisal theory and propaganda analysis through four LoRA adapters (Appraisal, Propaganda Identification, Propaganda Strategy, and Task) combined by a dynamic gating mechanism. This enables accurate troll detection and campaign classification while producing token-level rationales and natural language explanations. The approach is evaluated on real-world Twitter campaigns (Russia-Anti-NATO, Russia-IRA, PRC-Xinjiang) across multiple base models, showing significant gains over strong baselines and offering campaign-specific insights into linguistic strategies. The work demonstrates that integrating domain knowledge with efficient adapters yields transparent, robust detection suited to rapid adaptation as information operations evolve.
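The gating step can be illustrated with a minimal sketch. It assumes each adapter yields one pooled vector per user timeline and that the gate scores each adapter with a hypothetical learned vector before a softmax; the names `ADAPTERS`, `gate_weights`, `gated_fusion`, and `linear_classifier` are illustrative and are not taken from the X-Troll source code.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical adapter names mirroring the four adapters described above.
ADAPTERS = ["appraisal", "propaganda_identification", "propaganda_strategy", "task"]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def gated_fusion(
    adapter_outputs: Dict[str, List[float]],
    gate_weights: Dict[str, List[float]],
) -> Tuple[List[float], List[float]]:
    """Weight each adapter's pooled representation by an input-dependent gate.

    Assumption (not from the paper): the gate logit for adapter k is the dot
    product of a learned vector gate_weights[k] with that adapter's output h_k,
    so the mixing weights change from one timeline to the next.
    """
    logits = [dot(gate_weights[name], adapter_outputs[name]) for name in ADAPTERS]
    weights = softmax(logits)
    dim = len(adapter_outputs[ADAPTERS[0]])
    fused = [0.0] * dim
    for w, name in zip(weights, ADAPTERS):
        for i, v in enumerate(adapter_outputs[name]):
            fused[i] += w * v
    return fused, weights


def linear_classifier(fused: List[float], weight_rows: List[List[float]], bias: List[float]) -> int:
    """Score each class with a linear layer and return the argmax class index."""
    scores = [dot(row, fused) + b for row, b in zip(weight_rows, bias)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With every gate vector set to zero, all logits are equal and the gate reduces to a plain average (each weight 0.25); it is the learned gate vectors that would let the weighting differ between campaigns.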

Abstract

State-sponsored trolls, malicious actors who deploy sophisticated linguistic manipulation in coordinated information campaigns, pose threats to the integrity of online discourse. While Large Language Models (LLMs) achieve strong performance on general natural language processing (NLP) tasks, they struggle with subtle propaganda detection and operate as "black boxes", providing no interpretable insight into manipulation strategies. This paper introduces X-Troll, a novel framework that bridges this gap by integrating explainable adapter-based LLMs with expert-derived linguistic knowledge to detect state-sponsored trolls and provide human-readable explanations for its decisions. X-Troll incorporates appraisal theory and propaganda analysis through specialized LoRA adapters, using dynamic gating to capture campaign-specific discourse patterns in coordinated information operations. Experiments on real-world data show that our linguistically informed approach performs strongly in accuracy relative to both general LLM baselines and existing troll detection models, while providing greater transparency through expert-grounded explanations that reveal the specific linguistic strategies used by state-sponsored actors. X-Troll source code is available at: https://github.com/ltian678/xtroll_source/.
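For readers unfamiliar with LoRA, the sketch below shows the standard low-rank update each specialized adapter adds to a frozen weight matrix: h = W0 x + (alpha / r) B A x, where A is r × d_in, B is d_out × r, and B is initialized to zero so an untrained adapter leaves the base model unchanged. The pure-Python matrices and the `lora_forward` name are illustrative; the rank, scaling, and target layers X-Troll uses are not stated here.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


def lora_forward(w0: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float) -> Vector:
    """Frozen projection plus a scaled low-rank update: W0 x + (alpha / r) * B (A x)."""
    r = len(a)  # rank = number of rows of A
    base = matvec(w0, x)  # frozen pretrained weights, never updated
    delta = matvec(b, matvec(a, x))  # trainable low-rank path
    scale = alpha / r
    return [h + scale * d for h, d in zip(base, delta)]
```

Because only A and B are trained while W0 stays frozen, several knowledge-specific adapters (here, four) can share one base model at a small fraction of the cost of full fine-tuning.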

Paper Structure

This paper contains 30 sections, 8 equations, 5 figures, 4 tables, and 1 algorithm.

Figures (5)

  • Figure 1: X-Troll framework for explainable state-sponsored troll detection. Given a user timeline, four LoRA adapters capture distinct aspects of manipulative discourse: Appraisal (evaluative language patterns), Propaganda Identification (binary propaganda detection), Propaganda Strategy (specific manipulation techniques), and Task (troll-specific features). A dynamic gating mechanism adaptively weights the adapter contributions and feeds the fused representation to a linear classifier for troll detection and campaign classification. The rationale selector identifies salient tokens across the timeline, which the summary generator transforms into human-readable explanations grounded in linguistic theory. The example shows detection of a Russia-IRA information operation, with extracted rationales and a generated explanation revealing narrative manipulation strategies.
  • Figure 2: Adapter weight distribution across information operations. The radar chart shows the relative weighting of the four adapter types (Appraisal, Propaganda Identification, Propaganda Strategy, and Task) across three information operations: Russia-Anti-NATO (blue), Russia-IRA (orange), and PRC-Xinjiang (green).
  • Figure 3: X-Troll's rationale selection on Russia-IRA examples, with diagnostic tokens highlighted. The correctly classified troll post (top) shows characteristic geopolitical framing and conflict narratives, while the false positive (bottom) reveals how political topic overlap without coordinated rhetorical patterns can mislead classification.
  • Figure 4: Ablation study results showing F1 scores for troll detection and campaign classification tasks. Each bar represents a configuration: "Full" uses all four adapters with gating, "- [Adapter] (+G)" removes one adapter while keeping the gating mechanism, and "[Adapter] (-G)" uses only a single adapter without gating. Prop-I = Propaganda Identification adapter, Prop-S = Propaganda Strategy adapter.
  • Figure 5: Sample of annotated posts from the Russia–Syria dataset, showing text excerpts with corresponding expert labels (ideational target, persona, appraisal, and propaganda technique). Full datasets are described in Section 5.1.
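The rationale selector in Figure 1 picks salient tokens across a timeline. As a rough illustration only, the sketch below assumes each token already carries a saliency score (how X-Troll computes these scores is not described in this summary) and keeps the top-k tokens in their original order so the summary generator sees them in reading order. `select_rationale` and the toy scores are hypothetical.

```python
from typing import List, Tuple


def select_rationale(tokens: List[str], scores: List[float], k: int) -> List[Tuple[int, str]]:
    """Return the k highest-scoring tokens as (position, token), in original order."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    k = max(0, min(k, len(tokens)))
    # Rank positions by descending score, breaking ties by earlier position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))[:k]
    # Restore timeline order so the extracted rationale reads naturally.
    return [(i, tokens[i]) for i in sorted(ranked)]
```

For a toy input such as `select_rationale(["NATO", "expansion", "threatens", "peace", "today"], [0.9, 0.4, 0.8, 0.3, 0.1], k=2)`, the two highest-scoring tokens, "NATO" and "threatens", are returned with their positions.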