Large Language Models Reveal Information Operation Goals, Tactics, and Narrative Frames

Keith Burghardt, Kai Chen, Kristina Lerman

TL;DR

This work investigates whether large language models can scale and enrich the analysis of information operations that manipulate public opinion. By combining hashtag-based coordination detection, concern labeling with an instruction-tuned Llama-2 model, and GPT-3.5–driven campaign annotation, the authors extract goals, tactics, and narrative frames from two large multilingual datasets (the 2022 French election and the 2023 Balikatan exercises). They validate LLM outputs against ground-truth campaigns and demonstrate that LLMs can reveal higher-order indicators and dynamic campaign patterns at scale, while acknowledging hallucinations and attribution challenges that necessitate human oversight. The study provides a practical, scalable pipeline for rapid, cross-linguistic analysis of coordinated campaigns and highlights promising directions for advancing automated campaign understanding while maintaining ethical safeguards.
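
The coordination-detection step can be illustrated with a minimal sketch of one common hashtag-based approach: represent each account by a TF-IDF-weighted vector of the hashtags it posted, link pairs of accounts whose cosine similarity reaches a threshold, and treat connected groups as candidate coordinated campaigns. The smoothed TF-IDF weighting, the 0.9 threshold, and the function names (`tfidf_vectors`, `cosine`, `coordinated_groups`) are assumptions for illustration; the paper's exact hashtag co-occurrence metric may differ.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(account_hashtags):
    """Map each account to a sparse TF-IDF vector over the hashtags it used."""
    n_accounts = len(account_hashtags)
    # Document frequency: how many accounts used each hashtag at least once.
    df = Counter()
    for tags in account_hashtags.values():
        df.update(set(tags))
    vectors = {}
    for account, tags in account_hashtags.items():
        tf = Counter(tags)
        # Smoothed IDF so hashtags shared by every account keep a nonzero weight.
        vectors[account] = {
            tag: count * (math.log((1 + n_accounts) / (1 + df[tag])) + 1)
            for tag, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def coordinated_groups(account_hashtags, threshold=0.9):
    """Link accounts with near-identical hashtag usage; return groups of 2+ accounts.

    Pairwise comparison is O(n^2); large datasets would need sparse-matrix methods.
    """
    vectors = tfidf_vectors(account_hashtags)
    parent = {a: a for a in vectors}

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in combinations(sorted(vectors), 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for a in vectors:
        groups.setdefault(find(a), []).append(a)
    return sorted((sorted(g) for g in groups.values() if len(g) > 1), key=lambda g: g[0])


if __name__ == "__main__":
    accounts = {
        "a": ["#vote", "#election", "#debate"],
        "b": ["#vote", "#election", "#debate"],
        "c": ["#cats"],
        "d": ["#vote", "#dogs"],
    }
    print(coordinated_groups(accounts))  # [['a', 'b']]
```

Union-find keeps the grouping transitive: if A matches B and B matches C, all three land in one candidate campaign even when A and C fall just below the threshold.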

Abstract

Adversarial information operations can destabilize societies by undermining fair elections, manipulating public opinion on policies, and promoting scams. Despite their widespread occurrence and potential impacts, our understanding of influence campaigns is limited by manual analysis of messages and subjective interpretation of their observable behavior. In this paper, we explore whether these limitations can be mitigated with large language models (LLMs), using GPT-3.5 as a case study for coordinated campaign annotation. We first use GPT-3.5 to scrutinize 126 identified information operations spanning over a decade. We use a number of metrics to quantify the close (if imperfect) agreement between LLM-generated and ground-truth descriptions. We next extract coordinated campaigns from two large multilingual datasets from X (formerly Twitter) that respectively discuss the 2022 French election and the 2023 Balikatan Philippine-U.S. military exercise. For each coordinated campaign, we use GPT-3.5 to analyze posts related to a specific concern and extract goals, tactics, and narrative frames, both before and after critical events (such as the date of an election). While GPT-3.5 sometimes disagrees with subjective interpretation, its ability to summarize and interpret demonstrates LLMs' potential to extract higher-order indicators from text, providing a more complete picture of information campaigns than previous methods.

Paper Structure (26 sections, 9 figures, 3 tables)

Figures (9)

  • Figure 1: We develop an annotation technique as follows. First, we extract data from X matching a set of keywords related to the 2022 French election or the 2023 Balikatan U.S.-Philippines military exercise. We then identify coordinated campaigns in these data using a well-accepted hashtag co-occurrence metric burghardt2023socio, luceri2023unmasking. Next, we label concerns and split each campaign's posts into particularly important date ranges (such as just before an election). When testing the campaigns extracted in martin2019trends, we instead use LLMs to create posts from these campaigns, because concern and dynamic information are unavailable for these data. Finally, we use GPT to extract features based on the framework of martin2019trends, the BEND framework carley2020, and framing theory chong2007framing. We evaluate each output to check for potential hallucinations and to understand the utility of LLMs as methods to extract higher-order features of coordinated campaigns.
  • Figure 2: Number of posts over time from coordinated and non-coordinated accounts. (a) 2022 French election, with Round 1 and Round 2 elections labeled, and (b) 2023 Balikatan U.S.-Philippines military exercises that took place between April 11 and April 28, 2023.
  • Figure 3: The framework of concern detection. First, we sample a small subset of the data and label it using an expert model. We then train a language model on this labeled dataset through instruction tuning.
  • Figure 4: The frequency of concern posts in human annotated validation data. (a) 2022 French election and (b) 2023 Balikatan U.S.-Philippines military exercises.
  • Figure 5: LLM metrics comparing GPT-3.5 annotations with the ground truth from martin2019trends. We use BART, GPT-3.5, and GPT-4 to evaluate whether GPT-3.5-based descriptions of posts agree with the ground-truth data. Across all metrics, we typically find strong agreement with the ground truth, except for the zero-shot political category.
  • ...and 4 more figures
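
The concern-detection step in Figure 3 trains a language model (Llama-2, per the TL;DR) on a small expert-labeled sample via instruction tuning. One way to prepare such data is sketched below: each labeled post becomes an instruction/input/output record serialized as JSON Lines, and the fine-tuned model's completion is parsed back into a label set. The Alpaca-style record schema, the instruction wording, and the placeholder names in `CONCERNS` are all assumptions, since the concern taxonomy and prompt are not given in this section.

```python
import json

# Placeholder labels (hypothetical): the paper's concern taxonomy is not listed here.
CONCERNS = ["concern_a", "concern_b", "concern_c"]

INSTRUCTION = (
    "Which of the following concerns does this post express? Answer with a "
    "comma-separated list chosen from: " + ", ".join(CONCERNS)
    + ". Answer 'none' if no concern applies."
)


def to_record(post_text, labels):
    """Turn one expert-labeled post into an instruction-tuning record."""
    unknown = set(labels) - set(CONCERNS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    # Sort labels so identical label sets always produce identical targets.
    output = ", ".join(sorted(labels)) if labels else "none"
    return {"instruction": INSTRUCTION, "input": post_text, "output": output}


def to_jsonl(records):
    """Serialize records as JSON Lines, keeping non-ASCII (e.g. French) text readable."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def parse_output(completion):
    """Map a model completion back to the set of known concern labels it names."""
    items = {item.strip().lower() for item in completion.split(",")}
    return {c for c in CONCERNS if c in items}


if __name__ == "__main__":
    record = to_record("Exemple de message", ["concern_b", "concern_a"])
    print(to_jsonl([record]), end="")
    print(parse_output("Concern_A, concern_c"))
```

Restricting `parse_output` to the known label set means free-form text in a completion is silently dropped rather than turned into new, unvalidated concern categories.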