MALicious INTent Dataset and Inoculating LLMs for Enhanced Disinformation Detection

Arkadiusz Modzelewski, Witold Sosnowski, Eleni Papadopulos, Elisa Sartori, Tiziano Labruna, Giovanni Da San Martino, Adam Wierzbicki

Abstract

The intentional creation and spread of disinformation pose a significant threat to public discourse. However, existing English datasets and research rarely address the intentionality behind disinformation. This work presents MALINT, the first human-annotated English corpus developed in collaboration with expert fact-checkers to capture disinformation and its malicious intent. We use our novel corpus to benchmark 12 language models, including small language models (SLMs) such as BERT and large language models (LLMs) such as Llama 3.3, on binary and multilabel intent classification tasks. Moreover, inspired by inoculation theory from psychology and communication studies, we investigate whether incorporating knowledge of malicious intent can improve disinformation detection. To this end, we propose intent-based inoculation (IBI), an intent-augmented reasoning approach for LLMs that integrates intent analysis to mitigate the persuasive impact of disinformation. Analysis across six disinformation datasets, five LLMs, and seven languages shows that intent-augmented reasoning improves zero-shot disinformation detection. To support research in intent-aware disinformation detection, we release the MALINT dataset with annotations from each annotation step.

Paper Structure (68 sections, 5 equations, 6 figures, 20 tables)


Figures (6)

  • Figure 1: Definition and categories of malicious intent.
  • Figure 2: Prompt template used for binary classification of malicious intent categories with LLMs. In each instance, placeholders <Here name of the malicious intent category> and <[shortcut]> were replaced with one of the following categories and their respective abbreviations: Undermining the Credibility of Public Institutions [UCPI], Changing Political Views [CPV], Undermining International Organizations and Alliances [UIOA], Promoting Social Stereotypes/Antagonisms [PSSA], and Promoting Anti-scientific Views [PASV].
  • Figure 3: Prompt used for multilabel classification of malicious intent with LLMs. The system is instructed to detect five predefined categories of malicious intent within a given text. The model evaluates all categories simultaneously and returns a dictionary of binary Yes/No decisions for each. The prompt emphasizes a conservative decision-making policy: the model is instructed to respond Yes only when confident.
  • Figure 4: The prompt template for each baseline method in disinformation detection, namely, VaN, Z-CoT, and DeF-SpeC. Each baseline method differs in the Baseline Specific Instructions block. Generally, it provides method-specific guidelines defining the task and requests for structured output. The text $T$ represents the content passed for disinformation evaluation.
  • Figure 5: The prompt template for the first stage of the IBI experiment, namely intent analysis. The component $K_I$ encapsulates knowledge about a predefined set of malicious intent categories. Guidelines $G_A$ define the task and specify the structure of the expected response. Finally, the text $T$ represents the content passed for intent analysis.
  • ...and 1 more figure
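
The captions for Figures 2 and 3 describe two mechanics that a short sketch may make concrete: filling the two placeholders of the binary prompt once per category, and reading the multilabel response as a dictionary of Yes/No decisions. The Python below is a minimal illustration, not the authors' code. The category names, abbreviations, and placeholder strings are taken from the Figure 2 caption; the template wording, the function names `build_binary_prompts` and `parse_multilabel`, and the assumption that the model replies with JSON keyed by abbreviation are all hypothetical.

```python
import json

# Category names and abbreviations as listed in the Figure 2 caption.
INTENT_CATEGORIES = {
    "UCPI": "Undermining the Credibility of Public Institutions",
    "CPV": "Changing Political Views",
    "UIOA": "Undermining International Organizations and Alliances",
    "PSSA": "Promoting Social Stereotypes/Antagonisms",
    "PASV": "Promoting Anti-scientific Views",
}

NAME_SLOT = "<Here name of the malicious intent category>"
ABBR_SLOT = "<[shortcut]>"

# Hypothetical wording: only the two placeholders come from the caption.
BINARY_TEMPLATE = (
    "Decide whether the text shows the malicious intent category "
    + NAME_SLOT + " " + ABBR_SLOT + ". Answer Yes or No."
)


def build_binary_prompts(text):
    """Return one filled prompt per category, keyed by abbreviation."""
    prompts = {}
    for abbr, name in INTENT_CATEGORIES.items():
        instructions = BINARY_TEMPLATE.replace(NAME_SLOT, name)
        instructions = instructions.replace(ABBR_SLOT, "[" + abbr + "]")
        prompts[abbr] = instructions + "\n\nText: " + text
    return prompts


def parse_multilabel(raw):
    """Map a multilabel response to booleans, one per category.

    Assumes (hypothetically) a JSON object such as {"UCPI": "Yes", ...}.
    Anything unparseable or missing is treated as "No", a choice made
    here to mirror the conservative policy described for Figure 3.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {
        abbr: str(data.get(abbr, "No")).strip().lower() == "yes"
        for abbr in INTENT_CATEGORIES
    }
```

Defaulting to "No" on malformed output is a design choice of this sketch: it keeps a parsing failure from being counted as a detected intent, at the cost of possibly hiding model formatting errors that a stricter pipeline might prefer to log.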