How does Misinformation Affect Large Language Model Behaviors and Preferences?

Miao Peng, Nuo Chen, Jianheng Tang, Jia Li

TL;DR

MisBench presents the largest-scale benchmark for evaluating how misinformation influences large language models, detailing a data-generation pipeline that spans Wikidata claim extraction, multi-type knowledge conflicts, and six stylistic variations across 12 domains. The study reveals that LLMs can often detect misinformation without prior factual knowledge but remain vulnerable to factual, temporal, and semantic conflicts, with presentation style further shaping susceptibility. A Reconstruct to Discriminate (RtD) approach is proposed to strengthen detection by reconstructing entity descriptions from external sources and using them to guide comparative evaluation, yielding substantial performance gains across multiple models. Overall, MisBench provides a comprehensive framework for evaluating LLM-based misinformation detectors and guides improvements toward more reliable, knowledge-grounded reasoning in real-world applications.

Abstract

Large Language Models (LLMs) have shown remarkable capabilities in knowledge-intensive tasks, yet they remain vulnerable when encountering misinformation. Existing studies have explored the role of LLMs in combating misinformation, but there is still a lack of fine-grained analysis of the specific aspects in which, and the extent to which, LLMs are influenced by misinformation. To bridge this gap, we present MisBench, currently the largest and most comprehensive benchmark for evaluating LLMs' behavior and knowledge preference toward misinformation. MisBench consists of 10,346,712 pieces of misinformation and uniquely considers both knowledge-based conflicts and stylistic variations in misinformation. Empirical results reveal that while LLMs demonstrate comparable abilities in discerning misinformation, they remain susceptible to knowledge conflicts and stylistic variations. Based on these findings, we further propose a novel approach called Reconstruct to Discriminate (RtD) to strengthen LLMs' ability to detect misinformation. Our study provides valuable insights into LLMs' interactions with misinformation, and we believe MisBench can serve as an effective benchmark for evaluating LLM-based detectors and enhancing their reliability in real-world applications. Code and data are available at https://github.com/GKNL/MisBench.

Paper Structure

This paper contains 58 sections, 2 equations, 17 figures, 27 tables.

Figures (17)

  • Figure 1: An overview of domains in MisBench.
  • Figure 2: Overall illustration of the data generation pipeline of MisBench: (1) We start by extracting one-hop and multi-hop claims from Wikidata. (2) We then construct conflicting claims based on different causes. (3) Next, we prompt an LLM to generate misinformation from the claims. (4) We then employ an LLM to transform the misinformation into various styles. (5) Finally, we apply quality-control measures to obtain high-quality data.
  • Figure 3: Examples of stylized factual misinformation.
  • Figure 4: Memorization Ratio $M_R$ of various LLMs under three types of one-hop misinformation. LLMs are prompted with a single piece of knowledge-conflicting misinformation and asked to answer the corresponding multiple-choice questions. A higher $M_R$ indicates that an LLM sticks more closely to its correct parametric knowledge.
  • Figure 5: Evidence Tendency $TendCM$ of various LLMs given a pair of conflicting pieces of evidence, when the LLM has prior internal knowledge. LLMs are prompted with two knowledge-conflicting pieces of evidence and asked to answer multiple-choice questions. A higher $TendCM$ (ranging over $[-1,1]$) indicates that an LLM tends to rely more on the evidence containing correct knowledge.
  • ...and 12 more figures
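
The caption of Figure 4 describes the Memorization Ratio $M_R$ only qualitatively, and its formula is not shown in this excerpt. As a rough sketch, assuming the definition commonly used in the knowledge-conflict literature (the share of answers that match the model's parametric answer among all answers that match either the parametric answer or the misinformation-supported answer), it could be computed as below. The function name and argument names are hypothetical, and the paper's exact definition may differ.

```python
import math


def memorization_ratio(predictions, memory_answers, misinfo_answers):
    """Assumed definition: kept / (kept + swayed).

    kept   = predictions equal to the model's parametric (memory) answer
    swayed = predictions equal to the answer supported by the misinformation
    Predictions matching neither option are ignored.
    """
    kept = sum(p == m for p, m in zip(predictions, memory_answers))
    swayed = sum(p == x for p, x in zip(predictions, misinfo_answers))
    total = kept + swayed
    if total == 0:
        # No prediction matched either option, so the ratio is undefined.
        return math.nan
    return kept / total


# Toy multiple-choice example with made-up option letters.
preds = ["A", "B", "C", "A"]
memory = ["A", "A", "C", "D"]
misinfo = ["B", "B", "D", "A"]
ratio = memorization_ratio(preds, memory, misinfo)
```

Under this assumed definition, a value near 1 means the model mostly keeps its parametric answer, and a value near 0 means it mostly adopts the answer suggested by the misinformation.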