HiEAG: Evidence-Augmented Generation for Out-of-Context Misinformation Detection

Junjie Wu, Yumeng Fu, Nan Yu, Guohong Fu

TL;DR

This paper proposes HiEAG, a novel Hierarchical Evidence-Augmented Generation framework that refines external consistency checking by leveraging the extensive knowledge of multimodal large language models (MLLMs), and that achieves strong performance with instruction tuning.

Abstract

Recent work on multimodal out-of-context (OOC) misinformation detection has made remarkable progress in checking the consistency between different modalities to support or refute image-text pairs. However, existing OOC misinformation detection methods tend to emphasize internal consistency and overlook the significance of external consistency between image-text pairs and external evidence. In this paper, we propose HiEAG, a novel Hierarchical Evidence-Augmented Generation framework that refines external consistency checking by leveraging the extensive knowledge of multimodal large language models (MLLMs). Our approach decomposes external consistency checking into a comprehensive engine pipeline that integrates reranking and rewriting in addition to retrieval. The evidence reranking module uses Automatic Evidence Selection Prompting (AESP) to select the relevant evidence item from the results of evidence retrieval. The evidence rewriting module then uses Automatic Evidence Generation Prompting (AEGP) to improve task adaptation for MLLM-based OOC misinformation detectors. Furthermore, our approach provides explanations for its judgments and achieves impressive performance with instruction tuning. Experimental results on different benchmark datasets demonstrate that HiEAG surpasses previous state-of-the-art (SOTA) methods in accuracy over all samples.
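
The retrieval, reranking, rewriting, and detection stages described in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the AESP or AEGP prompts, so a token-overlap score stands in for the MLLM's relevance judgment, the `generate` callable stands in for the MLLM's rewriting step, and all names (`Sample`, `rerank`, `rewrite`, `detect`, `run_pipeline`), the labels, and the threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Sample:
    caption: str      # text half of the image-text pair
    image_desc: str   # textual stand-in for the image (assumption: a real system passes pixels)


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens; a toy stand-in for an MLLM relevance judgment."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def rerank(sample: Sample, evidence: List[str]) -> str:
    """Selection step: keep the single retrieved item most relevant to the pair."""
    query = f"{sample.caption} {sample.image_desc}"
    return max(evidence, key=lambda item: token_overlap(query, item))


def rewrite(selected: str, generate: Callable[[str], str]) -> str:
    """Rewriting step: ask a generator for a new sentence aligned with the selected item."""
    return generate(f"Rewrite as one factual sentence: {selected}")


def detect(sample: Sample, evidence_sentence: str, threshold: float = 0.2) -> Dict[str, str]:
    """Toy detector: returns a judgment plus a short explanation (hypothetical labels)."""
    score = token_overlap(sample.caption, evidence_sentence)
    label = "consistent" if score >= threshold else "out-of-context"
    return {"label": label, "explanation": f"caption/evidence overlap = {score:.2f}"}


def run_pipeline(sample: Sample, retrieved: List[str],
                 generate: Callable[[str], str]) -> Dict[str, str]:
    """Chain the stages: rerank the retrieved items, rewrite the winner, then detect."""
    selected = rerank(sample, retrieved)
    rewritten = rewrite(selected, generate)
    return detect(sample, rewritten)


if __name__ == "__main__":
    # Identity "generator": returns the selected evidence unchanged (placeholder for an MLLM).
    echo = lambda prompt: prompt.split(": ", 1)[1]
    pair = Sample("flood hits city center in 2019", "flooded street")
    docs = ["flood hits city center in 2019 after heavy rain", "celebrity wedding photos"]
    print(run_pipeline(pair, docs, echo))
```

In a real system each stub would be replaced by an MLLM call; the sketch only shows how the stages pass a single selected and rewritten evidence item forward to the detector.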

Paper Structure

This paper contains 19 sections, 5 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: The critical components of multimodal OOC misinformation detection. Subfigure (a) presents internal consistency checking for identifying the authenticity of an image-text pair. Subfigure (b) presents evidence retrieval through tool usage. Subfigure (c) presents the core parts of the complete workflow. Subfigure (d) visualizes the Euclidean distance between image-text pairs (dark points) and their retrieved evidence (light points).
  • Figure 2: Overview of our proposed framework, HiEAG. (a) Automatic evidence selection prompting acquires the evidence item most relevant to the image-text pair. (b) Automatic evidence generation prompting generates a new sentence aligned with the selected item. (c) Instruction tuning enables the MLLM-based OOC misinformation detector to provide both a judgment and an explanation.
  • Figure 3: The construction process of the evidence-augmented instruction dataset.
  • Figure 4: Ablation experiments for HiEAG with distinct evidence reranking strategies on the NewsCLIPpings (Luo et al., 2021) dataset. As the number of evidence items (Top-$k$) increases, the accuracy (%) over all samples declines.
  • Figure 5: Performance of HiEAG on the NewsCLIPpings (Luo et al., 2021) dataset using different training data proportions.
  • ...and 1 more figure
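
The figure captions refer to Euclidean distance between image-text pairs and their retrieved evidence, and to keeping the Top-$k$ evidence items. A minimal sketch of that kind of selection follows. It is only an illustration: the embedding vectors are made up, and the function name `top_k_evidence` is hypothetical; the captions do not say how the embeddings are computed or whether distance is used for selection at all.

```python
import math
from typing import List, Sequence, Tuple


def top_k_evidence(query: Sequence[float],
                   evidence: List[Tuple[str, Sequence[float]]],
                   k: int = 1) -> List[Tuple[str, float]]:
    """Return the k evidence items closest to the query in Euclidean distance."""
    scored = [(text, math.dist(query, vec)) for text, vec in evidence]
    return sorted(scored, key=lambda pair: pair[1])[:k]


if __name__ == "__main__":
    # Toy 2-D embeddings (assumption: real embeddings come from a multimodal encoder).
    pair_embedding = [0.0, 0.0]
    candidates = [("a", [3.0, 4.0]), ("b", [1.0, 0.0]), ("c", [0.0, 2.0])]
    print(top_k_evidence(pair_embedding, candidates, k=2))
```

Setting `k=1` keeps only the nearest item, which mirrors the single-item selection the reranking caption describes.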