Multimodal Fake News Video Explanation: Dataset, Analysis and Evaluation

Lizhi Chen, Zhong Qian, Peifeng Li, Qiaoming Zhu

TL;DR

This work defines Fake News Video Explanation (FNVE), the task of generating natural language explanations for multimodal fake news videos. It introduces FakeVE, a dataset of 2,672 posts annotated with fine-grained explanations across four manipulation aspects, and MRGT, a Multimodal Relation Graph Transformer that reasons over the title, audio, and video frames to generate explanations. MRGT outperforms diverse baselines on standard generation metrics, and ablations show that each modality and the graph reasoning contribute to coherent explanations. The contributions advance explainable multimodal fake news analysis and provide resources for improving the transparency, public understanding, and robustness of detection systems.

Abstract

Multimodal fake news videos are difficult to interpret because doing so requires comprehensive consideration of the correlation and consistency among multiple modalities. Existing methods treat fake news videos as a classification problem, which leaves it unclear why a news video is identified as fake. Without a proper explanation, end users may not understand the underlying nature of the falsehood. We therefore propose a new problem, Fake News Video Explanation (FNVE): given a multimodal news post containing a video and a title, the goal is to generate natural language explanations that reveal the falsity of the news video. To that end, we developed FakeVE, a new dataset of 2,672 fake news video posts annotated with definitive explanations covering four real-life aspects of fake news videos. To understand the characteristics of fake news video explanations, we conducted an exploratory analysis of FakeVE from different perspectives. In addition, we propose a Multimodal Relation Graph Transformer (MRGT), built on a multimodal Transformer architecture, to benchmark FakeVE. Empirical results show that the various benchmark models evaluated on FakeVE produce convincing results, and we provide a detailed analysis of how these models differ in explanation generation.
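The abstract describes MRGT only as a relation graph Transformer over the post's modalities (the title, audio, and video frames, per the TL;DR); this summary does not specify its nodes, edges, or attention scheme. The sketch below illustrates one common way to combine a relation graph with Transformer attention: build an adjacency matrix over modality nodes and use it to mask self-attention. The node layout, the three relation types, the proportional time-alignment rule, and the names `build_relation_graph` and `graph_masked_attention` are hypothetical assumptions for illustration, not MRGT's actual design; learned projections, multiple heads, and the explanation decoder are omitted.

```python
import math
from typing import List

# Hypothetical node layout (an assumption, not MRGT's published design):
# index 0 = title node, then n_audio audio-segment nodes, then n_frames frame nodes.
def build_relation_graph(n_audio: int, n_frames: int) -> List[List[bool]]:
    """Return a symmetric boolean adjacency matrix (with self-loops).

    Assumed relation types, for illustration only:
      - title <-> every audio and frame node (cross-modal);
      - consecutive audio segments, and consecutive frames (intra-modal);
      - audio segment i <-> frame i * n_frames // n_audio, a crude
        proportional stand-in for temporal alignment.
    """
    n = 1 + n_audio + n_frames
    adj = [[i == j for j in range(n)] for i in range(n)]  # self-loops only

    def link(a: int, b: int) -> None:
        adj[a][b] = adj[b][a] = True

    audio = list(range(1, 1 + n_audio))
    frames = list(range(1 + n_audio, n))
    for node in audio + frames:
        link(0, node)
    for seq in (audio, frames):
        for a, b in zip(seq, seq[1:]):
            link(a, b)
    if frames:
        for i, a in enumerate(audio):
            link(a, frames[i * n_frames // n_audio])
    return adj


def graph_masked_attention(x: List[List[float]], adj: List[List[bool]]) -> List[List[float]]:
    """Single-head scaled dot-product self-attention restricted to graph edges.

    Node i attends only to nodes j with adj[i][j] True. Queries, keys and
    values are the raw features x (no learned projections) to keep it minimal.
    """
    d = len(x[0])
    out = []
    for i, q in enumerate(x):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if adj[i][j] else -math.inf
            for j, k in enumerate(x)
        ]
        m = max(scores)  # finite, because every node has a self-loop
        weights = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0 masks non-edges
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, x)) / total for c in range(d)])
    return out


# Usage with toy 2-d features: 1 title node, 2 audio nodes, 4 frame nodes.
adj = build_relation_graph(n_audio=2, n_frames=4)
x = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0], [0.2, 0.8], [0.9, 0.1], [0.3, 0.3]]
h = graph_masked_attention(x, adj)
print(len(h), len(h[0]))  # 7 2
```

Masking attention with an adjacency matrix is one standard way to inject graph structure into a Transformer layer; running a separate graph neural network before the Transformer would be an equally plausible alternative, and the summary does not say which approach MRGT uses.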

Paper Structure

This paper contains 23 sections, 5 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Example scenarios of the multimodal fake news video explanation task. Explanations that reveal why a video is false enable targeted measures to prevent and combat it.
  • Figure 2: Length and quantity statistics for the four error aspects of FakeVE.
  • Figure 3: Word clouds of the explanations for the four error aspects of FakeVE.
  • Figure 4: Evaluation results of explanation quality, rated on a 5-point Likert scale, across the four false aspects of FakeVE.
  • Figure 5: Example explanations of four different error aspects produced by GPT-4 and by human annotators, showing similarities and differences in their explanation approaches and their insights into the nature of these errors.
  • ...and 3 more figures
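Figure 4 reports explanation quality as 5-point Likert ratings for each of the four false aspects. This summary does not say how the ratings were aggregated; the sketch below shows one conventional choice, a per-aspect mean together with the count of each score. The function name `summarize_likert`, the placeholder aspect labels, and the toy ratings are hypothetical and are not data from FakeVE.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_likert(ratings: Dict[str, List[int]]) -> Dict[str, Tuple[float, Dict[int, int]]]:
    """Summarize 5-point Likert ratings per aspect.

    Returns, for each aspect, (mean rating, {score: count} for scores 1..5).
    Raises ValueError for an aspect with no ratings or with off-scale scores.
    """
    summary = {}
    for aspect, scores in ratings.items():
        if not scores:
            raise ValueError(f"no ratings for aspect {aspect!r}")
        off_scale = [s for s in scores if s not in range(1, 6)]
        if off_scale:
            raise ValueError(f"off-scale ratings for {aspect!r}: {off_scale}")
        counts = Counter(scores)  # missing scores count as 0
        summary[aspect] = (sum(scores) / len(scores), {k: counts[k] for k in range(1, 6)})
    return summary


# Toy, hypothetical ratings for two placeholder aspects (not FakeVE results).
toy = {"aspect_A": [5, 4, 4, 3], "aspect_B": [2, 3, 3, 4]}
for aspect, (avg, dist) in summarize_likert(toy).items():
    print(aspect, round(avg, 2), dist)
```

Reporting the score distribution alongside the mean is a common choice for ordinal Likert data, since two aspects can share a mean while differing in how polarized the ratings are.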