RED-DOT: Multimodal Fact-checking via Relevant Evidence Detection
Stefanos-Iordanis Papadopoulos, Christos Koutlis, Symeon Papadopoulos, Panagiotis C. Petrantonakis
TL;DR
RED-DOT tackles multimodal misinformation by introducing Relevant Evidence Detection (RED) to filter external evidence and improve verdict prediction. The approach combines Evidence Retrieval and Re-ranking, Modality Fusion, and a RED module within a shared Transformer framework, trained with a multitask objective $L = L^v + L^e$. Key findings show that performance under out-of-distribution evaluation (OOD-CV) transfers from NewsCLIPings+ to VERITE, that re-ranking the evidence and keeping a single piece per modality is often optimal, and that explicit element-wise modality fusion boosts accuracy without requiring multiple backbones or large amounts of evidence. The work reports significant gains over state-of-the-art baselines on NewsCLIPings+ and strong performance on VERITE, and releases code to support reproducibility and further research into relevance-aware evidence in multimodal fact-checking.
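The re-ranking and element-wise fusion steps summarized above can be sketched as follows. This is a minimal plain-Python illustration, not the authors' implementation. It assumes claims and evidence are already embedded as fixed-length vectors (e.g. by a CLIP-style encoder), that each evidence item is scored by cosine similarity against the claim embedding of the same modality, and that "element-wise fusion" means concatenating the text and image vectors with their sum, difference, and product. The paper's exact scoring and fusion operators may differ, and the names `cosine`, `rerank`, and `elementwise_fusion` are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical helpers; embeddings are plain float sequences of equal length.
Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(query: Vector, evidence: List[Vector], k: int = 1) -> List[int]:
    """Indices of the k evidence vectors most similar to the query, most similar first."""
    order = sorted(
        range(len(evidence)),
        key=lambda i: cosine(query, evidence[i]),
        reverse=True,
    )
    return order[:k]


def elementwise_fusion(t: Vector, v: Vector) -> List[float]:
    """Assumed fusion: concatenate [t, v, t + v, t - v, t * v] for text t and image v."""
    if len(t) != len(v):
        raise ValueError("text and image embeddings must have the same length")
    return (
        list(t)
        + list(v)
        + [a + b for a, b in zip(t, v)]
        + [a - b for a, b in zip(t, v)]
        + [a * b for a, b in zip(t, v)]
    )
```

Calling `rerank(claim_text_vec, text_evidence_vecs, k=1)` and the image counterpart keeps one piece of evidence per modality, mirroring the finding that a single piece is often enough. The fused vector is five times the embedding width; how it would be projected into a Transformer's model dimension is not shown here.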
Abstract
Online misinformation is often multimodal in nature, i.e., it is caused by misleading associations between texts and accompanying images. To support the fact-checking process, researchers have recently been developing automatic multimodal methods that gather and analyze external information (evidence) related to the image-text pairs under examination. However, prior works assumed all external information collected from the web to be relevant. In this study, we introduce a "Relevant Evidence Detection" (RED) module to discern whether each piece of evidence is relevant for supporting or refuting the claim. Specifically, we develop the "Relevant Evidence Detection Directed Transformer" (RED-DOT) and explore multiple architectural variants (e.g., single or dual-stage) and mechanisms (e.g., "guided attention"). Extensive ablation and comparative experiments demonstrate that RED-DOT achieves significant improvements of up to 33.7% over the state-of-the-art (SotA) on the VERITE benchmark. Furthermore, our evidence re-ranking and element-wise modality fusion led to RED-DOT surpassing the SotA on NewsCLIPings+ by up to 3% without the need for large amounts of evidence or multiple backbone encoders. We release our code at: https://github.com/stevejpapad/relevant-evidence-detection
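The RED module's role can be read through the multitask objective $L = L^v + L^e$ from the TL;DR: one term scores the verdict, the other scores the per-evidence relevance predictions. The sketch below assumes a binary verdict (truthful vs. misinformation), binary cross-entropy for both terms on sigmoid probabilities, $L^e$ averaged over the evidence items of one claim, and equal weighting of the two terms as the formula suggests. Relevance labels are taken as given; how they are derived is not shown. The helper names `bce` and `multitask_loss` are hypothetical.

```python
import math
from typing import Sequence


def bce(prob: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one probability; clamped to avoid log(0)."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def multitask_loss(
    verdict_prob: float,
    verdict_label: int,
    relevance_probs: Sequence[float],
    relevance_labels: Sequence[int],
) -> float:
    """Assumed L = L^v + L^e: verdict BCE plus mean BCE over evidence relevance."""
    if len(relevance_probs) != len(relevance_labels):
        raise ValueError("one relevance label is needed per evidence item")
    l_v = bce(verdict_prob, verdict_label)
    if relevance_probs:
        l_e = sum(
            bce(p, y) for p, y in zip(relevance_probs, relevance_labels)
        ) / len(relevance_probs)
    else:
        l_e = 0.0  # no evidence retrieved: only the verdict term contributes
    return l_v + l_e
```

Because $L^e$ is added rather than used to gate the verdict, relevance detection acts here as an auxiliary training signal on shared features; a weighted sum or a stage-wise use of the relevance scores would be equally plausible design choices under different assumptions.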
