Can LLMs Improve Multimodal Fact-Checking by Asking Relevant Questions?

Alimohammad Beigi; Bohan Jiang; Dawei Li; Zhen Tan; Pouya Shaeri; Tharindu Kumarage; Amrita Bhattacharjee; Huan Liu

TL;DR

This work introduces LRQ-FACT, an LLM-driven framework that automatically generates two types of fact-checking questions (FCQs), visual and textual, to guide evidence retrieval and verification of multimodal misinformation. By integrating image descriptions, VLM-based answers to visual FCQs, retrieval-augmented generation (RAG) for textual FCQs, and a rule-based decision-maker, LRQ-FACT improves fact-checking performance on three benchmark datasets and adapts to different LLM/VLM backbones. The study evaluates the quality of the generated FCQs, ablates FCQ components, and presents case studies, showing significant gains over strong baselines. The work highlights the potential of FCQ-driven pipelines to make automated fact-checking more scalable and reliable. It also notes limitations related to expert validation, random baselines, and efficiency, and outlines paths for future refinement and ethical deployment.
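
The answering and decision stages described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the LRQ-FACT implementation: the model calls are injected as plain callables (`llm`, `vlm`, `retrieve`), and the prompt wording, the `Evidence` container, and the all-checks-must-pass rule in `decide` are hypothetical, since this summary does not specify the actual prompts or decision rules.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interfaces: each stage is an injected callable, so any
# LLM/VLM backbone (or a test stub) can be plugged in.
LLM = Callable[[str], str]              # prompt -> text
VLM = Callable[[bytes, str], str]       # (image, question) -> answer
Retriever = Callable[[str], List[str]]  # query -> retrieved passages


@dataclass
class Evidence:
    image_description: str
    visual_qa: List[Tuple[str, str]] = field(default_factory=list)
    textual_qa: List[Tuple[str, str]] = field(default_factory=list)


def collect_evidence(image: bytes, visual_fcqs: List[str], textual_fcqs: List[str],
                     llm: LLM, vlm: VLM, retrieve: Retriever) -> Evidence:
    # Image description: gives the text-only stages a view of the image.
    ev = Evidence(image_description=vlm(image, "Describe this image in detail."))
    # Visual FCQs are answered directly by the VLM from the image.
    ev.visual_qa = [(q, vlm(image, q)) for q in visual_fcqs]
    # Textual FCQs are answered by the LLM over retrieved passages (RAG).
    for q in textual_fcqs:
        context = "\n".join(retrieve(q))
        ev.textual_qa.append((q, llm(f"Context:\n{context}\nQuestion: {q}\nAnswer:")))
    return ev


def decide(claim: str, ev: Evidence, llm: LLM) -> str:
    """Hypothetical rule: 'real' only if every yes/no check passes, else 'fake'."""
    visual = "\n".join(f"Q: {q} A: {a}" for q, a in ev.visual_qa)
    textual = "\n".join(f"Q: {q} A: {a}" for q, a in ev.textual_qa)
    checks = [
        f"Claim: {claim}\nImage: {ev.image_description}\n{visual}\n"
        "Is the image consistent with the claim? Answer yes or no.",
        f"Claim: {claim}\n{textual}\n"
        "Is the claim supported by this evidence? Answer yes or no.",
    ]
    passed = all(llm(p).strip().lower().startswith("yes") for p in checks)
    return "real" if passed else "fake"
```

Injecting the models as callables keeps the data flow visible and lets the backbones be swapped, which mirrors the backbone-adaptability claim without assuming any particular model API.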

Abstract

Traditional fact-checking relies on humans to formulate relevant and targeted fact-checking questions (FCQs), search for evidence, and verify the factuality of claims. While Large Language Models (LLMs) are commonly used to automate evidence retrieval and factuality verification at scale, their effectiveness for fact-checking is hindered by the absence of FCQ formulation. To bridge this gap, we seek to answer two research questions: (1) Can LLMs generate relevant FCQs? (2) Can LLM-generated FCQs improve multimodal fact-checking? We therefore introduce LRQ-FACT, a framework that uses LLMs to generate relevant FCQs, which facilitate evidence retrieval and enhance fact-checking by probing information across multiple modalities. Through extensive experiments, we verify whether LRQ-FACT can generate relevant FCQs of different types and whether it consistently outperforms baseline methods in multimodal fact-checking. Further analysis illustrates how each component of LRQ-FACT contributes to fact-checking performance.
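
RQ1 asks whether an LLM can produce relevant FCQs for each modality. The sketch below illustrates one plausible shape for that generation step, assuming one LLM call per FCQ type that returns a numbered list. The template wording, the `max_questions` cap, and the helpers `parse_numbered` and `generate_fcqs` are hypothetical and are not the prompts used in LRQ-FACT.

```python
import re
from typing import Callable, Dict, List

# Illustrative templates (not the paper's prompts): one per FCQ type.
TEMPLATES: Dict[str, str] = {
    "visual": (
        "Claim: {claim}\nImage description: {image_description}\n"
        "List up to {n} questions that check whether the image is authentic "
        "and consistent with the claim, one per numbered line."
    ),
    "textual": (
        "Claim: {claim}\n"
        "List up to {n} questions whose answers, found via search, would "
        "confirm or refute the claim, one per numbered line."
    ),
}

# Matches list items such as "1. ..." or "2) ...", capturing the trimmed text.
NUMBERED = re.compile(r"^\s*\d+[.)]\s*(.+?)\s*$")


def parse_numbered(raw: str, max_questions: int) -> List[str]:
    """Extract numbered items, dropping duplicates and non-questions."""
    seen, out = set(), []
    for line in raw.splitlines():
        m = NUMBERED.match(line)
        if not m:
            continue
        q = m.group(1)
        if q.endswith("?") and q.lower() not in seen:
            seen.add(q.lower())
            out.append(q)
    return out[:max_questions]


def generate_fcqs(claim: str, image_description: str,
                  llm: Callable[[str], str], max_questions: int = 5) -> Dict[str, List[str]]:
    """Return {'visual': [...], 'textual': [...]} from one LLM call per type."""
    result = {}
    for kind, template in TEMPLATES.items():
        # str.format ignores unused keyword arguments, so one call fits both templates.
        prompt = template.format(claim=claim, image_description=image_description,
                                 n=max_questions)
        result[kind] = parse_numbered(llm(prompt), max_questions)
    return result
```

Parsing defensively (keeping only numbered lines that end in a question mark, and removing repeats) is a design choice for this sketch; it guards against LLM output that mixes questions with commentary.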

Paper Structure

This paper contains 32 sections, 8 equations, 18 figures, and 4 tables.

Introduction
Related Work
Multimodal Misinformation
Fact-Checking
Language Models for Fact-Checking
Task Definition
RQ1: Can LLMs Generate Relevant FCQs?
Visual FCQs Generation
Textual FCQs Generation
FCQ Quality Evaluation
RQ2: Can LLM-Generated FCQs Improve Multimodal Fact-Checking?
Image Description Generation
Answering Visual FCQs via VLM
Answering Textual FCQs via RAG
Rule-Based Decision-Maker
...and 17 more sections

Figures (18)

Figure 1: The two research questions we aim to address in this work.
Figure 2: Human and GPT-4o Question Quality Evaluations Across Datasets (50 Samples per Dataset-Modality).
Figure 3: GPT-4o Evaluation of Question Relevance Across Datasets (1000 Samples per Dataset-Modality).
Figure 4: The proposed framework, LRQ-FACT, draws on insights from the human fact-checking process.
Figure 5: Detailed ablation study results. TFCQs and VFCQs denote textual FCQs and visual FCQs, respectively.
...and 13 more figures
