MultiCheck: Strengthening Web Trust with Unified Multimodal Fact Verification
Aditya Kishore, Gaurav Kumar, Jasabanta Patro
TL;DR
Multimodal misinformation increasingly blends text, images, and OCR content, which challenges traditional unimodal fact-checkers. The authors introduce MultiCheck, a lightweight, end-to-end framework that jointly reasons over claim text, images, and OCR signals. Its relational fusion module combines representations through element-wise difference and product, and a contrastive InfoNCE objective aligns semantically related claim–document pairs. Training combines cross-entropy with the contrastive loss (weighted by λ = 0.1), yielding strong cross-modal representations without heavy generative decoding. Empirically, MultiCheck achieves large macro-F1 gains over strong baselines on Factify-2 and Mocheg, remains robust under OCR noise and modality imbalance, and supports memory-efficient deployment via 4-bit quantization and QLoRA while preserving performance. The result is a practical, transparent multimodal verifier suited to journalists and web integrity efforts working toward safer online information ecosystems.
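The training objective described above can be written as L = L_CE + λ · L_InfoNCE with λ = 0.1. The NumPy sketch below is a hypothetical illustration, not the authors' implementation. Only λ = 0.1 comes from the paper; the following choices are assumptions: L2-normalised claim and document embeddings, in-batch negatives (claim i is paired with document i, and every other document in the batch serves as a negative), and a temperature of 0.07.

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax: subtract the row max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the true class for each example.
    logp = log_softmax(logits, axis=1)
    return -logp[np.arange(len(labels)), labels].mean()

def info_nce(claim_emb, doc_emb, temperature=0.07):
    # L2-normalise so that dot products become cosine similarities.
    c = claim_emb / np.linalg.norm(claim_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sim = c @ d.T / temperature  # (B, B) similarity matrix
    # Assumption: row i's positive is column i; other columns are in-batch negatives.
    targets = np.arange(len(c))
    return cross_entropy(sim, targets)

def total_loss(class_logits, labels, claim_emb, doc_emb, lam=0.1, temperature=0.07):
    # Classification cross-entropy plus the contrastive term weighted by lambda.
    return cross_entropy(class_logits, labels) + lam * info_nce(claim_emb, doc_emb, temperature)

# Usage: a toy batch of 8 claim/document pairs and 5 verdict classes (as in Factify-2).
rng = np.random.default_rng(0)
B, D, C = 8, 16, 5
claims = rng.normal(size=(B, D))
docs_aligned = claims + 0.01 * rng.normal(size=(B, D))  # near-duplicates of the claims
docs_random = rng.normal(size=(B, D))                   # unrelated documents
logits = rng.normal(size=(B, C))
labels = rng.integers(0, C, size=B)
loss = total_loss(logits, labels, claims, docs_aligned)
```

When claim and document embeddings nearly coincide, the InfoNCE term approaches zero. For unrelated pairs it is substantially larger, and this gap is the signal that pulls matching claim–document pairs together during training.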
Abstract
Misinformation on the web increasingly appears in multimodal forms, combining text, images, and OCR-rendered content in ways that amplify harm to public trust and to vulnerable communities. Prior fact-checking systems often rely on unimodal signals or shallow fusion strategies, yet modern misinformation campaigns operate across modalities and require models that can reason transparently and responsibly over subtle cross-modal inconsistencies. We introduce MultiCheck, a lightweight and interpretable framework for multimodal fact verification that jointly analyzes textual, visual, and OCR evidence. At its core, MultiCheck employs a relational fusion module based on element-wise difference and product operations, which explicitly models cross-modal interactions with minimal computational overhead. A contrastive alignment objective further helps the model distinguish supporting from refuting evidence while keeping memory and energy footprints small, making the framework suitable for low-resource deployment. Evaluated on the Factify-2 (5-class) and Mocheg (3-class) benchmarks, MultiCheck achieves substantial performance improvements over strong baselines and remains robust under noisy OCR and missing-modality conditions. Its efficiency, transparency, and real-world robustness make it well suited to journalists, civil society organisations, and web integrity efforts working to build a safer and more trustworthy web.
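The relational fusion idea (element-wise difference and product) can be illustrated with a short NumPy sketch. This is a hypothetical example, not the paper's module. It makes three assumptions: the fused vector concatenates both inputs with their difference and product, each side's text, image, and OCR embeddings are already projected to a shared dimension and are simply averaged (`build_side` is a made-up helper), and the classifier is a random linear head used only for shape illustration.

```python
import numpy as np

def relational_fuse(u, v):
    # Concatenate both inputs with their element-wise difference and product.
    # The difference exposes where the two representations disagree;
    # the product emphasises dimensions on which they agree.
    return np.concatenate([u, v, u - v, u * v], axis=-1)

def build_side(text_emb, image_emb, ocr_emb):
    # Hypothetical: average three modality embeddings that are assumed
    # to live in the same d-dimensional space already.
    return (text_emb + image_emb + ocr_emb) / 3.0

# Usage: fuse one claim with one piece of evidence and score 5 classes.
rng = np.random.default_rng(0)
d, num_classes = 8, 5
claim = build_side(*(rng.normal(size=d) for _ in range(3)))
doc = build_side(*(rng.normal(size=d) for _ in range(3)))
fused = relational_fuse(claim, doc)                # shape (4 * d,)
W = 0.1 * rng.normal(size=(4 * d, num_classes))    # hypothetical linear head
logits = fused @ W                                 # shape (num_classes,)
```

Because the difference and product are computed element-wise, the fused vector is four times the width of each input and costs only O(d) extra operations. This is consistent with the abstract's emphasis on explicit interaction modeling at minimal computational overhead.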
