Cross-Modal Augmentation for Few-Shot Multimodal Fake News Detection
Ye Jiang, Taihang Wang, Xiaoman Xu, Yimin Wang, Xingyi Song, Diana Maynard
TL;DR
This paper tackles the challenge of few-shot multimodal fake news detection by introducing Cross-Modal Augmentation (CMA), which augments multimodal features with unimodal cues to transform standard $n$-shot learning into a more robust $(n \times z)$-shot regime ($z$ being the number of supplementary features) using a frozen pretrained encoder and linear probing. CMA uses CLIP-based text and image representations together with cross-attention to generate five modality-specific inferences, which are then fused by a meta-linear classifier. Empirical results on PolitiFact, GossipCop, and Weibo show that CMA achieves state-of-the-art accuracy with substantially lower training overhead than fine-tuned large models, demonstrating both effectiveness and efficiency in few-shot settings. The work also provides extensive ablations, stability analyses, and domain-shift evaluations, offering insight into when and why unimodal augmentation helps multimodal fake news detection. Limitations include reliance on CLIP and on cosine-similarity-based image selection; future directions include broader multimodal encoders and domain adaptation strategies.
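The fusion pipeline summarised above can be pictured with a small NumPy sketch. Everything specific in it is an assumption rather than CMA's published code: `cross_attention` is plain scaled dot-product attention with no learned projections, the five views (text, image, text-to-image, image-to-text, pooled joint) are hypothetical stand-ins for the paper's five inferences, the per-view probe weights are random, and `meta_fuse` is a generic linear layer over the stacked per-view scores.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context):
    """Scaled dot-product attention with no learned projections (an assumption).

    queries: (m, d) vectors from one modality; context: (k, d) from the other.
    Returns an (m, d) array: each query rewritten as a weighted mix of context rows.
    """
    d = queries.shape[-1]
    weights = softmax(queries @ context.T / np.sqrt(d), axis=-1)  # (m, k), rows sum to 1
    return weights @ context

def meta_fuse(view_scores, meta_w, meta_b=0.0):
    """Linear meta-classifier over stacked per-view scores: (n, v) -> (n,) logits."""
    return view_scores @ meta_w + meta_b

# Hypothetical usage: one post with 3 text tokens and 5 image patches, d = 4.
rng = np.random.default_rng(0)
text, image = rng.normal(size=(3, 4)), rng.normal(size=(5, 4))
views = [
    text.mean(axis=0),                           # text only
    image.mean(axis=0),                          # image only
    cross_attention(text, image).mean(axis=0),   # text attending to image
    cross_attention(image, text).mean(axis=0),   # image attending to text
    np.concatenate([text, image]).mean(axis=0),  # pooled joint view
]
probe_w = rng.normal(size=(5, 4))                # one hypothetical linear probe per view
scores = np.array([[w @ v for w, v in zip(probe_w, views)]])  # shape (1, 5)
logit = meta_fuse(scores, np.full(5, 0.2))       # equal weights = averaging the scores
```

With equal meta-weights the meta-classifier reduces to averaging the per-view scores; in a trained system `meta_w` and `meta_b` would presumably be fitted on the few labelled samples, so this layer adds only six parameters here.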
Abstract
Fake news about newly emerging topics requires automatic detection methods that can learn quickly from a limited number of annotated samples. The ability to become proficient at a new task from limited supervision, known as few-shot learning, is therefore critical for detecting fake news in its early stages. Existing approaches either fine-tune pre-trained language models, which have a large number of parameters, or train complex neural networks from scratch on large-scale annotated datasets. This paper presents a multimodal fake news detection model that augments multimodal features with unimodal features. To this end, we introduce Cross-Modal Augmentation (CMA), a simple approach that enhances few-shot multimodal fake news detection by transforming $n$-shot classification into a more robust $(n \times z)$-shot problem, where $z$ is the number of supplementary features. CMA achieves state-of-the-art (SOTA) results on three benchmark datasets while using a surprisingly simple linear probing method to classify multimodal fake news from only a few training samples. Our method is also considerably more lightweight than prior approaches, particularly in the number of trainable parameters and the training time per epoch. The code is available at https://github.com/zgjiangtoby/FND_fewshot
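One way to read the $n$-shot to $(n \times z)$-shot transformation is to treat each of a sample's $z$ feature views as a separate training row that carries the sample's label, and then fit a linear probe on the frozen features. The NumPy sketch below assumes exactly that reading; the helper names (`cross_modal_augment`, `train_linear_probe`, `predict`), the 0/1 label encoding and the random feature stand-ins are hypothetical, and plain gradient-descent logistic regression stands in for the linear probe.

```python
import numpy as np

def cross_modal_augment(views, labels):
    """Turn z feature views of n labelled samples into an (n * z)-row training set.

    views:  list of z arrays, each (n, d), e.g. text, image and fused features
            already projected to a shared dimension d (an assumption).
    labels: (n,) array, 0 = real and 1 = fake (an assumed encoding).
    """
    X = np.concatenate(views, axis=0)  # rows: view 0 for all samples, then view 1, ...
    y = np.tile(labels, len(views))    # every view inherits its sample's label
    return X, y

def train_linear_probe(X, y, lr=0.5, epochs=500):
    """Logistic regression fitted by full-batch gradient descent; the encoder stays frozen."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of "fake"
        err = p - y                              # per-row gradient of the log-loss w.r.t. the logit
        w -= lr * (X.T @ err) / n
        b -= lr * err.mean()
    return w, b

def predict(X, w, b):
    return (X @ w + b > 0).astype(int)

# Hypothetical usage: n = 4 samples (2 per class), z = 3 views, d = 8.
rng = np.random.default_rng(0)
labels = np.array([0, 1, 0, 1])
views = [rng.normal(size=(4, 8)) for _ in range(3)]  # stand-ins for frozen encoder features
X, y = cross_modal_augment(views, labels)
w, b = train_linear_probe(X, y)
print(X.shape)  # (12, 8): the 4-shot task now has 12 training rows
```

Because the encoder is never updated, the only trainable parameters in this sketch are the $d + 1$ weights of the probe, which illustrates why linear probing is cheap compared with fine-tuning a large model.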
