MCFEND: A Multi-source Benchmark Dataset for Chinese Fake News Detection
Yupeng Li, Haorui He, Jin Bai, Dacheng Wen
TL;DR
The study addresses the limitation of single-source Chinese fake news datasets by introducing MCFEND, a large multi-source benchmark of 23,789 Chinese news items from 14 fact-checking agencies, spanning three source groups and including rich social-context signals. It formulates detection as binary classification over multimodal content and social context, and systematically evaluates six baseline models (both content-based and social-context-based) under cross-source, multi-source, and unseen-source settings. Models trained only on Weibo data degrade substantially when tested on other sources, whereas multi-source training markedly improves robustness, with notable gains for RoBERTa and CAFE; multimodal fusion models are generally more resilient across sources. MCFEND is proposed as a practical benchmark for advancing Chinese fake news detection in real-world, diverse-source environments and for guiding the development of more robust, transferable detection methods.
Abstract
The prevalence of fake news across various online sources has had a significant influence on the public. Existing Chinese fake news detection datasets are limited to news sourced solely from Weibo. However, fake news originating from multiple sources is diverse in many respects, including its content and social context, so methods trained on a single news source are hardly applicable to real-world scenarios. Our pilot experiment demonstrates that the F1 score of the state-of-the-art method trained on a large Chinese fake news detection dataset, Weibo-21, drops significantly from 0.943 to 0.470 when the test data is changed to multi-source news data, failing to identify more than one-third of the multi-source fake news. To address this limitation, we constructed the first multi-source benchmark dataset for Chinese fake news detection, termed MCFEND, which is composed of news we collected from diverse sources such as social platforms, messaging apps, and traditional online news outlets. Notably, this news has been fact-checked by 14 authoritative fact-checking agencies worldwide. In addition, various existing Chinese fake news detection methods are thoroughly evaluated on our proposed dataset in cross-source, multi-source, and unseen-source settings. MCFEND, as a benchmark dataset, aims to advance Chinese fake news detection approaches in real-world scenarios.
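
To make the evaluation settings named above concrete, the following is a minimal sketch of how source-based splits and an F1 score might be computed. It is an illustration under stated assumptions, not the authors' protocol: the record layout (a `source` and a `label` field), the source names, and the helper functions `split_by_source`, `leave_one_source_out`, and `f1_score` are all hypothetical, and the label convention (1 = fake) is assumed.

```python
def split_by_source(items, train_sources):
    """Cross-source style split: train on the given sources, test on all others."""
    train = [x for x in items if x["source"] in train_sources]
    test = [x for x in items if x["source"] not in train_sources]
    return train, test


def leave_one_source_out(items):
    """Unseen-source style split: hold out each source in turn as the test set."""
    sources = sorted({x["source"] for x in items})
    for held_out in sources:
        train, test = split_by_source(items, set(sources) - {held_out})
        yield held_out, train, test


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class (assumed here to be 1 = fake)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy records with hypothetical source names; not drawn from MCFEND.
items = [
    {"source": "weibo", "label": 1},
    {"source": "weibo", "label": 0},
    {"source": "messaging_app", "label": 1},
    {"source": "news_outlet", "label": 0},
]

# Train on one source only, test on everything else.
train, test = split_by_source(items, {"weibo"})

# Each fold tests on a source the model never saw during training.
folds = list(leave_one_source_out(items))
```

A multi-source setting would correspond to training on records pooled from several sources, for example by passing more than one name in `train_sources`.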
