Multi-view autoencoders for Fake News Detection
Ingryd V. S. T. Pereira, George D. C. Cavalcanti, Rafael M. O. Cruz
TL;DR
This work tackles fake news detection by addressing the limitation of single-view textual representations. It introduces multi-view autoencoders (MVAE) to fuse diverse text feature views into a single joint latent representation $Z$, which is then used for classification, with encoders retained for inference. Experiments across three datasets (FAKES, LIAR, ISOT) and multiple MVAE models and view combinations show that MVAE representations consistently outperform single-view baselines, though the best latent dimension and view subset vary by dataset and classifier. A key finding is that a subset of views can outperform the full set of views, so which views are fused matters as much as the fusion itself. The work suggests future extensions to automatic view selection and to multimodal domains, offering practical gains in detection accuracy and insights for multi-view fusion in NLP.
Abstract
Given the volume and speed at which fake news spreads across social media, automatic fake news detection has become an important task. However, this task presents several challenges, including extracting textual features that contain relevant information about fake news. Research on fake news detection shows that no single feature extraction technique consistently outperforms the others across all scenarios. Nevertheless, different feature extraction techniques can provide complementary information about the textual data and enable a more comprehensive representation of the content. This paper proposes using multi-view autoencoders to generate a joint feature representation for fake news detection by integrating several feature extraction techniques commonly used in the literature. Experiments on fake news datasets show a significant improvement in classification performance compared to individual views (feature representations). We also observe that selecting a subset of the views, instead of composing a latent space from all the views, can be advantageous in terms of both accuracy and computational effort. For further details, including source code, figures, and datasets, please refer to the project's repository: https://github.com/ingrydpereira/multiview-fake-news.
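To make the idea of a joint latent representation concrete, the following is a minimal sketch, not the models evaluated in the paper. It assumes linear encoders and decoders, fuses the per-view encodings by simple averaging, and trains with plain gradient descent on the summed reconstruction error. The function names `train_mvae` and `encode`, the averaging fusion, and all hyperparameters are illustrative assumptions.

```python
import numpy as np


def train_mvae(views, latent_dim=8, lr=0.01, epochs=200, seed=0):
    """Fit a linear multi-view autoencoder (illustrative assumption only).

    views: list of arrays, one per feature view, each of shape (n, d_v).
    Returns per-view encoder matrices, decoder matrices, and the loss history.
    """
    rng = np.random.default_rng(seed)
    n, n_views = views[0].shape[0], len(views)
    enc = [rng.normal(0.0, 0.1, (x.shape[1], latent_dim)) for x in views]
    dec = [rng.normal(0.0, 0.1, (latent_dim, x.shape[1])) for x in views]
    losses = []
    for _ in range(epochs):
        # Joint latent Z: average of the per-view encodings (assumed fusion rule).
        z = sum(x @ e for x, e in zip(views, enc)) / n_views
        # Each view is reconstructed from the shared Z by its own decoder.
        residuals = [z @ d - x for x, d in zip(views, dec)]
        losses.append(sum((r ** 2).sum() for r in residuals) / n)
        # Gradients of L = (1/n) * sum_v ||Z D_v - X_v||^2.
        grad_z = (2.0 / n) * sum(r @ d.T for r, d in zip(residuals, dec))
        grad_dec = [(2.0 / n) * z.T @ r for r in residuals]
        grad_enc = [x.T @ grad_z / n_views for x in views]
        for v in range(n_views):
            enc[v] -= lr * grad_enc[v]
            dec[v] -= lr * grad_dec[v]
    return enc, dec, losses


def encode(views, enc):
    """Map views to the joint latent Z; only the encoders are needed here."""
    return sum(x @ e for x, e in zip(views, enc)) / len(views)
```

In this sketch, the array returned by `encode` plays the role of $Z$ and would be passed to any downstream classifier. Running it on a subset of views only requires passing a shorter `views` list, which is one way to compare view combinations.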
