Multi-view autoencoders for Fake News Detection

Ingryd V. S. T. Pereira, George D. C. Cavalcanti, Rafael M. O. Cruz

TL;DR

This work tackles fake news detection by addressing the limitation of single-view textual representations. It introduces multi-view autoencoders (MVAE) to fuse diverse text feature views into a single joint latent representation $Z$, which is then used for classification, with encoders retained for inference. Experiments across three datasets (FAKES, LIAR, ISOT) and multiple MVAE models and view combinations show that MVAE representations consistently outperform single-view baselines, though the best latent dimension and view subset vary by dataset and classifier. A key finding is that using a subset of views can surpass using all views, highlighting the complementary value of view selection. The work suggests future extensions to automatic view selection and to multimodal domains, offering practical gains in detection accuracy and insights for multi-view fusion in NLP.
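The fusion step described above can be illustrated with a minimal, untrained sketch. It is not the authors' implementation: the class name `MultiViewAutoencoder`, the linear layers, the view sizes, and in particular the fusion rule (averaging the per-view codes to obtain $Z$) are assumptions made for illustration. The paper evaluates several MVAE models, which may combine views differently, and training (minimizing the reconstruction loss) is omitted here.

```python
import random


def linear(weights, x):
    """Apply a bias-free dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def init_weights(n_out, n_in, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


class MultiViewAutoencoder:
    """Toy multi-view autoencoder with one linear encoder/decoder per view.

    Assumption of this sketch: the joint latent Z is the mean of the
    per-view codes. Other fusion rules are possible.
    """

    def __init__(self, view_dims, latent_dim, seed=0):
        rng = random.Random(seed)
        self.encoders = [init_weights(latent_dim, d, rng) for d in view_dims]
        self.decoders = [init_weights(d, latent_dim, rng) for d in view_dims]

    def encode(self, views):
        """Map every view to the latent space and average the codes into Z."""
        codes = [linear(enc, x) for enc, x in zip(self.encoders, views)]
        return [sum(c[i] for c in codes) / len(codes) for i in range(len(codes[0]))]

    def decode(self, z):
        """Reconstruct every view from the shared latent Z."""
        return [linear(dec, z) for dec in self.decoders]

    def reconstruction_loss(self, views):
        """Sum of squared reconstruction errors over all views."""
        recons = self.decode(self.encode(views))
        return sum((a - b) ** 2 for x, r in zip(views, recons) for a, b in zip(x, r))


# Two hypothetical views of one document: a 5-dim sparse vector and a 3-dim embedding.
views = [[0.0, 1.0, 0.0, 2.0, 0.0], [0.3, -0.1, 0.7]]
model = MultiViewAutoencoder(view_dims=[5, 3], latent_dim=2)
z = model.encode(views)  # joint representation that a classifier would consume
```

At inference time only `encode` is needed, which matches the statement that the encoders are retained: the decoders exist to provide a training signal, while the downstream classifier is fit on `z`.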

Abstract

Given the volume and speed at which fake news spreads across social media, automatic fake news detection has become a highly important task. However, this task presents several challenges, including extracting textual features that contain relevant information about fake news. Research on fake news detection shows that no single feature extraction technique consistently outperforms the others across all scenarios. Nevertheless, different feature extraction techniques can provide complementary information about the textual data and enable a more comprehensive representation of the content. This paper proposes using multi-view autoencoders to generate a joint feature representation for fake news detection by integrating several feature extraction techniques commonly used in the literature. Experiments on fake news datasets show a significant improvement in classification performance compared to individual views (feature representations). We also observed that selecting a subset of the views, instead of composing a latent space with all the views, can be advantageous in terms of accuracy and computational effort. For further details, including source code, figures, and datasets, please refer to the project's repository: https://github.com/ingrydpereira/multiview-fake-news.

Paper Structure

This paper contains 10 sections, 6 figures, 1 table, 3 algorithms.

Figures (6)

  • Figure 1: Single-view autoencoder (adapted from Aguila et al., 2023)
  • Figure 2: Multi-view Autoencoder
  • Figure 3: Number of configurations below (red) and equal to or above (blue) the average of all possible configurations.
  • Figure 4: Boxplot of accuracy across runs with different latent sizes and classifiers, grouped by multi-view autoencoder model
  • Figure 5: Comparison of each view with each classifier versus the multi-view autoencoder model that obtained the best result with the same classifier and contains the view in its input set
  • ...and 1 more figure