
Fake News Detection on Social Media using Geometric Deep Learning

Federico Monti, Fabrizio Frasca, Davide Eynard, Damon Mannion, Michael M. Bronstein

TL;DR

This work addresses the challenge of fake news on social media by moving beyond content-based detection to propagation-based signals. It introduces a geometric deep learning framework, specifically a Graph CNN with graph attention, that fuses content, user profiles, social network structure, and propagation patterns to detect fake news. The model achieves high ROC AUC (about 93% URL-wise and 88% cascade-wise) and can discriminate fake from real news early in the diffusion process. Its aging is also evaluated on training and test data separated in time. The results suggest that propagation patterns are a powerful, potentially language- and geography-independent signal, with implications for broader social-network analysis and for future research on adversarial robustness.
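Early detection amounts to classifying a cascade from only its first hours of diffusion. The truncation step could look like the minimal sketch below. It assumes a hypothetical `Tweet` record with a parent pointer and a timestamp in hours, which is not the paper's data format, and it keeps only tweets posted within a chosen horizon after the cascade root.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tweet:
    # Hypothetical record, for illustration only.
    tweet_id: int
    parent_id: Optional[int]  # None marks the cascade root
    time_h: float             # posting time in hours (any common reference)


def truncate_cascade(cascade: List[Tweet], horizon_h: float) -> List[Tweet]:
    """Keep only tweets posted within horizon_h hours of the cascade root.

    A retweet is always posted after its parent, so every kept tweet's
    parent is also kept and the truncated cascade stays connected.
    """
    root = next(t for t in cascade if t.parent_id is None)
    return [t for t in cascade if t.time_h - root.time_h <= horizon_h]


cascade = [
    Tweet(1, None, 0.0),
    Tweet(2, 1, 0.5),
    Tweet(3, 1, 2.0),
    Tweet(4, 2, 7.5),
    Tweet(5, 3, 30.0),
]
early = truncate_cascade(cascade, horizon_h=6.0)
print([t.tweet_id for t in early])  # [1, 2, 3]
```

In an early-detection experiment, the truncated cascade would replace the full one as the classifier's input, and the horizon would be varied to see how quickly performance saturates.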

Abstract

Social media are nowadays one of the main news sources for millions of people around the globe due to their low cost, easy access and rapid dissemination. This, however, comes at the cost of dubious trustworthiness and a significant risk of exposure to 'fake news', intentionally written to mislead readers. Automatically detecting fake news poses challenges that defy existing content-based analysis approaches. One of the main reasons is that interpreting the news often requires knowledge of the political or social context, or 'common sense', which current NLP algorithms still lack. Recent studies have shown that fake and real news spread differently on social media, forming propagation patterns that could be harnessed for automatic fake news detection. Propagation-based approaches have multiple advantages over their content-based counterparts, among them language independence and better resilience to adversarial attacks. In this paper we present a novel automatic fake news detection model based on geometric deep learning. The underlying core algorithms are a generalization of classical CNNs to graphs, allowing the fusion of heterogeneous data such as content, user profile and activity, social graph, and news propagation. Our model was trained and tested on news stories, verified by professional fact-checking organizations, that were spread on Twitter. First, our experiments indicate that social network structure and propagation are important features allowing highly accurate (92.7% ROC AUC) fake news detection. Second, we observe that fake news can be reliably detected at an early stage, after just a few hours of propagation. Third, we test the aging of our model on training and test data separated in time. Our results point to the promise of propagation-based approaches for fake news detection as an alternative or complementary strategy to content-based approaches.
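ROC AUC, the metric behind the 92.7% figure, can be read as the probability that a randomly chosen fake story receives a higher score than a randomly chosen real one. The sketch below computes it directly from that pairwise definition, with ties counting one half. The label convention `1 = fake` and the function name `roc_auc` are assumptions for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise ROC AUC: fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for fake (positive class, by assumption), 0 for real.
    scores: classifier outputs, higher meaning "more likely fake".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # pair ranked correctly
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

This version is quadratic in the number of examples and is only meant to make the definition concrete. For large evaluations a sort-based rank computation, or a library routine such as scikit-learn's `roc_auc_score`, is the usual choice.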

Paper Structure

This paper contains 12 sections and 11 figures.

Figures (11)

  • Figure 1: Example of a single news story spreading on a subset of the Twitter social network. Social connections between users are visualized as light-blue edges. A news URL is tweeted by multiple users (cascade roots denoted in red), each producing a cascade propagating over a subset of the social graph (red edges). Circle size represents the number of followers. Note that some cascades are small, containing only the root (the tweeting user) or just a few retweets.
  • Figure 2: Distribution of cascade sizes (number of tweets per cascade) in our dataset.
  • Figure 3: Distribution of cascades over the 930 URLs available in our dataset with at least six tweets per cascade, sorted by the number of cascades in descending order. The first 15 URLs (~1.5% of the entire dataset) correspond to 20% of all the cascades.
  • Figure 4: Subset of the Twitter network used in our study with estimated user credibility. Vertices represent users; gray edges represent social connections. Vertex color and size encode the user credibility (blue = reliable, red = unreliable) and the number of followers of each user, respectively. Numbers 1 to 9 mark the nine users with the most followers.
  • Figure 5: The architecture of our neural network model. Top row: GC = Graph Convolution, MP = Mean Pooling, FC = Fully Connected, SM = SoftMax layer. Bottom row: input/output tensors received/produced by each layer.
  • ...and 6 more figures
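The pipeline in the Figure 5 caption (graph convolution, mean pooling, fully connected, softmax) can be sketched in plain Python. Several choices here are assumptions rather than the paper's exact model. The sketch uses two graph-convolution layers and two fully connected layers, with small arbitrary dimensions and random weights. It also swaps in a simple mean-aggregation graph convolution in place of the graph attention layers the paper uses. The goal is only to show how per-user features on a graph become a single fake/real probability.

```python
import math
import random


def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def graph_conv(X, adj, W):
    """Mean-aggregation graph convolution (a stand-in for graph attention):
    each node averages its own features with its neighbours', then applies W and ReLU."""
    out = []
    for i, x_i in enumerate(X):
        nbrs = [i] + adj[i]
        agg = [sum(X[j][k] for j in nbrs) / len(nbrs) for k in range(len(x_i))]
        out.append(relu(matvec(W, agg)))
    return out


def mean_pool(H):
    # Average node embeddings into one graph-level vector (node-order invariant).
    n = len(H)
    return [sum(h[k] for h in H) / n for k in range(len(H[0]))]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def forward(X, adj, params):
    H = graph_conv(X, adj, params["gc1"])  # GC
    H = graph_conv(H, adj, params["gc2"])  # GC
    g = mean_pool(H)                       # MP
    g = relu(matvec(params["fc1"], g))     # FC
    logits = matvec(params["fc2"], g)      # FC
    return softmax(logits)                 # SM: two class probabilities (order is an assumption)


rng = random.Random(0)
params = {
    "gc1": rand_matrix(4, 3, rng),  # 3 input features -> 4 hidden
    "gc2": rand_matrix(4, 4, rng),
    "fc1": rand_matrix(4, 4, rng),
    "fc2": rand_matrix(2, 4, rng),  # 2 classes
}
X = [[1.0, 0.0, 0.5], [0.2, 1.0, 0.0], [0.0, 0.3, 1.0], [0.7, 0.7, 0.7]]  # toy user features
adj = [[1, 2], [0], [0, 3], [2]]  # undirected adjacency lists
probs = forward(X, adj, params)
print(probs)
```

Because the graph convolutions are permutation-equivariant and mean pooling is permutation-invariant, relabeling the users does not change the output. That property is what lets a model of this shape classify graphs of varying size and node order.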