HyperGraphDis: Leveraging Hypergraphs for Contextual and Social-Based Disinformation Detection
Nikos Salamanos, Pantelitsa Leonidou, Nikolaos Laoutaris, Michael Sirivianos, Maria Aspri, Marius Paraschiv
TL;DR
HyperGraphDis tackles Twitter disinformation by encoding both social structure and cascade content within a hypergraph, enabling node-level cascade classification with a HypergraphConv network. It introduces a three-phase pipeline: (1) METIS-based partitioning of the user graph to form hyperedges; (2) enriched cascade features via augmented subgraphs and DeepWalk embeddings; (3) cascade classification using hypergraph convolution and dense layers. Across four datasets, including MM-COVID and the health-related FakeHealth data, it achieves state-of-the-art or near-state-of-the-art F1 scores, while delivering substantial training and inference speedups over baselines such as Meta-graph, HGFND, and Cluster-GCN. The approach demonstrates strong scalability and robustness across political and health misinformation scenarios, with explicit attention to dataset-specific structural characteristics and ethical data handling.
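Phases (1) and (3) can be illustrated with a minimal, pure-Python sketch. It assumes, as one reading of the summary above, that each hypergraph node is a cascade and that each METIS partition of the user graph becomes a hyperedge joining every cascade with at least one user in that partition; the partition is passed in precomputed rather than produced by METIS. The function names `build_hyperedges` and `hypergraph_conv` are hypothetical. The propagation rule X' = D^-1 H W B^-1 H^T X Θ is, to our understanding, the one documented for PyTorch Geometric's `HypergraphConv` layer (H: node-hyperedge incidence, W: hyperedge weights, D and B: node and hyperedge degrees). DeepWalk features, nonlinearities, dense layers, and training are omitted.

```python
from collections import defaultdict

# Hypothetical sketch. Assumed mapping: nodes = cascades,
# hyperedges = partitions of the user graph (e.g. from METIS).


def build_hyperedges(user_partition, cascades):
    """Turn a user-graph partition into hyperedges over cascades.

    user_partition: dict user_id -> partition id (assumed precomputed).
    cascades: list of sets of user ids; cascade i is hypergraph node i.
    Returns dict partition id -> sorted list of cascade indices.
    """
    edges = defaultdict(set)
    for node, users in enumerate(cascades):
        for user in users:
            if user in user_partition:
                edges[user_partition[user]].add(node)
    return {k: sorted(v) for k, v in edges.items()}


def hypergraph_conv(X, hyperedges, theta, weights=None):
    """One linear propagation step X' = D^-1 H W B^-1 H^T X Theta.

    X: n x f node features (list of lists); theta: f x o projection.
    hyperedges: dict edge id -> list of node indices.
    weights: optional dict edge id -> positive weight (default 1.0).
    """
    n, f = len(X), len(X[0])
    weights = weights or {}
    # D_ii: total weight of the hyperedges that contain node i.
    deg = [0.0] * n
    for e, nodes in hyperedges.items():
        for v in nodes:
            deg[v] += weights.get(e, 1.0)
    # Node -> hyperedge: mean of member features (B^-1 H^T X).
    edge_feat = {}
    for e, nodes in hyperedges.items():
        if not nodes:
            continue
        acc = [0.0] * f
        for v in nodes:
            for j in range(f):
                acc[j] += X[v][j]
        edge_feat[e] = [a / len(nodes) for a in acc]
    # Hyperedge -> node: weight by W, normalise by D^-1.
    agg = [[0.0] * f for _ in range(n)]
    for e, nodes in hyperedges.items():
        w = weights.get(e, 1.0)
        for v in nodes:
            for j in range(f):
                agg[v][j] += w * edge_feat[e][j] / deg[v]
    # Linear projection by Theta (order does not matter: all steps are linear).
    o = len(theta[0])
    return [[sum(row[j] * theta[j][k] for j in range(f)) for k in range(o)]
            for row in agg]


if __name__ == "__main__":
    # Hypothetical partition of four users into two parts.
    partition = {"u1": 0, "u2": 0, "u3": 1, "u4": 1}
    cascades = [{"u1", "u2"}, {"u2", "u3"}, {"u4"}]
    edges = build_hyperedges(partition, cascades)
    print(edges)  # {0: [0, 1], 1: [1, 2]}
    print(hypergraph_conv([[1.0], [3.0], [5.0]], edges, [[1.0]]))
    # [[2.0], [3.0], [4.0]]
```

In the toy example, cascade 1 shares users with both partitions, so it receives the average of the two partition summaries; this is how, under the assumed mapping, structural overlap between cascades propagates into their features.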
Abstract
In light of the growing impact of disinformation on social, economic, and political landscapes, accurate and efficient identification methods are increasingly critical. This paper introduces HyperGraphDis, a novel approach for detecting disinformation on Twitter that employs a hypergraph-based representation to capture (i) the intricate social structures arising from retweet cascades, (ii) relational features among users, and (iii) semantic and topical nuances. Evaluated on four Twitter datasets focusing on the 2016 U.S. Presidential election and the COVID-19 pandemic, HyperGraphDis outperforms existing methods in both accuracy and computational efficiency, underscoring its effectiveness and scalability in addressing disinformation dissemination. On a COVID-19-related dataset, HyperGraphDis achieves a weighted F1 score of approximately 89.5%, an improvement of around 4% over other state-of-the-art methods. It also substantially reduces computation time for both training and inference: across the four datasets, training is 2.3 to 7.6 times faster than the second-best method, and inference is 1.3 to 6.8 times faster than the state-of-the-art.
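The headline result is reported as a weighted F1 score. As a reminder of how that metric aggregates per-class results, the sketch below computes F1 for each class and averages the values weighted by each class's support (its number of true instances). It is intended to match scikit-learn's `f1_score(y_true, y_pred, average="weighted")`; the function name and the label encoding (1 = disinformation, 0 = other) are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 (ill-defined ratios count as 0)."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_cls - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        # Weight each class by its share of the true labels.
        score += f1 * n_cls / total
    return score


if __name__ == "__main__":
    # Hypothetical labels: 1 = disinformation, 0 = other.
    print(round(weighted_f1([1, 1, 1, 0], [1, 1, 0, 0]), 4))  # 0.7667
```

Classes that appear only in the predictions have zero support and therefore contribute nothing, which is why iterating over the true labels alone gives the same result as averaging over all labels.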
