MGM: Global Understanding of Audience Overlap Graphs for Predicting the Factuality and the Bias of News Media
Muhammad Arslan Manzoor, Ruihong Zeng, Dilshod Azizov, Preslav Nakov, Shangsong Liang
TL;DR
MGM addresses the challenge of profiling news media by factuality and political bias on graphs whose edges encode audience overlap and whose labels are scarce. It extends GNNs with a variational EM framework that draws on globally similar nodes stored in an external memory, selects a sparse set of candidate nodes via a Dirichlet prior, and combines local and global information through a flexible mixing parameter. The framework also integrates with pre-trained language models, boosting performance when textual data for some outlets is missing, and achieves new state-of-the-art results on MBFC-derived benchmarks. Empirically, MGM improves several base GNNs, is robust to memory configurations, and delivers substantial gains when fused with PLMs, highlighting its practical value for scalable media profiling and misinformation mitigation.
Abstract
In the current era of rapidly growing digital data, evaluating the political bias and factuality of news outlets has become increasingly important for finding reliable information online. In this work, we study the classification problem of profiling news media through the lens of political bias and factuality. Traditional profiling methods, such as Pre-trained Language Models (PLMs) and Graph Neural Networks (GNNs), have shown promising results, but they face notable challenges. PLMs focus solely on textual features, causing them to overlook the complex relationships between entities, while GNNs often struggle with media graphs containing disconnected components and insufficient labels. To address these limitations, we propose MediaGraphMind (MGM), an effective solution within a variational Expectation-Maximization (EM) framework. Instead of relying on limited neighboring nodes, MGM leverages features, structural patterns, and label information from globally similar nodes. Such a framework not only enables GNNs to capture long-range dependencies for learning expressive node representations but also enhances PLMs by integrating structural information, thereby improving the performance of both models. Extensive experiments demonstrate the effectiveness of the proposed framework, which achieves new state-of-the-art results. Further, we share our repository, which contains the dataset, code, and documentation.
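To make the idea of combining a node's local prediction with information from globally similar nodes more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a hypothetical external memory of (embedding, class-probability) pairs, substitutes a plain top-k cosine-similarity lookup for the Dirichlet-prior candidate selection, and uses a hypothetical mixing weight `lam`; all function and variable names are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def global_prediction(query, memory, k):
    """Similarity-weighted average of class probabilities over the k most similar
    memory entries. `memory` is a list of (embedding, class_probs) pairs.
    Assumption: top-k cosine lookup stands in for the sparse candidate selection."""
    scored = sorted(
        ((cosine(query, emb), probs) for emb, probs in memory),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    weights = [max(score, 0.0) for score, _ in scored]
    total = sum(weights)
    if total == 0.0:
        # No positively similar entry: fall back to a uniform average.
        weights = [1.0] * len(scored)
        total = float(len(scored))
    n_classes = len(scored[0][1])
    return [
        sum(w * probs[c] for w, (_, probs) in zip(weights, scored)) / total
        for c in range(n_classes)
    ]


def mix(local_probs, global_probs, lam):
    """Convex combination: lam = 0 keeps only the local prediction,
    lam = 1 keeps only the global one."""
    return [(1.0 - lam) * l + lam * g for l, g in zip(local_probs, global_probs)]


# Toy example: three memory entries, three classes (e.g. left / center / right).
memory = [
    ([1.0, 0.0], [1.0, 0.0, 0.0]),
    ([0.0, 1.0], [0.0, 1.0, 0.0]),
    ([1.0, 1.0], [0.0, 0.0, 1.0]),
]
local = [0.2, 0.5, 0.3]  # hypothetical output of a base GNN for one outlet
global_probs = global_prediction([1.0, 0.0], memory, k=1)
mixed = mix(local, global_probs, lam=0.5)
```

Because both inputs are probability vectors and the combination is convex, the mixed output remains a valid distribution for any `lam` between 0 and 1. An outlet in a small disconnected component, whose local prediction is weak, can in this way still borrow label information from similar outlets elsewhere in the graph.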
