Before It's Too Late: A State Space Model for the Early Prediction of Misinformation and Disinformation Engagement
Lin Tian, Emily Booth, Francesco Bailo, Julian Droogan, Marian-Andrei Rizoiu
TL;DR
This work addresses the challenge of predicting misinformation and disinformation engagement from irregularly sampled social data. It introduces IC-Mamba, a time-aware, interval-censored state-space model that integrates content, user, and temporal embeddings to forecast post- and opinion-level engagement, including early 15–30 minute windows and long-horizon 28-day trajectories. Key contributions include a novel interval-aware state representation, time-dependent transitions, pretraining on a large diverse dataset, and a two-tier architecture that scales from posts to narratives. The approach achieves state-of-the-art predictive performance across multiple metrics and datasets, enabling earlier identification and intervention for potentially harmful content while highlighting ethical considerations for real-world deployment.
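The "time-dependent transitions" mentioned above can be illustrated with the zero-order-hold (ZOH) discretization described for S4- and Mamba-style state space models, where the step size is taken to be the elapsed time between two irregularly spaced engagement snapshots. The following is a minimal sketch, assuming a diagonal state matrix with negative real entries and a scalar input per snapshot. The function names `zoh_discretize` and `scan_irregular` are hypothetical, and this is not IC-Mamba's implementation.

```python
import numpy as np


def zoh_discretize(a, b, dt):
    """Zero-order-hold discretization of a diagonal continuous-time SSM.

    Assumption: a is the (N,) diagonal of a stable state matrix A (negative
    real entries), b is the (N,) input vector B, and dt is the time elapsed
    since the previous observation.
    """
    a_bar = np.exp(dt * a)              # A_bar = exp(dt * A)
    b_bar = (a_bar - 1.0) / a * b       # B_bar = A^{-1} (exp(dt * A) - I) B
    return a_bar, b_bar


def scan_irregular(a, b, c, times, inputs, t0=0.0):
    """Run h_k = A_bar(dt_k) h_{k-1} + B_bar(dt_k) x_k over irregular timestamps.

    Hypothetical setup: t0 is the posting time, times are snapshot times
    (strictly increasing), and inputs holds one scalar feature per snapshot.
    Returns the readout y_k = C h_k after each snapshot.
    """
    h = np.zeros_like(a)
    outputs = []
    prev_t = t0
    for t, x in zip(times, inputs):
        dt = t - prev_t                 # the step size is the observed gap
        a_bar, b_bar = zoh_discretize(a, b, dt)
        h = a_bar * h + b_bar * x       # elementwise, since A is diagonal
        outputs.append(float(c @ h))
        prev_t = t
    return np.array(outputs)
```

For example, snapshots taken 15 and 30 minutes after posting would be passed as `times=[15.0, 30.0]`. Because ZOH is exact for inputs held constant across a gap, two half-length steps with the same input produce the same state as one full-length step. The recurrence therefore does not depend on how finely the timeline happens to be sampled, which is the property that makes this kind of discretization attractive for irregular data.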
Abstract
In today's digital age, conspiracies and information campaigns can emerge rapidly and erode social and democratic cohesion. While recent deep learning approaches have made progress in modeling engagement through language and propagation models, they struggle with irregularly sampled data and early trajectory assessment. We present IC-Mamba, a novel state space model that forecasts social media engagement by modeling interval-censored data with integrated temporal embeddings. Our model excels at predicting engagement patterns within the crucial first 15–30 minutes after posting (RMSE 0.118–0.143), enabling rapid assessment of content reach. By incorporating interval-censored modeling into the state space framework, IC-Mamba captures fine-grained temporal dynamics of engagement growth, achieving a 4.72% improvement over the state of the art across multiple engagement metrics (likes, shares, comments, and emojis). Our experiments demonstrate IC-Mamba's effectiveness in forecasting both post-level dynamics and broader narrative patterns (F1 0.508–0.751 for narrative-level predictions). The model maintains strong predictive performance across extended time horizons, successfully forecasting opinion-level engagement up to 28 days ahead using observation windows of 3–10 days. These capabilities enable earlier identification of potentially problematic content, providing crucial lead time for designing and implementing countermeasures. Code is available at: https://github.com/ltian678/ic-mamba. An interactive dashboard demonstrating our results is available at: https://ic-mamba.behavioral-ds.science.
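In the general statistical sense, an interval-censored observation is one whose exact value is unknown and is known only to lie within an interval, such as engagement growth that is observed only between two snapshots. One common way to score a prediction against such an observation is to use the probability mass that the predictive distribution assigns to that interval. The following is a minimal sketch, assuming a Gaussian predictive distribution (for example, over log-transformed engagement). The Gaussian choice and the function names are assumptions made for illustration, not IC-Mamba's training objective.

```python
import math


def _normal_cdf(x, mu, sigma):
    """Gaussian CDF via the error function; handles x = +/-inf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def interval_censored_nll(lower, upper, mu, sigma, eps=1e-12):
    """Negative log-likelihood that a latent value lies in [lower, upper].

    Assumption: the model predicts a Gaussian N(mu, sigma^2). The likelihood of
    an interval-censored observation is P(lower <= Y <= upper) = F(upper) - F(lower).
    Use float("inf") / float("-inf") for open-ended intervals. The probability
    is floored at eps so that very unlikely intervals do not produce log(0).
    """
    p = _normal_cdf(upper, mu, sigma) - _normal_cdf(lower, mu, sigma)
    return -math.log(max(p, eps))
```

An uninformative interval (from negative to positive infinity) incurs zero loss, and narrowing the interval around the predicted mean increases the loss. This is the trade-off an interval-aware objective exploits: tighter snapshot intervals carry more information about the predicted trajectory.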
