Before It's Too Late: A State Space Model for the Early Prediction of Misinformation and Disinformation Engagement

Lin Tian, Emily Booth, Francesco Bailo, Julian Droogan, Marian-Andrei Rizoiu

TL;DR

This work addresses the challenge of predicting misinformation and disinformation engagement from irregularly sampled social data. It introduces IC-Mamba, a time-aware, interval-censored state-space model that integrates content, user, and temporal embeddings to forecast post- and opinion-level engagement, including early 15–30 minute windows and long-horizon 28-day trajectories. Key contributions include a novel interval-aware state representation, time-dependent transitions, pretraining on a large diverse dataset, and a two-tier architecture that scales from posts to narratives. The approach achieves state-of-the-art predictive performance across multiple metrics and datasets, enabling earlier identification and intervention for potentially harmful content while highlighting ethical considerations for real-world deployment.

Abstract

In today's digital age, conspiracies and information campaigns can emerge rapidly and erode social and democratic cohesion. While recent deep learning approaches have made progress in modeling engagement through language and propagation models, they struggle with irregularly sampled data and early trajectory assessment. We present IC-Mamba, a novel state space model that forecasts social media engagement by modeling interval-censored data with integrated temporal embeddings. Our model excels at predicting engagement patterns within the crucial first 15-30 minutes of posting (RMSE 0.118-0.143), enabling rapid assessment of content reach. By incorporating interval-censored modeling into the state space framework, IC-Mamba captures fine-grained temporal dynamics of engagement growth, achieving a 4.72% improvement over state-of-the-art across multiple engagement metrics (likes, shares, comments, and emojis). Our experiments demonstrate IC-Mamba's effectiveness in forecasting both post-level dynamics and broader narrative patterns (F1 0.508-0.751 for narrative-level predictions). The model maintains strong predictive performance across extended time horizons, successfully forecasting opinion-level engagement up to 28 days ahead using observation windows of 3-10 days. These capabilities enable earlier identification of potentially problematic content, providing crucial lead time for designing and implementing countermeasures. Code is available at: https://github.com/ltian678/ic-mamba. An interactive dashboard demonstrating our results is available at: https://ic-mamba.behavioral-ds.science.

Paper Structure

This paper contains 30 sections, 11 equations, 5 figures, 10 tables, 1 algorithm.

Figures (5)

  • Figure 1: Illustration of interval-censored social media engagement data. Following a post's creation at $t_0$, users perform engagement actions (view, like, comment, share, emoji) at timestamps $s_1$ through $s_8$. While individual actions occur continuously, engagement data is only collected at discrete observation points $t_j$, where each engagement vector $e_j$ captures the cumulative counts of different interaction types over intervals of length $\Delta t_j = t_{j+1} - t_j$.
  • Figure 2: Overview of the IC-Mamba architecture for social media engagement prediction. (left panel) The model takes three types of inputs (interval-censored social engagement, post content, and user metadata). These inputs are tokenized through a linear tokenization layer. The tokenized sequence (a combination of temporal embeddings, positional embeddings, and user embeddings) is processed through N stacked IC-Mamba blocks. (right panel) Each IC-Mamba block contains a selective SSM mechanism and parallel Conv1d operations to handle input and time-interval vectors simultaneously. Lastly, the processed features go through normalization and linear layers to generate the final social engagement predictions.
  • Figure 3: Two-tier IC-Mamba architecture. The bottom-tier model ($\text{IC-Mamba}_{1}$) learns post-level representations from historical ($H$), content ($x$), and user ($u$) features, while the top-tier model ($\text{IC-Mamba}_2$) captures temporal dependencies across intervals $\delta t$ to jointly predict individual post virality and aggregate narrative engagement dynamics.
  • Figure 4: Engagement distribution patterns across social media content. (a) Log-scale ECCDF of engagement metrics for the DiN dataset. (b) Log-scale ECCDF of engagement metrics from the climate change theme in SocialSense. (c) Temporal evolution of comment distributions across different time windows ranging from 1 hour to 7 days. Note: ECCDF represents Empirical Complementary Cumulative Distribution Functions.
  • Figure 5: Comparative analysis of early prediction performance and dynamic forecasting. (a) RMSE comparison between IC-Mamba and baseline models from 15 minutes to 6 hours after posting. (b, c) IC-Mamba's 28-day predictions at 5-minute intervals using 7-day (b) and 10-day (c) input windows, respectively.
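
The interval-censored setting described in the Figure 1 caption can be illustrated with a minimal sketch. It is not the authors' preprocessing code: the function name `interval_censor`, the record layout, and the toy timestamps are all hypothetical, and it assumes each engagement vector holds cumulative counts since the post was created. Exact action times are discarded; only the counts visible at each observation point $t_j$ and the gap $\Delta t_j = t_{j+1} - t_j$ are kept.

```python
from bisect import bisect_right


def interval_censor(events, observation_times):
    """Turn exact event times into counts seen only at observation points.

    events: dict mapping an engagement type (e.g. "like") to event timestamps.
    observation_times: increasing observation points t_1 < t_2 < ...
    Returns one record per observation point holding the cumulative count of
    each engagement type up to and including that time, plus the gap to the
    next observation (None for the last one).
    """
    sorted_events = {kind: sorted(times) for kind, times in events.items()}
    records = []
    for j, t in enumerate(observation_times):
        # Number of events of each type with timestamp <= t.
        counts = {
            kind: bisect_right(times, t)
            for kind, times in sorted_events.items()
        }
        if j + 1 < len(observation_times):
            delta_t = observation_times[j + 1] - t
        else:
            delta_t = None  # no later observation to measure against
        records.append({"t": t, "delta_t": delta_t, "counts": counts})
    return records


# Hypothetical toy data: minutes since post creation (t_0 = 0).
events = {"like": [1.0, 4.0, 4.5, 12.0], "share": [3.0, 20.0]}
observation_times = [5, 15, 30]
records = interval_censor(events, observation_times)

for record in records:
    print(record)
```

In this toy run the three likes before minute 5 are indistinguishable from one another once censored, which is the information loss the caption describes. Per-interval increments, if needed, follow from differencing consecutive cumulative counts.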