DEEP: A Discourse Evolution Engine for Predictions about Social Movements

Valerio La Gatta, Marco Postiglione, Jeremy Gilbert, Daniel W. Linna, Morgan Manella Greenfield, Aaron Shaw, V. S. Subrahmanian

TL;DR

DEEP forecasts social-movement discourse as a multi-output, cross-platform time series, jointly predicting the volume of articles/posts and the discrete emotions they express. It formalizes the task as predicting the future discourse state $\mathcal{S}_{t+\Delta}=(\mathbf{V}_{t+\Delta},\mathbf{E}_{t+\Delta},\mathbf{T}_{t+\Delta})$ (volume, emotions, and themes) from the historical trajectory $\mathcal{H}_t$ and journalist-defined key events $\mathcal{K}_{t:t+\Delta}$. A TimeSeriesTransformer produces probabilistic forecasts in which $p(\mathcal{S}_{t+\Delta})$ is parameterized by a Student-t distribution. A large-scale #MeToo dataset is constructed from 433,016 Reddit posts and 121,849 news articles, using a multi-layer data extraction scheme (L0–L3) that captures both explicit and semantically related discourse. Results show strong performance, particularly on news sources, with high precision, recall, and F1 for emotion forecasting and capable short-term prediction, while Reddit provides meaningful medium-term signals. These results indicate practical value for editorial planning and rapid coverage of evolving social movements.

Abstract

Numerous social movements (SMs) around the world help support the UN's Sustainable Development Goals (SDGs). Understanding how key events shape SMs is key to the achievement of the SDGs. We have developed SMART (Social Media Analysis & Reasoning Tool) to track social movements related to the SDGs. SMART was designed by a multidisciplinary team of AI researchers, journalists, communications scholars and legal experts. This paper describes SMART's transformer-based multivariate time series Discourse Evolution Engine for Predictions about Social Movements (DEEP) to predict the volume of future articles/posts and the emotions expressed. DEEP outputs probabilistic forecasts with uncertainty estimates, providing critical support for editorial planning and strategic decision-making. We evaluate DEEP with a case study of the #MeToo movement by creating a novel longitudinal dataset (433K Reddit posts and 121K news articles) from September 2024 to June 2025 that will be publicly released for research purposes upon publication of this paper.
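
To make the Student-t parameterization mentioned in the TL;DR concrete, the following is a minimal sketch of the negative log-likelihood of a single observed value under a location-scale Student-t distribution. It is an illustration only, not the authors' implementation: the function name `student_t_nll` and its parameters `loc`, `scale`, and `df` are assumptions introduced here, and the paper's actual loss may aggregate over outputs and time steps differently.

```python
import math


def student_t_nll(y, loc, scale, df):
    """Negative log-likelihood of y under a location-scale Student-t.

    Hypothetical helper for illustration; loc, scale, and df stand in for
    the distribution parameters a forecasting model would emit.
    """
    if scale <= 0 or df <= 0:
        raise ValueError("scale and df must be positive")
    z = (y - loc) / scale  # standardized residual
    log_pdf = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)
        - (df + 1) / 2 * math.log1p(z * z / df)
    )
    return -log_pdf


# Usage: an observation far from the predicted location is penalized more.
near = student_t_nll(y=10.5, loc=10.0, scale=1.0, df=3.0)
far = student_t_nll(y=15.0, loc=10.0, scale=1.0, df=3.0)
print(near < far)
```

The heavy tails of the Student-t make this loss less sensitive to sudden spikes than a Gaussian likelihood would be, which is one plausible reason to choose it for discourse volumes that jump around key events.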

Paper Structure

This paper contains 28 sections, 24 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Overview of DEEP. Data is collected from Reddit and News using hashtag and semantic keyword extraction. Keyphrases determine document relevance to the input social movement. Feature extraction transforms text into structured representations across volume, emotions, themes, and key events. A TimeSeriesTransformer processes historical discourse through an encoder-decoder framework to generate probabilistic forecasts of future discourse states $S_{t+\Delta}$ with uncertainty quantification via Student-t distributions.
  • Figure 2: Precision trends for forecasting emotional changes at varying horizons ($\Delta \in [1,7]$ days). (a) Precision for Reddit, increase class; (b) Precision for Reddit, decrease class; (c) Precision for News, increase class; (d) Precision for News, decrease class.
  • Figure 3: Case study on Sean “Diddy” Combs. The top panel shows the temporal trends for Curiosity and the bottom panel for Confusion. In both panels, the solid black line represents the ground truth, while the dashed gray line corresponds to DEEP’s forecast. Green and red markers indicate significant increases and decreases in the forecast period, respectively. Vertical lines denote the two key events: Combs' arrest on September 17, 2024 (yellow) and Ventura's courtroom testimony on May 13, 2025 (cyan).
  • Figure 4: Training dynamics: negative log-likelihood (NLL) loss and mean squared error (MSE), measured at horizon $\Delta=1$, across training epochs.
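
The per-class precision reported in Figure 2 can be sketched as follows. This is a hedged illustration, not the paper's evaluation code: the helper names `label_change` and `class_precision`, the three-way labeling (increase, decrease, stable), and the `threshold` value are all assumptions, since this section does not state how a significant change is defined.

```python
def label_change(current, future, threshold=0.1):
    """Label the direction of change between two emotion scores.

    The threshold is a hypothetical cutoff for a "significant" change.
    """
    delta = future - current
    if delta > threshold:
        return "increase"
    if delta < -threshold:
        return "decrease"
    return "stable"


def class_precision(true_labels, pred_labels, cls):
    """Fraction of predictions of `cls` whose true label is also `cls`.

    Returns None when `cls` was never predicted (precision is undefined).
    """
    matched = [t for t, p in zip(true_labels, pred_labels) if p == cls]
    if not matched:
        return None
    return sum(t == cls for t in matched) / len(matched)


# Usage with made-up scores: (value at t, true value at t+Δ, forecast at t+Δ).
series = [(0.2, 0.5, 0.6), (0.4, 0.4, 0.7), (0.6, 0.3, 0.2), (0.1, 0.4, 0.15)]
true_labels = [label_change(now, actual) for now, actual, _ in series]
pred_labels = [label_change(now, forecast) for now, _, forecast in series]
print(class_precision(true_labels, pred_labels, "increase"))
print(class_precision(true_labels, pred_labels, "decrease"))
```

Repeating this computation for each horizon $\Delta \in [1,7]$ and each platform would yield curves of the kind shown in the four panels of Figure 2.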