Estimating Online Influence Needs Causal Modeling! Counterfactual Analysis of Social Media Engagement

Lin Tian, Marian-Andrei Rizoiu

TL;DR

The paper tackles the challenge of estimating true online influence by distinguishing causation from correlation in misinformation diffusion. It introduces a joint treatment-outcome framework that treats external signals (e.g., Google Trends) as continuous-time interventions and models their bidirectional impact on engagement using Transformer and selective state-space (Mamba) architectures. Counterfactual analyses manipulate signal intensity, timing, and duration to quantify causal effects, with results showing 15–22% improvements in engagement prediction across datasets and an ATE-based measure of influence that aligns more closely with expert judgments than follower counts. The findings offer architectural guidance (Mamba+Adapter) and practical insights for designing interventions to curb misinformation while acknowledging ethical considerations and potential limitations.

Abstract

Understanding true influence in social media requires distinguishing correlation from causation, particularly when analyzing misinformation spread. While existing approaches focus on exposure metrics and network structures, they often fail to capture the causal mechanisms by which external temporal signals trigger engagement. We introduce a novel joint treatment-outcome framework that leverages existing sequential models to simultaneously adapt to both policy timing and engagement effects. Our approach adapts causal inference techniques from healthcare to estimate Average Treatment Effects (ATE) within the sequential nature of social media interactions, tackling challenges from external confounding signals. Through our experiments on real-world misinformation and disinformation datasets, we show that our models outperform existing benchmarks by 15–22% in predicting engagement across diverse counterfactual scenarios, including exposure adjustment, timing shifts, and varied intervention durations. Case studies on 492 social media users show our causal effect measure aligns strongly with the gold standard in influence estimation, the expert-based empirical influence.
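
As a rough illustration of the quantity named in the abstract, the textbook Average Treatment Effect is the mean difference between each unit's outcome with and without the treatment. The sketch below assumes paired model predictions of engagement per post, once with an external signal applied and once without. The function name and the numbers are hypothetical, and this is not the paper's continuous-time estimator, only the basic definition it builds on.

```python
from statistics import fmean


def average_treatment_effect(with_signal, without_signal):
    """Textbook ATE: mean of per-unit differences between the predicted
    outcome under the intervention and the predicted outcome without it."""
    if len(with_signal) != len(without_signal):
        raise ValueError("both prediction lists must cover the same posts")
    differences = [t - c for t, c in zip(with_signal, without_signal)]
    return fmean(differences)


# Hypothetical predicted engagement counts for three posts (illustrative only).
predicted_with_signal = [120.0, 80.0, 45.0]
predicted_without_signal = [100.0, 70.0, 40.0]

ate = average_treatment_effect(predicted_with_signal, predicted_without_signal)
print(f"Estimated ATE: {ate:.2f} engagements per post")
```

A positive value here would mean that, on average, the signal is predicted to raise engagement. Whether that difference can be read causally depends on how the two sets of predictions were obtained, which is the problem the joint treatment-outcome framework addresses.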

Paper Structure

This paper contains 21 sections, 13 equations, 4 figures, 13 tables.

Figures (4)

  • Figure 1: Visualization of engagement data and queries for social media post $p$. (a) Observational data during the period $[0,7]$ days. The top plot shows cumulative engagement over time (black line with crosses) and observed events (cyan dots). The bottom plot displays individual engagement metrics: Likes (blue crosses), Shares (red squares), Comments (brown triangles), and Emojis (purple diamonds). (b) The exogenous signal. The engagement trajectory of post $p$ after the observation period under policy $\pi_A$, derived from Google Trends data, with a one-day exposure time (shaded area from day 7 to 8). The top plot shows observed engagement (solid blue line), predicted engagement without intervention (dashed blue line), and predicted engagement under $\pi_A$ (dashed red line). The bottom plot displays normalized intensity $\lambda_{obs}$ (dashed green line), policy $\pi_A$ intensity (dashed red line), observed events (cyan dots), and policy actions (magenta dots). A vertical line at day 7 marks the intervention start. (c) The counterfactual signal. How the engagement trajectory of post $p$ would have evolved if policy $\pi_B$ had been applied during $[0,7]$ with a three-day exposure time (shaded area from day 7 to 10). The top plot shows observed engagement (solid blue line), predicted engagement without intervention (dashed blue line), predicted engagement under $\pi_A$ (dashed red line), and counterfactual engagement under $\pi_B$ (solid orange line). The bottom plot displays normalized intensity $\lambda_{obs}$ (dashed green line), counterfactual policy $\pi_B$ intensity (dashed orange line), observed events (cyan dots), and policy actions (magenta dots). A vertical line at day 7 marks the intervention start.
  • Figure 2: Engagement trajectory and counterfactual scenarios. Blue dashed lines represent observed social media engagement ($\lambda_{obs}$). Red lines indicate Google Trends signals ($[\pi_B]$), and $\tilde{\pi}_B$ denotes the partially observed Google Trends signals. The vertical dashed line at day 9 marks the start of the prediction period, with the gray shaded area showing the prediction region.
  • Figure 3: Decile Heatmaps (Spearman $\rho$ correlation coefficient, Kendall's $W$ rank agreement, Concordance Correlation Coefficient $\mathrm{CCC}$): (left) Follower Counts vs. Empirical Influence ($\rho = 0.49$, $W = 0.67$, $\mathrm{CCC} = 0.00$), (centre) Causal Effect vs. Empirical Influence ($0.57$, $0.70$, $0.21$), (right) Follower Counts vs. Causal Effect ($0.32$, $0.21$, $0.01$).
  • Figure 4: Relative percentage changes in engagement for three climate change misinformation narratives under counterfactual scenarios.
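
The three agreement statistics named in the Figure 3 caption can be computed as in the minimal sketch below. It uses the standard definitions of Spearman's $\rho$, Kendall's $W$ (for two rankings, without tie correction) and Lin's Concordance Correlation Coefficient. The function names and the toy data are assumptions for illustration, not the paper's code or data.

```python
from statistics import fmean


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendalls_w(x, y):
    """Kendall's W for m = 2 rankings of n items (no tie correction)."""
    n, m = len(x), 2
    rank_sums = [a + b for a, b in zip(average_ranks(x), average_ranks(y))]
    mean_sum = fmean(rank_sums)
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def concordance_cc(x, y):
    """Lin's CCC: penalizes differences in scale and location, not only order."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (var_x + var_y + (mx - my) ** 2)


# Toy data (hypothetical): identical ordering, very different scales.
followers = [10, 200, 3000, 40000, 500000]
influence = [0.1, 0.2, 0.3, 0.4, 0.5]

rho = spearman_rho(followers, influence)
w = kendalls_w(followers, influence)
ccc = concordance_cc(followers, influence)
print(f"rho={rho:.2f}, W={w:.2f}, CCC={ccc:.6f}")
```

In this toy case the two lists rank the users identically, so $\rho$ and $W$ are both 1, yet the CCC is close to zero because the values sit on very different scales. A rank statistic and the CCC can therefore diverge sharply for the same pair of measures, which is worth keeping in mind when comparing the three numbers reported for each panel.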