The Topology of Recovery: Using Persistent Homology to Map Individual Mental Health Journeys in Online Communities

Joydeep Chandra, Satyam Kumar Navneet, Yong Zhang

TL;DR

This work introduces a framework that applies Topological Data Analysis, specifically persistent homology, to model users' longitudinal posting histories as trajectories in semantic embedding space. It also proposes Semantic Recovery Velocity (SRV), a novel metric quantifying the rate at which users move away from their initial distress-focused posts in embedding space.

Abstract

Understanding how individuals navigate mental health challenges over time is critical yet methodologically challenging. Traditional approaches analyze community-level snapshots and so fail to capture dynamic individual recovery trajectories. We introduce a novel framework that applies Topological Data Analysis (TDA), specifically persistent homology, to model users' longitudinal posting histories as trajectories in semantic embedding space. Our approach reveals topological signatures of trajectory patterns: loops indicate cycling back to similar states (stagnation), while flares suggest exploration of new coping strategies (growth). We propose Semantic Recovery Velocity (SRV), a metric quantifying the rate at which users move away from their initial distress-focused posts in embedding space. Analyzing 15,847 r/depression trajectories and validating against multiple proxies, we demonstrate that topological features predict self-reported improvement with 78.3% accuracy, outperforming sentiment baselines. This work contributes: (1) a TDA methodology for HCI mental health research, (2) interpretable topological signatures, and (3) design implications for adaptive mental health platforms with ethical guardrails.
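The abstract describes SRV only in words, so the following is a minimal sketch of one plausible reading, not the paper's own formula. It assumes that the "initial distress-focused posts" can be summarized by the centroid of a user's first few embedded posts, and that the "rate" is the least-squares slope of distance from that centroid against time in days. The function name, the `k_initial` parameter, and the use of Euclidean distance are all illustrative assumptions.

```python
import math


def semantic_recovery_velocity(points, days, k_initial=3):
    """Hypothetical SRV: slope of distance-from-start against time.

    points    -- list of embedding coordinates, one tuple per post, in time order
    days      -- list of post times in days since the user's first post
    k_initial -- ASSUMPTION: number of early posts averaged into the origin
    """
    if len(points) != len(days):
        raise ValueError("points and days must have the same length")
    if len(points) <= k_initial:
        raise ValueError("need more posts than k_initial")

    # Origin = centroid of the first k_initial posts (assumed stand-in for
    # the "initial distress-focused posts").
    dim = len(points[0])
    origin = [sum(p[d] for p in points[:k_initial]) / k_initial for d in range(dim)]

    # Euclidean distance of every post from the origin.
    dists = [math.dist(p, origin) for p in points]

    # Ordinary least-squares slope of distance on time.
    t_mean = sum(days) / len(days)
    d_mean = sum(dists) / len(dists)
    var_t = sum((t - t_mean) ** 2 for t in days)
    if var_t == 0:
        raise ValueError("all posts share the same timestamp")
    cov = sum((t - t_mean) * (d - d_mean) for t, d in zip(days, dists))
    return cov / var_t


# A user drifting steadily away from the starting region (embedding units per day).
drifting = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
srv_drifting = semantic_recovery_velocity(drifting, [0, 10, 20, 30], k_initial=1)

# A user who keeps returning to the starting region.
cycling = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
srv_cycling = semantic_recovery_velocity(cycling, [0, 10, 20, 30, 40], k_initial=1)
```

Under this reading, a trajectory that moves steadily outward gets a positive value, while one that keeps cycling back to its starting region gets a value near zero. That matches the abstract's contrast between growth and stagnation, though the paper's actual definition may differ.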

Paper Structure (24 sections, 4 equations, 1 figure, 1 table)

Figures (1)

  • Figure 1: Overview of the Topology of Recovery pipeline. Reddit posts from r/depression (n = 15,847 users, 2018–2020) are encoded with MentalBERT and projected to 3D via UMAP. Vietoris–Rips filtration then extracts three topological features: Loop Persistence (LP), Flare Index (FI), and Semantic Recovery Velocity (SRV). These are validated against five proxies using a Random Forest classifier and translated into HCI design implications. Trajectory vignettes (right) illustrate two archetypal recovery patterns: looping (high LP) and flaring (high FI).

    System architecture of The Topology of Recovery. The pipeline proceeds through six phases:
    (1) Data Collection: 15,847 longitudinal Reddit users (r/depression, ≥10 posts, ≥90-day span) are extracted via the Pushshift API.
    (2) Semantic Embedding: posts are encoded with MentalBERT (768-dim) and projected to 3D via UMAP.
    (3) Topological Data Analysis: Vietoris–Rips filtration computes persistent homology (H₀, H₁, H₂), yielding persistence diagrams and Betti curves via giotto-tda.
    (4) Feature Extraction: three interpretable topological features are derived: Loop Persistence (LP, rumination cycles), Flare Index (FI, growth expansion), and Semantic Recovery Velocity (SRV, directional momentum toward recovery).
    (5) Multi-Proxy Validation: features are validated against five proxies (self-report, behavioral signals, volunteer annotation, negative controls, and temporal holdout) using a Random Forest classifier (combined accuracy: 78.3%).
    (6) Design Implications: trajectory patterns inform four HCI design directions: reflective visualizations, adaptive resources, peer connection, and progress tracking.
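To make the Vietoris–Rips step in the caption more concrete, here is a small dependency-free sketch of its simplest part only: H₀ persistence (connected components) of a point cloud. It relies on the standard fact that, in a Vietoris–Rips filtration indexed by pairwise distance, every component is born at scale 0 and the finite H₀ death times are exactly the edge lengths of a minimum spanning tree. The function name is illustrative. This is not the paper's pipeline, which the caption says uses giotto-tda and also computes H₁ and H₂. Loop Persistence and the Flare Index are not reproduced here because their formulas are not given in this section.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Finite H0 death times of a Vietoris-Rips filtration (all births are 0).

    Uses Kruskal's algorithm with union-find: each time the shortest remaining
    edge joins two different components, one component dies at that edge length.
    One component never dies (infinite persistence) and is not reported.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the component representative, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two components
            deaths.append(length)    # one component dies at this scale
    return deaths


# Three posts in a (hypothetical) 3D projection: two close together, one far away.
trajectory = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
deaths = h0_death_times(trajectory)
```

A large gap between consecutive death times indicates well-separated clusters of posts. Detecting loops (H₁) needs a full boundary-matrix reduction, which is what a dedicated library such as giotto-tda provides.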