LLM-Augmented Therapy Normalization and Aspect-Based Sentiment Analysis for Treatment-Resistant Depression on Reddit

Yuxin Zhu; Sahithi Lakamana; Masoud Rouhizadeh; Selen Bozkurt; Rachel Hershenberg; Abeed Sarker

Abstract

Treatment-resistant depression (TRD) is a severe form of major depressive disorder in which patients do not achieve remission despite multiple adequate treatment trials. Evidence on pharmacologic options for TRD remains limited, and trials often do not fully capture patient-reported tolerability. Large-scale online peer-support narratives therefore offer a complementary lens on how patients describe and evaluate medications in real-world use. In this study, we curated a corpus of 5,059 Reddit posts explicitly referencing TRD, written by 3,480 subscribers across 28 mental health-related subreddits between 2010 and 2025. Of these, 3,839 posts mentioned at least one medication, yielding 23,399 mentions of 81 generic-name medications after lexicon-based normalization of brand names, misspellings, and colloquialisms. We developed an aspect-based sentiment classifier by fine-tuning DeBERTa-v3 on the SMM4H 2023 therapy-sentiment Twitter corpus with large language model (LLM)-based data augmentation, achieving a micro-F1 score of 0.800 on the shared-task test set. Applying this classifier to Reddit, we labeled the sentiment toward each medication mention as positive, neutral, or negative and tracked patterns by drug, subscriber, subreddit, and year. Overall, 72.1% of medication mentions were neutral, 14.8% negative, and 13.1% positive. Conventional antidepressants, especially SSRIs and SNRIs, consistently showed higher proportions of negative than positive mentions, whereas ketamine and esketamine showed comparatively more favorable sentiment profiles. These findings show that normalized medication extraction combined with aspect-based sentiment analysis can help characterize patient-perceived treatment experiences in TRD-related Reddit discourse, complementing clinical evidence with large-scale patient-generated perspectives.
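As a rough illustration of lexicon-based normalization followed by per-medication sentiment proportions, the sketch below maps surface forms (brand names, misspellings, colloquialisms) to generic names with a word-boundary regular expression, then tallies label shares per drug. The lexicon entries, the function names `normalize_mentions` and `sentiment_shares`, and the `(generic_name, label)` input format are hypothetical; the study's actual lexicon and its DeBERTa-v3 classifier are not reproduced here, and the sentiment labels are assumed to be supplied by such a classifier.

```python
import re
from collections import Counter, defaultdict

# Hypothetical lexicon fragment: lowercase surface form -> generic name.
# Illustrative entries only, not the study's lexicon.
LEXICON = {
    "zoloft": "sertraline",        # brand name
    "setraline": "sertraline",     # misspelling
    "sertraline": "sertraline",
    "lexapro": "escitalopram",
    "escitalopram": "escitalopram",
    "spravato": "esketamine",
    "esketamine": "esketamine",
    "special k": "ketamine",       # colloquialism
    "ketamine": "ketamine",
}

# Try longer surface forms first so multi-word entries win over shorter ones.
# Word boundaries keep "ketamine" from matching inside "esketamine".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in sorted(LEXICON, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def normalize_mentions(text):
    """Return the generic name for every lexicon match in `text`, in order."""
    return [LEXICON[m.group(1).lower()] for m in _PATTERN.finditer(text)]


def sentiment_shares(labeled_mentions):
    """Per-drug share of positive, neutral, and negative mentions.

    `labeled_mentions` is an iterable of (generic_name, label) pairs, where
    label is "positive", "neutral", or "negative" (assumed classifier output).
    """
    counts = defaultdict(Counter)
    for drug, label in labeled_mentions:
        counts[drug][label] += 1
    shares = {}
    for drug, c in counts.items():
        total = sum(c.values())
        shares[drug] = {lab: c[lab] / total for lab in ("positive", "neutral", "negative")}
    return shares


# Example: three surface forms normalized to generic names.
print(normalize_mentions("Zoloft did nothing, but Spravato and special K helped."))
# -> ['sertraline', 'esketamine', 'ketamine']
```

Counting at the mention level, as here, means a post naming the same drug twice contributes two labels; whether to deduplicate per post is a design choice this sketch does not settle.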

Paper Structure (30 sections, 12 figures, 4 tables)

Introduction
Methods
  Data Sources and Extraction
  Therapy Scope, Lexicon Construction, and Normalization
    Therapy scope and inclusion criteria.
    Lexicon construction and normalization
  Aspect-Based Sentiment Classifier Development
  Applying the Classifier on Reddit Data
  Statistical Analysis
    Therapy-level positive versus negative asymmetry.
    Category-level differences in sentiment composition.
Results
  Post and Subscriber Characteristics
    Volume of TRD discussions over time.
    Subscriber engagement.

Figures (12)

Figure 1: Overview of the data processing pipeline from raw Reddit data to the final analysis sample. Posts were first collected from 28 relevant subreddits (21,826 posts in total). TRD-related post detection yielded 5,059 posts. Filtering by medication mentions resulted in 3,839 posts. These 3,839 posts form the TRD medication-subscriber cohort analyzed in this study. Each number ($n$) represents the count of posts passing that stage.
Figure 2: Prompt used for LLM-based lexicon augmentation. {{therapy}} is replaced with the target therapy name, {{example}} is replaced with in-domain Reddit/Twitter usage examples provided for stylistic guidance, and [variant] is the placeholder for an LLM-generated variant.
Figure 3: Prompt used for LLM-based data augmentation. [text] is the placeholder for the original post, and [tweet] is the placeholder for an LLM-generated variant.
Figure 4: Annual volume and normalized distribution of Reddit posts mentioning treatment-resistant depression and the medication-mention subset, 2010 to 2025. Two cohorts are shown: all TRD posts and TRD posts with $\geq$1 medication mention. (A) Annual post volume: lines plot the number of distinct posts per year for each cohort. (B) Normalized annual share within cohort: lines plot each year’s proportion of the cohort, computed as $\frac{\#\text{distinct posts in year }y}{\#\text{distinct posts across all years in that cohort}}\times 100\%$. The vertical dashed marker indicates that 2025 reflects a partial year through the end of data collection (July 2025).
Figure 5: Distribution of the number of distinct medications mentioned per subscriber in the TRD cohort. For each medication-mentioning subscriber ($N=2{,}700$), we counted the number of unique generic (nonproprietary) drug names referenced. The distribution is right-skewed, with most subscribers mentioning fewer than four medications.
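The normalized annual share in Figure 4B and the per-subscriber medication count in Figure 5 can be sketched as follows. This is a minimal illustration assuming posts arrive as `(post_id, year)` pairs and medication mentions as `(subscriber_id, generic_name)` pairs; those input shapes and the function names are hypothetical, not taken from the study's code.

```python
from collections import Counter, defaultdict


def annual_share(posts):
    """Percent of a cohort's distinct posts that fall in each year (Figure 4B).

    `posts` is an iterable of (post_id, year); a repeated post_id is counted once.
    """
    year_of = {}
    for post_id, year in posts:
        year_of[post_id] = year  # one year per distinct post
    per_year = Counter(year_of.values())
    total = len(year_of)
    # share(y) = distinct posts in year y / distinct posts across all years * 100
    return {y: 100 * n / total for y, n in sorted(per_year.items())}


def distinct_meds_per_subscriber(mentions):
    """Number of unique generic names each subscriber mentions (Figure 5).

    `mentions` is an iterable of (subscriber_id, generic_name) pairs.
    """
    meds = defaultdict(set)
    for subscriber, drug in mentions:
        meds[subscriber].add(drug)
    return {s: len(d) for s, d in meds.items()}


# Histogram underlying a Figure 5-style plot: how many subscribers mention k drugs.
counts = distinct_meds_per_subscriber(
    [("u1", "sertraline"), ("u1", "sertraline"), ("u1", "ketamine"), ("u2", "bupropion")]
)
print(Counter(counts.values()))
```

Because shares are computed within each cohort, the all-TRD and medication-mention curves each sum to 100% and can be compared on one axis even though their absolute volumes differ.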
