Advancing Minority Stress Detection with Transformers: Insights from the Social Media Datasets
Santosh Chapagain, Cory J Cascalheira, Shah Muhammad Hamdi, Soukaina Filali Boubrahimi, Jillian R. Scheer
TL;DR
This work addresses the detection of minority stress in online discourse using transformer-based models. It systematically compares sequence-only transformers with graph-augmented variants on two large Reddit datasets, MiSSoM+ and Saha, and evaluates zero-shot and few-shot capabilities. The key finding is that incorporating relational graph structure, particularly with RoBERTa-GCN and related architectures, consistently improves minority stress detection, with MiSSoM+ benefiting most from high-quality annotations. These results have practical implications for digital health interventions and policy, highlighting the value of structured contextual information in designing targeted support systems while emphasizing ethical considerations around data use and deployment.
Abstract
Individuals from sexual and gender minority groups experience disproportionately high rates of poor health outcomes and mental disorders compared to their heterosexual and cisgender counterparts, largely as a consequence of minority stress as described by Meyer's (2003) model. This study presents the first comprehensive evaluation of transformer-based architectures for detecting minority stress in online discourse. We benchmark multiple transformer models, including ELECTRA, BERT, RoBERTa, and BART, against traditional machine learning baselines and graph-augmented variants. We further evaluate zero-shot and few-shot learning paradigms to gauge their applicability to underrepresented datasets. Experiments are conducted on the two largest publicly available Reddit corpora for minority stress detection, comprising 12,645 and 5,789 posts, and are repeated over five random seeds to ensure robustness. Our results demonstrate that integrating graph structure consistently improves detection performance relative to transformer-only models and that supervised fine-tuning with relational context outperforms zero-shot and few-shot approaches. Theoretical analysis reveals that modeling social connectivity and conversational context via graph augmentation sharpens the models' ability to identify key linguistic markers such as identity concealment, internalized stigma, and calls for support, suggesting that graph-enhanced transformers offer the most reliable foundation for digital health interventions and public health policy.
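To make the idea of graph augmentation concrete, the following is a minimal sketch of a single graph-convolution step applied to per-post embeddings. It is not the authors' implementation: the abstract does not specify how the graph is built or how the GCN is combined with the transformer, so the sketch assumes the standard symmetric-normalized propagation rule ReLU(D^{-1/2}(A + I)D^{-1/2} X W), and the names `normalized_adjacency`, `gcn_layer`, and the toy matrices are hypothetical. In a real pipeline the feature rows would be transformer outputs (for example, RoBERTa sentence embeddings) rather than hand-written vectors.

```python
import numpy as np


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops so each post keeps its own signal
    deg = a_hat.sum(axis=1)              # node degrees, counting the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm, features, weight):
    """One graph-convolution step: ReLU(adj_norm @ features @ weight)."""
    return np.maximum(adj_norm @ features @ weight, 0.0)


# Toy graph (assumed for illustration): posts 0 and 1 are linked, post 2 is isolated.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 0.0, 0.0]])

# Stand-ins for transformer embeddings, one row per post.
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])

weight = np.eye(2)  # identity weights keep the arithmetic easy to follow

hidden = gcn_layer(normalized_adjacency(adj), features, weight)
print(hidden)
```

In this toy case the two linked posts end up with the average of their embeddings, while the isolated post keeps its own, which illustrates how relational context can reshape a post's representation before classification.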
