Decoding Linguistic Nuances in Mental Health Text Classification Using Expressive Narrative Stories
Jinwen Tang, Qiming Guo, Yunxin Zhao, Yi Shang
TL;DR
The study investigates linguistic nuances in Expressive Narrative Stories (ENS) from Reddit for mental-health classification, comparing BERT and MentalBERT with traditional classifiers. It demonstrates that BERT(128) maintains high accuracy even when topic-related words are absent or narratives are disrupted, while MentalBERT remains more dependent on explicit terms, highlighting the importance of contextual understanding over keyword detection. Through three phases—model fine-tuning, topic-word manipulations, and logical-connection analyses—the work shows ENS demand deep linguistic feature capture, evidenced by significant effects of word and sentence order manipulations ($P$-value < 0.05 or < 0.01 in several tests). The results support the use of ENS-driven analyses for more robust, real-world mental-health text assessment and suggest future work on improving robustness to linguistic variation and broader generalization across disorders.
Abstract
Recent advancements in NLP have spurred significant interest in analyzing social media text data for identifying linguistic features indicative of mental health issues. However, the domain of Expressive Narrative Stories (ENS)-deeply personal and emotionally charged narratives that offer rich psychological insights-remains underexplored. This study bridges this gap by utilizing a dataset sourced from Reddit, focusing on ENS from individuals with and without self-declared depression. Our research evaluates the utility of advanced language models, BERT and MentalBERT, against traditional models. We find that traditional models are sensitive to the absence of explicit topic-related words, which could risk their potential to extend applications to ENS that lack clear mental health terminology. Despite MentalBERT is design to better handle psychiatric contexts, it demonstrated a dependency on specific topic words for classification accuracy, raising concerns about its application when explicit mental health terms are sparse (P-value<0.05). In contrast, BERT exhibited minimal sensitivity to the absence of topic words in ENS, suggesting its superior capability to understand deeper linguistic features, making it more effective for real-world applications. Both BERT and MentalBERT excel at recognizing linguistic nuances and maintaining classification accuracy even when narrative order is disrupted. This resilience is statistically significant, with sentence shuffling showing substantial impacts on model performance (P-value<0.05), especially evident in ENS comparisons between individuals with and without mental health declarations. These findings underscore the importance of exploring ENS for deeper insights into mental health-related narratives, advocating for a nuanced approach to mental health text analysis that moves beyond mere keyword detection.
