Exploring Large Language Models for Detecting Mental Disorders
Gleb Kuzmin, Petr Strepetov, Maksim Stankevich, Natalia Chudova, Artem Shelmanov, Ivan Smirnov
TL;DR
This study tackles automatic detection of depression and anxiety from Russian texts by comparing traditional machine learning on linguistic features, encoder-based transformers, and large language models across five datasets. It finds that LLMs generally outperform other approaches, especially on noisy or small datasets, though transfer from clinically diagnosed essays to questionnaire-based social media data is limited. Psycholinguistic features and encoder models can match LLM performance when trained on clinically validated depression texts, and BERT-like encoders excel on heterogeneous, noisy data. However, LLM-generated explanations fall short of clinical requirements, and several substantial types of errors were identified in them, indicating a need for better interpretability and clinician-in-the-loop prompting for practical use.
Abstract
This paper compares the effectiveness of traditional machine learning methods, encoder-based models, and large language models (LLMs) on the task of detecting depression and anxiety. We consider five Russian-language datasets that differ in format and in the method used to define the target pathology class. As pathology classification models, we tested AutoML models based on linguistic features, several variations of encoder-based Transformers such as BERT, and state-of-the-art LLMs. The results demonstrated that LLMs outperform traditional methods, particularly on noisy and small datasets where training examples vary significantly in text length and genre. However, psycholinguistic features and encoder-based models can achieve performance comparable to LLMs when trained on texts from individuals with clinically confirmed depression, highlighting their potential effectiveness in targeted clinical applications.
