Large Language Models for Mental Health Diagnostic Assessments: Exploring The Potential of Large Language Models for Assisting with Mental Health Diagnostic Assessments -- The Depression and Anxiety Case

Kaushik Roy; Harshul Surana; Darssan Eswaramoorthi; Yuxin Zi; Vedant Palit; Ritvik Garimella; Amit Sheth

TL;DR

This paper systematically evaluates how large language models (LLMs) can assist mental health diagnostic assessments, focusing on the PHQ-9 for major depressive disorder (MDD) and the GAD-7 for generalized anxiety disorder (GAD). It compares prompting-based approaches, using proprietary models (GPT-3.5, GPT-4o) and open-source models (llama-3.1-8b, mixtral-8x7b), with fine-tuning-based approaches using Mentalllama and DiagnosticLlama, a version of Mentalllama fine-tuned on the PRIMATE-derived ground truth. The ground-truth datasets are built from PRIMATE posts annotated by clinicians, with agreement measured by Cohen's kappa (0.74 for PHQ-9, 0.72 for GAD-7). The study finds that LLMs can approach the quality of human expert annotation, especially in few-shot settings, while gaps remain in replicating clinician-level diagnostic reasoning. It also introduces the DiagnosticLlama model and a suite of annotated datasets to support further research. The work has practical implications for reducing clinician workload and for guiding the development of safer, instruction-tuned tools for mental health assessment; future work will extend it to additional questionnaires and clinician-facing applications.
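Cohen's kappa, used above to report annotation agreement, corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed fraction of matching labels and p_e is the chance agreement implied by each rater's label frequencies. The following is a minimal sketch, assuming two raters who each assign one categorical label per post; the toy labels in the usage example are made up for illustration and are not data from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy binary labels (1 = item endorsed, 0 = not endorsed) for ten hypothetical posts.
    a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
    b = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
    print(round(cohens_kappa(a, b), 3))  # 0.6
```

Here the raters agree on 8 of 10 posts (p_o = 0.8), and each labels half the posts 1, so p_e = 0.5, giving kappa = 0.3 / 0.5 = 0.6. A kappa of 0.74 therefore indicates agreement well above what chance alone would produce.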

Abstract

Large language models (LLMs) are increasingly attracting the attention of healthcare professionals for their potential to assist in diagnostic assessments, which could alleviate the strain on the healthcare system caused by a high patient load and a shortage of providers. For LLMs to be effective in supporting diagnostic assessments, it is essential that they closely replicate the standard diagnostic procedures used by clinicians. In this paper, we specifically examine the diagnostic assessment processes described in the Patient Health Questionnaire-9 (PHQ-9) for major depressive disorder (MDD) and the Generalized Anxiety Disorder-7 (GAD-7) questionnaire for generalized anxiety disorder (GAD). We investigate various prompting and fine-tuning techniques to guide both proprietary and open-source LLMs in adhering to these processes, and we evaluate the agreement between LLM-generated diagnostic outcomes and expert-validated ground truth. For fine-tuning, we utilize the Mentalllama and Llama models, while for prompting, we experiment with proprietary models like GPT-3.5 and GPT-4o, as well as open-source models such as llama-3.1-8b and mixtral-8x7b.
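One common way to steer an LLM toward a questionnaire-based process is few-shot prompting: a few labeled demonstrations are placed before the post to be assessed, and the model completes the answer for that post. The sketch below only assembles such a prompt string for a single questionnaire item. The template wording, the yes/no answer format, and the names `FewShotExample` and `build_item_prompt` are hypothetical illustrations, not the prompts used in the paper, and no model API is called.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FewShotExample:
    """Hypothetical labeled demonstration: a post and its yes/no answer."""
    post: str
    answer: str  # assumed answer format: "yes" or "no"


def build_item_prompt(
    item_question: str,
    post: str,
    examples: Sequence[FewShotExample] = (),
) -> str:
    """Assemble a prompt asking whether a post provides evidence for one questionnaire item.

    With no examples this is a zero-shot prompt; with examples it is a few-shot prompt.
    """
    lines = [
        "You are assisting with a mental health questionnaire assessment.",
        f"Question: {item_question}",
        "Answer 'yes' or 'no' based only on the post.",
        "",
    ]
    # Few-shot demonstrations are placed before the target post.
    for ex in examples:
        lines += [f"Post: {ex.post}", f"Answer: {ex.answer}", ""]
    # The target post, with the answer left open for the model to complete.
    lines += [f"Post: {post}", "Answer:"]
    return "\n".join(lines)


if __name__ == "__main__":
    demos = [FewShotExample("I can't enjoy anything I used to love anymore.", "yes")]
    prompt = build_item_prompt(
        "Little interest or pleasure in doing things?",  # PHQ-9 item 1
        "Been sleeping fine and hanging out with friends a lot lately.",
        demos,
    )
    print(prompt)
```

Building the prompt per item mirrors the item-by-item structure of the questionnaires; the resulting per-item answers could then be compared against clinician labels with an agreement measure such as Cohen's kappa.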

Paper Structure (32 sections, 1 figure, 14 tables)

Introduction
Methodology
MDD Diagnostic Assistance based on the PHQ-9
Ground Truth Dataset Creation
Prompting-based Methods
Obtaining Proprietary Model Outputs for MDD Diagnostic Assistance based on the PHQ-9
Obtaining Open-source Model Outputs for MDD Diagnostic Assistance based on the PHQ-9
Fine-tuning-based Methods
The Mentalllama model
The DiagnosticLlama model: fine-tuning Mentalllama on the PRIMATE dataset using Hugging Face AutoTrain
GAD Diagnostic Assistance based on the GAD-7
Ground Truth Dataset Creation
Prompting-based Methods
Obtaining Proprietary Model Outputs for GAD Diagnostic Assistance based on the GAD-7
Obtaining Open-source Model Outputs for GAD Diagnostic Assistance based on the GAD-7

Figures (1)

Figure 1: Mental Health Diagnostic Assessment Questionnaires. The Patient Health Questionnaire (PHQ)-9 for depression assessment and the Generalized Anxiety Disorder (GAD)-7 for anxiety assessment.
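For reference, both questionnaires in Figure 1 are conventionally scored the same way: each item is rated 0 to 3 by frequency over the last two weeks, the item scores are summed, and the total is mapped to a severity band. PHQ-9 totals range from 0 to 27 (cut-offs 5, 10, 15, 20), and GAD-7 totals range from 0 to 21 (cut-offs 5, 10, 15). The sketch below implements that standard published scoring. It is not necessarily how the paper aggregates LLM outputs, and the helper names are hypothetical.

```python
from typing import Sequence, Tuple

# Standard item scale: 0 = not at all, 1 = several days,
# 2 = more than half the days, 3 = nearly every day.

# (lower bound of total score, severity label), ordered from highest to lowest.
PHQ9_BANDS = [(20, "severe"), (15, "moderately severe"), (10, "moderate"), (5, "mild"), (0, "minimal")]
GAD7_BANDS = [(15, "severe"), (10, "moderate"), (5, "mild"), (0, "minimal")]


def score_questionnaire(
    item_scores: Sequence[int], n_items: int, bands: Sequence[Tuple[int, str]]
) -> Tuple[int, str]:
    """Sum 0-3 item scores and map the total to the first band whose lower bound it reaches."""
    if len(item_scores) != n_items:
        raise ValueError(f"expected {n_items} item scores, got {len(item_scores)}")
    if any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("each item score must be 0, 1, 2, or 3")
    total = sum(item_scores)
    for lower, label in bands:
        if total >= lower:
            return total, label
    raise AssertionError("unreachable: the lowest band starts at 0")


def score_phq9(item_scores: Sequence[int]) -> Tuple[int, str]:
    return score_questionnaire(item_scores, 9, PHQ9_BANDS)


def score_gad7(item_scores: Sequence[int]) -> Tuple[int, str]:
    return score_questionnaire(item_scores, 7, GAD7_BANDS)


if __name__ == "__main__":
    print(score_phq9([1, 2, 1, 0, 1, 2, 1, 0, 0]))  # (8, 'mild')
    print(score_gad7([2, 2, 3, 1, 2, 2, 1]))  # (13, 'moderate')
```

Keeping the bands as data rather than nested conditionals lets the same function serve both questionnaires, and any other instrument with the same 0-3 item structure.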