Automated Multi-Label Annotation for Mental Health Illnesses Using Large Language Models

Abdelrahaman A. Hassan, Radwa J. Hanafy, Mohammed E. Fouda

TL;DR

The paper addresses the scarcity of multi-label mental health datasets built from social media posts. It introduces a zero-shot synthetic labeling workflow that uses large language models to convert single-label posts into multi-label annotations, culminating in the SPAADE-DR dataset. It systematically evaluates prompt strategies across multiple LLMs to label six mental health disorders and demonstrates the feasibility of scaling from 2 labels to 6 labels on RMHD-derived data. Key contributions include the DepSeverity-Dreaddit multi-label dataset, the SPAADE-DR corpus, and a comprehensive evaluation of single-label, multi-label, and unrestricted prompts that identifies the most robust model-prompt pairings. By enabling analysis of co-occurring disorders in social media text, the work could support early detection and intervention strategies.

Abstract

The growing prevalence and complexity of mental health disorders present significant challenges for accurate diagnosis and treatment, particularly in understanding the interplay between co-occurring conditions. Mental health disorders, such as depression and anxiety, often co-occur, yet current datasets derived from social media posts typically focus on single-disorder labels, limiting their utility in comprehensive diagnostic analyses. This paper addresses this critical gap by proposing a novel methodology for cleaning, sampling, labeling, and combining data to create versatile multi-label datasets. Our approach introduces a synthetic labeling technique to transform single-label datasets into multi-label annotations, capturing the complexity of overlapping mental health conditions. To achieve this, two single-label datasets are first merged into a foundational multi-label dataset, enabling realistic analyses of co-occurring diagnoses. We then design and evaluate various prompting strategies for large language models (LLMs), ranging from single-label predictions to unrestricted prompts capable of detecting any present disorders. After rigorously assessing multiple LLMs and prompt configurations, the optimal combinations are identified and applied to label six additional single-disorder datasets from RMHD. The result is SPAADE-DR, a robust, multi-label dataset encompassing diverse mental health conditions. This research demonstrates the transformative potential of LLM-driven synthetic labeling in advancing mental health diagnostics from social media data, paving the way for more nuanced, data-driven insights into mental health care.
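
To make the three prompting strategies named in the abstract more concrete, the following is a minimal Python sketch of how single-label, multi-label, and unrestricted prompts might be built, and how a model reply could be parsed into binary labels. The prompt wording, the function names, and the "name: 0/1" reply format are all assumptions made for illustration; they are not the templates shown in the paper's figures, and no LLM is actually called here.

```python
from typing import Dict, List


def single_label_prompt(post: str, disorder: str) -> str:
    # Hypothetical wording: ask about one disorder at a time.
    return (
        f"Does the following post show signs of {disorder}? "
        f"Answer with 1 (yes) or 0 (no).\n\nPost: {post}"
    )


def multi_label_prompt(post: str, disorders: List[str]) -> str:
    # Hypothetical wording: ask about a fixed list of disorders in one call.
    listed = ", ".join(disorders)
    return (
        f"For each of these disorders ({listed}), answer on its own line "
        f"as 'name: 1' or 'name: 0'.\n\nPost: {post}"
    )


def unrestricted_prompt(post: str) -> str:
    # Hypothetical wording: let the model name any disorder it detects.
    return (
        "List any mental health disorders evident in the following post, "
        f"one per line as 'name: 1'.\n\nPost: {post}"
    )


def parse_binary_answers(reply: str, disorders: List[str]) -> Dict[str, int]:
    """Turn lines such as 'depression: 1' into a binary label per disorder.

    Disorders that the reply does not mention default to 0, and lines that
    do not match a known disorder are ignored.
    """
    labels = {d.lower(): 0 for d in disorders}
    for line in reply.splitlines():
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        name = name.strip().lower()
        if name in labels and value.strip() == "1":
            labels[name] = 1
    return labels


if __name__ == "__main__":
    targets = ["depression", "anxiety"]
    print(multi_label_prompt("I can't sleep and I worry all day.", targets))
    # A made-up reply standing in for real model output.
    print(parse_binary_answers("Depression: 0\nAnxiety: 1", targets))
```

Parsing into a fixed dictionary keeps the output shape identical across the three strategies, which makes model-prompt pairings easier to compare; for the unrestricted prompt, any disorder outside the target list is simply dropped in this sketch.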

Paper Structure

This paper contains 27 sections, 2 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Process Workflow
  • Figure 2: Single-label Binary Prompt Template for Identifying Mental Health Conditions
  • Figure 3: Multi-Label Binary Prompt Templates for Identifying Mental Health Conditions
  • Figure 4: Unrestricted Binary Prompt Template for Identifying Mental Health Conditions
  • Figure 5: Workflow for the SPAADE-DR Dataset Process
  • ...and 2 more figures