Beyond Empathy: Integrating Diagnostic and Therapeutic Reasoning with Large Language Models for Mental Health Counseling

He Hu, Yucheng Zhou, Juzheng Si, Qianning Wang, Hengheng Zhang, Fuji Ren, Fei Ma, Laizhong Cui, Qi Tian

TL;DR

The study addresses two gaps in mental health LLMs, weak diagnostic grounding and limited therapeutic diversity, by introducing PsyLLM, a model that integrates DSM/ICD-aligned diagnostic reasoning with multiple therapeutic modalities (e.g., CBT, ACT, psychodynamic). It builds OpenR1-Psy, a large multi-turn dialogue dataset synthesized from Reddit posts and real counseling data, in which explicit reasoning traces $R$ guide counselor responses $U_C$ under diagnostic guidelines $D$ and therapeutic guidelines $T$, and it validates data quality via multi-dimensional filtering. After supervised fine-tuning on OpenR1-Psy, PsyLLM outperforms baselines and ablations in both automatic and human evaluations on four counseling dimensions: Empathy & Insight, Support & Autonomy, Attunement & Presence, and Safety & Boundaries. The work demonstrates the importance of integrating diagnostic reasoning and therapeutic diversity for clinically relevant AI counseling and provides open access to the OpenR1-Psy dataset for further research and development.

Abstract

Large language models (LLMs) hold significant potential for mental health support: they can generate empathetic responses and simulate therapeutic conversations. However, existing LLM-based approaches often lack the clinical grounding necessary for real-world psychological counseling. In particular, they lack explicit diagnostic reasoning aligned with standards like the DSM/ICD, and they rarely incorporate diverse therapeutic modalities beyond basic empathy or single strategies. To address these critical limitations, we propose PsyLLM, the first large language model designed to systematically integrate both diagnostic and therapeutic reasoning for mental health counseling. To develop PsyLLM, we design a novel automated data synthesis pipeline that processes real-world mental health posts collected from Reddit, where users frequently share psychological distress and seek community support. The pipeline generates multi-turn dialogue structures and leverages LLMs guided by international diagnostic standards (e.g., DSM/ICD) and multiple therapeutic frameworks (e.g., CBT, ACT, psychodynamic) to simulate detailed clinical reasoning processes. Rigorous multi-dimensional filtering ensures the generation of high-quality, clinically aligned dialogue data. In addition, we introduce a new benchmark and evaluation protocol that assesses counseling quality across four key dimensions. Our experiments demonstrate that PsyLLM significantly outperforms state-of-the-art baseline models on this benchmark. The model weights and dataset have been publicly released at https://github.com/Emo-gml/PsyLLM.
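
The generation and filtering steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, the `generate` callback (standing in for an LLM call), the score dictionary, and the threshold are all hypothetical, chosen only to show how a reasoning trace might be paired with each counselor response and how a multi-dimensional filter might gate a dialogue.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Turn:
    """One exchange: patient utterance, reasoning trace, counselor response."""
    patient_utterance: str
    reasoning_trace: str       # plays the role of R in the summary
    counselor_response: str    # plays the role of U_C in the summary


@dataclass
class Dialogue:
    source: str                # hypothetical label, e.g. "reddit"
    turns: List[Turn] = field(default_factory=list)


# Hypothetical generator signature: (utterance, diagnostic guideline,
# therapeutic guideline) -> (reasoning trace, counselor response).
Generator = Callable[[str, str, str], Tuple[str, str]]


def build_dialogue(source: str, utterances: List[str], diagnostic: str,
                   therapy: str, generate: Generator) -> Dialogue:
    """Attach a reasoning trace and a response to every patient utterance."""
    dialogue = Dialogue(source=source)
    for utterance in utterances:
        trace, response = generate(utterance, diagnostic, therapy)
        dialogue.turns.append(Turn(utterance, trace, response))
    return dialogue


def passes_filter(scores: Dict[str, float], threshold: float) -> bool:
    """Keep a dialogue only if it has scores and every one meets the threshold."""
    return bool(scores) and all(s >= threshold for s in scores.values())


def stub_generate(utterance: str, diagnostic: str, therapy: str) -> Tuple[str, str]:
    """Placeholder for an LLM call; returns fixed-format text."""
    trace = f"Assess '{utterance}' against {diagnostic}; plan a {therapy} step."
    response = "That sounds heavy. Can you tell me more about when it started?"
    return trace, response


example = build_dialogue(
    source="reddit",
    utterances=["I can't sleep lately.", "Work feels pointless."],
    diagnostic="DSM/ICD",
    therapy="CBT",
    generate=stub_generate,
)
```

Requiring every dimension to clear the threshold, rather than averaging, is one possible design choice: under it, a dialogue that is coherent but clinically irrelevant is rejected. The source does not state how its filtering scores are combined.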

Paper Structure

This paper contains 43 sections, 3 equations, 23 figures, and 12 tables.

Figures (23)

  • Figure 1: PsyLLM simulates therapeutic reasoning by assessing emotions, analyzing cognitive patterns, and formulating strategies grounded in DSM/ICD criteria and diverse modalities (e.g., CBT, ACT, psychodynamic). This enables clinically informed, context-sensitive counseling responses.
  • Figure 2: Overview of the OpenR1-Psy dataset construction pipeline. The process includes five stages: (1) Data collection from Reddit and real counseling datasets. (2) Parsing and interaction planning using a language model to assess emotions, define dialogue rounds, and set therapeutic themes. (3) Extraction of patient utterances from simulated dialogues and real counseling data. (4) Generation of reasoning traces and counselor responses based on diagnostic standards and therapeutic strategies. (5) Multi-dimensional validation to ensure coherence, clinical relevance, and reasoning quality.
  • Figure 3: Analysis of OpenR1-Psy Dataset: (Left) Distribution of Psychotherapy Approaches; (Middle) Distribution of Scene Categories; (Right) Distribution of Severity Levels.
  • Figure 4: Average performance of topic dimensions. Each axis represents a topic, and the radius indicates its mean score on the associated metric, highlighting relative strengths and weaknesses.
  • Figure 5: Comparison of different reasoning methods, including In-Context Learning and Two-Phase Prompting.
  • ...and 18 more figures