
Psyche-R1: Towards Reliable Psychological LLMs through Unified Empathy, Expertise, and Reasoning

Chongyuan Dai, Jinpeng Hu, Hongchang Shi, Zhuo Li, Xun Yang, Meng Wang

TL;DR

The paper tackles the need for reliable psychological LLMs in Chinese by introducing Psyche-R1, a 7B model that unifies empathy, psychological expertise, and reasoning. It builds a large, high-quality training corpus through a two-track data synthesis pipeline: reasoning-focused PCQA with detailed rationales and 73k empathetic dialogues, plus a multi-LLM cross-selection process to identify challenging samples for RL. Training combines supervised fine-tuning on non-challenging data with Group Relative Policy Optimization on challenging samples, guided by a composite reward that enforces structured reasoning and accuracy. Experimental results show Psyche-R1 achieving strong performance on core psychological benchmarks, competitive with substantially larger models, and superior in combining empathetic dialogue with grounded reasoning, highlighting the practicality of a unified empathy-expertise-reasoning framework for mental health support.
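As a rough illustration of how a composite reward could feed GRPO, the sketch below scores each sampled response for structure and accuracy, then normalizes rewards within a group of responses to the same prompt. The `<think>`/`<answer>` tag format, the reward weights, and the function names are assumptions made for illustration; the summary does not specify them, and the paper's actual reward may differ.

```python
import re
from statistics import mean, pstdev
from typing import List

# Hypothetical output format: reasoning in <think>, final choice in <answer>.
_PATTERN = re.compile(r"^<think>.+?</think>\s*<answer>(.+?)</answer>$", re.DOTALL)


def composite_reward(response: str, gold: str,
                     w_format: float = 0.5, w_acc: float = 1.0) -> float:
    """Format reward plus accuracy reward (weights are illustrative assumptions)."""
    m = _PATTERN.match(response.strip())
    if m is None:
        return 0.0  # malformed output earns nothing
    acc = 1.0 if m.group(1).strip() == gold.strip() else 0.0
    return w_format + w_acc * acc


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standardize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    group = [
        "<think>rule out B and C</think><answer>A</answer>",
        "<think>guess</think><answer>B</answer>",
        "A",
    ]
    rewards = [composite_reward(r, gold="A") for r in group]
    print(rewards, group_relative_advantages(rewards))
```

Because advantages are computed relative to the group mean, a group whose responses all receive the same reward contributes no learning signal, which is one reason to reserve reinforcement learning for samples the model does not already solve consistently.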

Abstract

Amidst a shortage of qualified mental health professionals, the integration of large language models (LLMs) into psychological applications offers a promising way to alleviate the growing burden of mental health disorders. Recent reasoning-augmented LLMs have achieved remarkable performance in mathematics and programming, while research in the psychological domain has predominantly emphasized emotional support and empathetic dialogue, with limited attention to the reasoning mechanisms that help generate reliable responses. Therefore, in this paper, we propose Psyche-R1, the first Chinese psychological LLM that jointly integrates empathy, psychological expertise, and reasoning, built upon a novel data curation pipeline. Specifically, we design a comprehensive data synthesis pipeline that produces over 75k high-quality psychological questions paired with detailed rationales, generated through chain-of-thought (CoT) reasoning and iterative prompt-rationale optimization, along with 73k empathetic dialogues. Subsequently, we employ a hybrid training strategy wherein challenging samples are identified through a multi-LLM cross-selection strategy for group relative policy optimization (GRPO) to improve reasoning ability, while the remaining data is used for supervised fine-tuning (SFT) to enhance empathetic response generation and psychological domain knowledge. Extensive experimental results demonstrate the effectiveness of Psyche-R1 across several psychological benchmarks, where our 7B Psyche-R1 achieves results comparable to those of the 671B DeepSeek-R1.
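The data split behind the hybrid training strategy can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the selection criterion, so the rule used here (a sample is "challenging" when at most `max_correct` judge models answer it correctly) is an assumption, as are the `Judge` callable type and the `cross_select` name.

```python
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, str]        # expects keys "question" and "answer"
Judge = Callable[[str], str]   # hypothetical: maps a question to a predicted answer


def cross_select(samples: List[Sample], judges: List[Judge],
                 max_correct: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Split samples into (challenging, remaining).

    Assumed rule: a sample is challenging when no more than `max_correct`
    judge models reproduce its reference answer.
    """
    challenging: List[Sample] = []
    remaining: List[Sample] = []
    for sample in samples:
        gold = sample["answer"].strip()
        n_correct = sum(
            1 for judge in judges if judge(sample["question"]).strip() == gold
        )
        if n_correct <= max_correct:
            challenging.append(sample)   # candidate for GRPO
        else:
            remaining.append(sample)     # candidate for SFT
    return challenging, remaining


if __name__ == "__main__":
    # Toy judges standing in for real LLM calls.
    judges = [lambda q: "A", lambda q: "A" if q == "q1" else "C"]
    data = [{"question": "q1", "answer": "A"}, {"question": "q2", "answer": "B"}]
    hard, easy = cross_select(data, judges)
    print(len(hard), "challenging;", len(easy), "remaining")
```

Under this assumed rule, the challenging subset would go to GRPO and the remainder, together with the empathetic dialogues, to SFT, mirroring the split the abstract describes.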

Paper Structure

This paper contains 29 sections, 1 equation, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Comparison of different LLMs on the PCEB, plotted by average standard accuracy versus model size.
  • Figure 2: Overview of our proposed pipeline for constructing the dataset and Psyche-R1. Our pipeline involves generating psychological questions paired with detailed rationales, along with empathetic dialogues.
  • Figure 3: A qualitative example from the CPsyExam test set comparing Psyche-R1 and Qwen2.5-72B-Instruct.