Psyche-R1: Towards Reliable Psychological LLMs through Unified Empathy, Expertise, and Reasoning
Chongyuan Dai, Jinpeng Hu, Hongchang Shi, Zhuo Li, Xun Yang, Meng Wang
TL;DR
The paper addresses the need for reliable Chinese psychological LLMs by introducing Psyche-R1, a 7B model that unifies empathy, psychological expertise, and reasoning. Its training corpus comes from a two-track data synthesis pipeline: a reasoning-focused track (PCQA) of psychological questions paired with detailed rationales, and a track of 73k empathetic dialogues. A multi-LLM cross-selection process then identifies the challenging samples reserved for reinforcement learning. Training combines supervised fine-tuning on the non-challenging data with Group Relative Policy Optimization (GRPO) on the challenging samples, guided by a composite reward that enforces structured reasoning and answer accuracy. In experiments, Psyche-R1 performs strongly on core psychological benchmarks, is competitive with substantially larger models, and is better at combining empathetic dialogue with grounded reasoning, which supports the practicality of a unified empathy-expertise-reasoning framework for mental health support.
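To make the idea of a composite reward concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a hypothetical output template with `<think>` and `<answer>` tags, exact-match scoring against a gold option, and illustrative weights `w_format` and `w_accuracy`; none of these details are given in the summary above.

```python
import re

# Hypothetical output template: the tag names are an assumption for illustration.
_TEMPLATE = re.compile(
    r"\A\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*\Z", re.DOTALL
)


def format_reward(response: str) -> float:
    """Return 1.0 if the response follows the assumed reasoning/answer template."""
    return 1.0 if _TEMPLATE.match(response) else 0.0


def accuracy_reward(response: str, gold: str) -> float:
    """Return 1.0 if the extracted answer equals the gold answer (case-insensitive)."""
    match = _TEMPLATE.match(response)
    if match is None:
        return 0.0
    predicted = match.group(2).strip().upper()
    return 1.0 if predicted == gold.strip().upper() else 0.0


def composite_reward(
    response: str, gold: str, w_format: float = 0.5, w_accuracy: float = 1.0
) -> float:
    """Weighted sum of the format and accuracy terms (weights are assumptions)."""
    return w_format * format_reward(response) + w_accuracy * accuracy_reward(
        response, gold
    )
```

Under these assumptions, a well-formed response with the correct answer scores 1.5, a well-formed response with a wrong answer scores 0.5, and a response that ignores the template scores 0.0 even if it contains the right option.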
Abstract
Amidst a shortage of qualified mental health professionals, the integration of large language models (LLMs) into psychological applications offers a promising way to alleviate the growing burden of mental health disorders. Recent reasoning-augmented LLMs have achieved remarkable performance in mathematics and programming, whereas research in the psychological domain has predominantly emphasized emotional support and empathetic dialogue, with limited attention to the reasoning mechanisms that help generate reliable responses. Therefore, in this paper, we propose Psyche-R1, the first Chinese psychological LLM that jointly integrates empathy, psychological expertise, and reasoning, built upon a novel data curation pipeline. Specifically, we design a comprehensive data synthesis pipeline that produces over 75k high-quality psychological questions paired with detailed rationales, generated through chain-of-thought (CoT) reasoning and iterative prompt-rationale optimization, along with 73k empathetic dialogues. Subsequently, we employ a hybrid training strategy: challenging samples, identified through multi-LLM cross-selection, are used for group relative policy optimization (GRPO) to improve reasoning ability, while the remaining data is used for supervised fine-tuning (SFT) to enhance empathetic response generation and psychological domain knowledge. Extensive experimental results demonstrate the effectiveness of Psyche-R1 across several psychological benchmarks, where our 7B Psyche-R1 achieves results comparable to the 671B DeepSeek-R1.
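The hybrid training strategy can be illustrated with a short sketch under stated assumptions. The selection rule below (a sample is challenging if at least `min_wrong` of the consulted LLMs answer it incorrectly) is a hypothetical criterion, since the abstract does not spell out the rule. The second function shows the group-relative advantage that gives GRPO its name: each sampled response's reward is normalized by the mean and standard deviation of its group. The function and field names are illustrative only.

```python
from statistics import mean, pstdev


def split_by_cross_selection(samples, model_answers, min_wrong=1):
    """Partition samples into (challenging, regular).

    samples: list of dicts with "id" and "gold" keys.
    model_answers: dict mapping a sample id to the answers of several LLMs.
    min_wrong: hypothetical threshold on how many LLMs must answer incorrectly.
    """
    challenging, regular = [], []
    for sample in samples:
        answers = model_answers[sample["id"]]
        wrong = sum(answer != sample["gold"] for answer in answers)
        (challenging if wrong >= min_wrong else regular).append(sample)
    return challenging, regular


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one group of responses sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: the challenging split would feed GRPO, the regular split would feed SFT.
samples = [{"id": 1, "gold": "A"}, {"id": 2, "gold": "C"}]
model_answers = {1: ["A", "A", "A"], 2: ["C", "B", "D"]}
grpo_set, sft_set = split_by_cross_selection(samples, model_answers)
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

In this toy run, sample 2 lands in the GRPO set because two of the three models miss it, while sample 1 goes to SFT. Within a group, responses above the mean reward receive positive advantages and those below receive negative ones.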
