Psy-Insight: Explainable Multi-turn Bilingual Dataset for Mental Health Counseling
Keqi Chen, Zekai Sun, Yuhua Wen, Huijun Lian, Yingming Gao, Ya Li
TL;DR
Psy-Insight introduces a bilingual, explainable multi-turn counseling dataset annotated with turn-level reasoning and session-level guidance to train empathetic mental health LLMs. The dialogues are crawled from real-world sources, and their rich descriptive text is mapped to multi-task labels covering emotion, psychotherapy, strategy, and topic, which supports chain-of-thought reasoning and retrieval-augmented generation. Automatic and human evaluations show that fine-tuning models on Psy-Insight improves generation quality and counseling-related reasoning, although a gap to professional therapists remains. The work provides a foundation for cross-cultural mental health AI, multi-task learning, and future enhancements via reinforcement learning with expert feedback.
Abstract
The in-context learning capabilities of large language models (LLMs) show great potential in mental health support. However, the lack of counseling datasets, particularly in Chinese, restricts their application in this field. To address this, we constructed Psy-Insight, the first mental health-oriented explainable multi-task bilingual dataset. We collected face-to-face multi-turn counseling dialogues and annotated them with multi-task labels and explanations of the conversation process. Our annotations include psychotherapy, emotion, strategy, and topic labels, as well as turn-level reasoning and session-level guidance. Psy-Insight is suitable not only for tasks such as label recognition but also for training LLMs to act as empathetic counselors through logical reasoning. Experiments show that training LLMs on Psy-Insight enables the models not only to mimic the conversation style but also to understand the underlying strategies and reasoning of counseling.
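To make the annotation layers described above more concrete, the following is a minimal Python sketch of how one annotated session might be represented and turned into training pairs. It is an illustration under stated assumptions, not the dataset's actual schema: the class names, field names, the choice of which labels sit at turn level versus session level, and the helper `to_training_pairs` are all hypothetical, and the prompt layout is only one plausible way to combine session-level guidance with turn-level reasoning.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Turn:
    speaker: str                     # assumed role names: "client" or "counselor"
    text: str
    emotion: Optional[str] = None    # emotion label (placement at turn level is an assumption)
    strategy: Optional[str] = None   # strategy label (placement at turn level is an assumption)
    reasoning: Optional[str] = None  # turn-level reasoning behind a counselor reply


@dataclass
class Session:
    language: str                    # e.g. "zh" or "en" for the bilingual split
    psychotherapy: str               # psychotherapy label (assumed session-level)
    topic: str                       # topic label (assumed session-level)
    guidance: str                    # session-level guidance
    turns: List[Turn] = field(default_factory=list)


def to_training_pairs(session: Session) -> List[Tuple[str, str]]:
    """Build (prompt, target) pairs for every counselor turn that has context.

    The prompt holds the session guidance plus the dialogue so far; the target
    puts the reasoning before the reply, in a chain-of-thought style.
    """
    pairs = []
    history: List[str] = []
    for turn in session.turns:
        if turn.speaker == "counselor" and history:
            prompt = f"Guidance: {session.guidance}\n" + "\n".join(history)
            if turn.reasoning:
                target = f"Reasoning: {turn.reasoning}\nReply: {turn.text}"
            else:
                target = f"Reply: {turn.text}"
            pairs.append((prompt, target))
        history.append(f"{turn.speaker}: {turn.text}")
    return pairs
```

Placing the reasoning before the reply in the target means a model fine-tuned on such pairs would produce its rationale first and the counselor response second; whether Psy-Insight's training setup uses this ordering is not stated in the abstract.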
