Psy-Insight: Explainable Multi-turn Bilingual Dataset for Mental Health Counseling

Keqi Chen, Zekai Sun, Yuhua Wen, Huijun Lian, Yingming Gao, Ya Li

TL;DR

Psy-Insight introduces a bilingual, explainable multi-turn counseling dataset annotated with turn-level reasoning and session-level guidance to train empathetic mental health LLMs. By crawling real-world dialogues and mapping rich descriptive text to multi-task labels across emotion, psychotherapy, strategy, and topics, the dataset enables chain-of-thought reasoning and retrieved-argument generation. Automatic and human evaluations show that fine-tuning models on Psy-Insight improves generation quality and counseling-related reasoning, although a gap to professional therapists remains. The work provides a foundation for cross-cultural mental health AI, multi-task learning, and future enhancements via reinforcement learning with expert feedback.

Abstract

The in-context learning capabilities of large language models (LLMs) show great potential in mental health support. However, the lack of counseling datasets, particularly in Chinese corpora, restricts their application in this field. To address this, we constructed Psy-Insight, the first mental health-oriented explainable multi-task bilingual dataset. We collected face-to-face multi-turn counseling dialogues, which are annotated with multi-task labels and conversation process explanations. Our annotations include psychotherapy, emotion, strategy, and topic labels, as well as turn-level reasoning and session-level guidance. Psy-Insight is not only suitable for tasks such as label recognition but also meets the need for training LLMs to act as empathetic counselors through logical reasoning. Experiments show that training LLMs on Psy-Insight enables the models to not only mimic the conversation style but also understand the underlying strategies and reasoning of counseling.

Paper Structure

This paper contains 27 sections, 2 equations, 10 figures, 17 tables.

Figures (10)

  • Figure 1: The left section presents Psy-Insight's counseling dialogues and annotations, while the right section illustrates the corresponding tasks that these annotations support. The Psy-Insight dataset features 951 sessions of multi-turn counseling dialogues annotated with step-by-step reasoning and multi-task labels. Within a session, the therapist and client engage in 5~60 turns of dialogue on a single topic. We have annotated counseling dialogues at various levels of granularity. An example of the Chinese data is shown in Table \ref{tab:chinese_example}.
  • Figure 2: Comparison of the construction processes of previous datasets and the Psy-Insight dataset. As shown on the left, previous datasets primarily focused on annotating short labels, which suit individual subtasks in a pipeline. The Psy-Insight dataset, shown on the right, emphasizes the interpretability of the dialogue process, with step-by-step reasoning and session-level guidance and explanation. We also collect multi-task labels for each dialogue session, offering data to enhance LLMs' generalization ability.
  • Figure 3: The construction workflow of the Psy-Insight dataset. Our workflow involves four steps: (1) locating dialogues and explanations with psychotherapy keywords; (2) data cleaning; (3) mapping dialogues to explanations with a sliding-window algorithm and computing similarity with embedding models and LLMs; (4) checking annotations with human annotators.
  • Figure 4: Statistics of counseling topics in Psy-Insight. The top three topics are Depression (21.7%), Partner Relationship (21.3%), and Child-Parent Relationship (18.9%).
  • Figure 5: Word clouds of the Chinese and English counseling data.
  • ...and 5 more figures
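
The mapping step named in the Figure 3 caption (a sliding window over the dialogue, scored by similarity to an explanation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the window size, the sample turns, and the bag-of-words cosine similarity are all assumptions made for illustration. The cosine over word counts is a toy stand-in for the embedding models and LLMs that the caption mentions.

```python
import math
import re
from collections import Counter


def bow_vector(text):
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_window(turns, explanation, window_size=2):
    """Slide a fixed-size window over the turns and return (start, end, score)
    for the window whose text is most similar to the explanation."""
    target = bow_vector(explanation)
    best = (0, 0, -1.0)
    # max(1, ...) keeps one window even when there are fewer turns than window_size.
    for start in range(max(1, len(turns) - window_size + 1)):
        end = min(len(turns), start + window_size)
        score = cosine(bow_vector(" ".join(turns[start:end])), target)
        if score > best[2]:
            best = (start, end, score)
    return best


# Hypothetical sample data, not taken from the dataset.
turns = [
    "Therapist: What brings you here today?",
    "Client: I have trouble sleeping and I feel hopeless most days.",
    "Therapist: It sounds like the hopeless feeling has been weighing on you.",
    "Client: My parents and I argue about my job every weekend.",
]
explanation = "The therapist reflects the client's hopeless feeling to show empathy."

start, end, score = best_window(turns, explanation)
print(f"explanation maps to turns [{start}, {end}) with similarity {score:.3f}")
```

In a real pipeline the `bow_vector` and `cosine` pair would presumably be replaced by a sentence-embedding model, with low-scoring matches passed on to the human check described as step (4).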