Enhancing Psychotherapy Counseling: A Data Augmentation Pipeline Leveraging Large Language Models for Counseling Conversations
Jun-Woo Kim, Ji-Eun Han, Jun-Seok Koh, Hyeon-Tae Seo, Du-Seong Chang
TL;DR
This work tackles the scarcity of high-quality multi-turn psychotherapy data by proposing a data augmentation pipeline that uses LLMs to expand single-turn counseling conversations into multi-turn dialogues. It formalizes the task as mapping a single-turn sample $D_i=\{(x_i,y_i,m_i)\}$ to a multi-turn sample $D_i'=\{(x_i^1,\dots,x_i^k),(y_i^1,\dots,y_i^k),m_i,c_i,t_i\}$, and applies Information Extraction followed by four prompts (Description, Condition, Information, Answer) to generate realistic sessions that respect therapist and client details. The resulting augmented dataset covers Depression, Anxiety, Anger Management, and Trauma, and is evaluated in zero-shot and few-shot experiments with Llama-based baselines; GPT-4o-based automatic scoring confirms that few-shot prompts substantially improve multi-turn dialogue quality. The results demonstrate the practical utility of leveraging expert-specific counseling styles for data augmentation, enabling better AI-assisted counseling systems, and the dataset is publicly released for replication and broader use.
Abstract
We introduce a pipeline that leverages Large Language Models (LLMs) to transform single-turn psychotherapy counseling sessions into multi-turn interactions. While AI-supported online counseling services for individuals with mental disorders exist, they are often constrained by the limited availability of multi-turn training datasets and frequently fail to fully utilize therapists' expertise. Our proposed pipeline addresses these limitations. The pipeline comprises two main steps: 1) Information Extraction and 2) Multi-turn Counseling Generation. Each step is designed to extract and generate comprehensive multi-turn counseling conversations from the available datasets. Experimental results from both zero-shot and few-shot generation scenarios demonstrate that our approach significantly enhances the ability of LLMs to produce higher-quality multi-turn dialogues in the context of mental health counseling. Our pipeline and dataset are publicly available at https://github.com/jwkim-chat/A-Data-Augmentation-Pipeline-Leveraging-Large-Language-Models-for-Counseling-Conversations.
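As a rough illustration of the two-step structure and the $D_i \to D_i'$ notation from the TL;DR, the data flow might be sketched as follows. This is a minimal sketch and not the authors' implementation: the class and function names (`SingleTurn`, `MultiTurn`, `augment`, `stub_extract`, `stub_generate`) are hypothetical, and the readings of $x$ as the client utterance, $y$ as the therapist response, $m_i$ as a topic label, and $c_i$, $t_i$ as client and therapist details are assumptions. The stubs stand in for the LLM calls, which are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SingleTurn:
    """One single-turn sample (x_i, y_i, m_i)."""
    x: str  # assumed: client utterance
    y: str  # assumed: therapist response
    m: str  # assumed: topic label, e.g. "Depression"


@dataclass
class MultiTurn:
    """One augmented sample with k turns plus m_i, c_i, t_i."""
    xs: List[str]  # x_i^1 .. x_i^k
    ys: List[str]  # y_i^1 .. y_i^k
    m: str         # carried over unchanged from the single-turn sample
    c: str         # assumed: extracted client details
    t: str         # assumed: extracted therapist details


def augment(
    sample: SingleTurn,
    extract: Callable[[SingleTurn], Tuple[str, str]],
    generate: Callable[[SingleTurn, str, str], List[Tuple[str, str]]],
) -> MultiTurn:
    # Step 1: Information Extraction (client and therapist details).
    c, t = extract(sample)
    # Step 2: Multi-turn Counseling Generation conditioned on those details.
    turns = generate(sample, c, t)
    return MultiTurn(
        xs=[x for x, _ in turns],
        ys=[y for _, y in turns],
        m=sample.m,
        c=c,
        t=t,
    )


# Placeholder callables standing in for LLM-backed steps (hypothetical).
def stub_extract(sample: SingleTurn) -> Tuple[str, str]:
    return ("client details placeholder", "therapist details placeholder")


def stub_generate(sample: SingleTurn, c: str, t: str) -> List[Tuple[str, str]]:
    return [
        (sample.x, sample.y),
        ("follow-up client turn", "follow-up therapist turn"),
    ]
```

Passing the two steps in as callables keeps the sketch independent of any particular model or prompt format; in practice each would wrap an LLM call using the prompts described in the paper.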
