Enhancing Psychotherapy Counseling: A Data Augmentation Pipeline Leveraging Large Language Models for Counseling Conversations

Jun-Woo Kim; Ji-Eun Han; Jun-Seok Koh; Hyeon-Tae Seo; Du-Seong Chang

TL;DR

This work tackles the scarcity of high-quality multi-turn psychotherapy data by proposing a data augmentation pipeline that leverages LLMs to expand single-turn counseling conversations into multi-turn dialogues. It formalizes the task with $D_i=\{(x_i,y_i,m_i)\}$ and $D_i'=\{(x_i^1,\dots,x_i^k),(y_i^1,\dots,y_i^k),m_i,c_i,t_i\}$, and employs Information Extraction followed by four prompts (Description, Condition, Information, Answer) to generate realistic sessions that respect therapist and client details. An augmented dataset with Depression, Anxiety, Anger Management, and Trauma is created and evaluated via zero-shot and few-shot experiments using Llama-based baselines, with GPT-4o-based automatic scoring confirming that few-shot prompts substantially improve multi-turn dialogue quality. The results demonstrate the practical utility of leveraging expert-specific counseling styles for data augmentation, enabling better AI-assisted counseling systems, and the dataset is publicly released for replication and broader use.
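As a rough illustration of the task formalization in the TL;DR, the single-turn record $D_i=\{(x_i,y_i,m_i)\}$ and the augmented record $D_i'=\{(x_i^1,\dots,x_i^k),(y_i^1,\dots,y_i^k),m_i,c_i,t_i\}$ could be held in two small Python containers. This is a minimal sketch, not the authors' code: the class names, the field names, and the readings of the symbols ($x$ as client utterances, $y$ as therapist responses, $m$ as metadata, $c$ and $t$ as extracted client and therapist information) are assumptions, since this summary does not define them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SingleTurnSample:
    """One source record D_i = {(x_i, y_i, m_i)} (hypothetical container)."""
    x: str  # assumed: client utterance
    y: str  # assumed: therapist response
    m: str  # assumed: metadata carried over unchanged


@dataclass
class MultiTurnSample:
    """One augmented record D_i' with k turns plus c_i and t_i (hypothetical container)."""
    xs: list[str]  # x_i^1 ... x_i^k
    ys: list[str]  # y_i^1 ... y_i^k
    m: str         # same m_i as in the source record
    c: str         # assumed: extracted client information
    t: str         # assumed: extracted therapist information

    def __post_init__(self) -> None:
        # Both sequences are indexed 1..k in the formalization, so lengths must match.
        if len(self.xs) != len(self.ys):
            raise ValueError("xs and ys must have the same number of turns k")

    @property
    def k(self) -> int:
        """Number of turns in the dialogue."""
        return len(self.xs)


def as_degenerate_multi_turn(
    sample: SingleTurnSample, c: str = "", t: str = ""
) -> MultiTurnSample:
    """Wrap a single-turn record as a k=1 multi-turn record.

    No generation happens here; this only shows how the two shapes relate.
    """
    return MultiTurnSample(xs=[sample.x], ys=[sample.y], m=sample.m, c=c, t=t)
```

The length check mirrors the shared index $k$ in the formalization; the actual expansion from one turn to $k$ turns is done by the LLM prompts and is not modeled here.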

Abstract

We introduce a pipeline that leverages Large Language Models (LLMs) to transform single-turn psychotherapy counseling sessions into multi-turn interactions. While AI-supported online counseling services for individuals with mental disorders exist, they are often constrained by the limited availability of multi-turn training datasets and frequently fail to fully utilize therapists' expertise. Our proposed pipeline addresses these limitations. The pipeline comprises two main steps: 1) Information Extraction and 2) Multi-turn Counseling Generation. Each step is designed to extract information from the available datasets and generate comprehensive multi-turn counseling conversations from it. Experimental results from both zero-shot and few-shot generation scenarios demonstrate that our approach significantly enhances the ability of LLMs to produce higher-quality multi-turn dialogues in the context of mental health counseling. Our pipeline and dataset are publicly available at https://github.com/jwkim-chat/A-Data-Augmentation-Pipeline-Leveraging-Large-Language-Models-for-Counseling-Conversations.

Paper Structure (24 sections, 6 figures, 4 tables)

Introduction
Related Work
Preliminary
Task Definition
Source Dataset Pre-processing
Method
Information Extraction
Multi-turn Counseling Generation
Description Prompt
Condition Prompt
Information Prompt
Answer Prompt
Augmented Dataset
Experiment
Experiment Details
...and 9 more sections

Figures (6)

Figure 1: Overview of the proposed data augmentation pipeline.
Figure 2: Example of zero-shot prompt.
Figure 3: Example of few-shot prompt.
Figure 4: Example of evaluation prompt.
Figure 5: Comparison of zero-shot and few-shot multi-turn counseling dialogue generation performance for Llama2-7B-chat and Llama3-70B-Instruct. In the few-shot setting, examples generated by our pipeline are used.
...and 1 more figure
