An LLM-based Simulation Framework for Embodied Conversational Agents in Psychological Counseling

Lixiu Wu, Yuanrong Tang, Qisen Pan, Xianyang Zhan, Yucheng Han, Lanxi Xiao, Tianhong Wang, Chen Zhong, Jiangtao Gong

TL;DR

This work tackles privacy-induced data scarcity in psychological counseling by presenting ECAs, an LLM-based embodied agent framework grounded in CBT and counseling theory to synthesize authentic client-counselor dialogues. It builds a rich embodied memory space by expanding real case data into structured memory representations and uses high-frequency counseling questions to drive data generation. The authors formulate six simulation principles, validate the approach on the D$^4$ dataset with licensed counselors and two automated evaluation methods, and release a public ECAs dataset. Results indicate ECAs produce higher authenticity, necessity, and sufficiency than baselines, demonstrating practical value for training and research in mental health counseling.

Abstract

Due to privacy concerns, open dialogue datasets for mental health are primarily generated through human or AI synthesis methods. However, the inherent implicit nature of psychological processes, particularly those of clients, poses challenges to the authenticity and diversity of synthetic data. In this paper, we propose ECAs (short for Embodied Conversational Agents), a framework for embodied agent simulation based on Large Language Models (LLMs) that incorporates multiple psychological theoretical principles. Using simulation, we expand real counseling case data into a nuanced embodied cognitive memory space and generate dialogue data based on high-frequency counseling questions. We validated our framework using the D$^4$ dataset. First, we created a public ECAs dataset through batch simulations based on D$^4$. Licensed counselors evaluated our method, demonstrating that it significantly outperforms baselines in simulation authenticity and necessity. Additionally, two LLM-based automated evaluation methods were employed to confirm the higher quality of the generated dialogues compared to the baselines. The source code and dataset are available at https://github.com/AIR-DISCOVER/ECAs-Dataset.

Paper Structure

This paper contains 19 sections, 6 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: ECAs Framework Overview. The process consists of three steps: Step 1, extracting base information for the Client Agent from real datasets; Step 2, expanding the agent's two profiles during memory simulation to form a complete Client Profile, and generating an embodied memory space, including beliefs, cognitive processes, and memories, based on this comprehensive profile; Step 3, dynamically retrieving context-relevant memories during real conversations to ensure realism and consistency.
  • Figure 2: Client Personal Profile. The persona describes the Client Agent's basic information such as name, personality, and appearance, along with background and past experiences to form a complete psychological trajectory.
  • Figure 3: Client Social Profile. The profile reflects how the Client Agent's social network and relationships evolve over time, reinforcing consistency between the social interaction memories and the personal profile.
  • Figure 4: Classification of Positive and Negative Expert Comments. Expert comments are categorized as positive (P_) or negative (N_) based on four dimensions, and grouped into ECAs (Ours), GPT-4o, and D$^4$.
  • Figure 5: Comparison of ECAs (Ours), GPT-4o, and D$^4$ Performance Across Four Dimensions. Box plots show the distribution of scores for Necessity, Sufficiency, Fidelity, and Consistency.
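
As a rough illustration of Step 3 in Figure 1 (retrieving context-relevant memories during a conversation), the sketch below stores a few memory entries on a client agent and returns those that best match a counselor's question. The class names, the word-overlap scoring, and the sample case data are all assumptions made for illustration; they are not taken from the paper or the ECAs codebase, and a real system would more plausibly rank memories with an embedding model.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """One entry in a hypothetical embodied memory space."""
    kind: str  # "belief", "cognitive_process", or "memory" (categories named in Figure 1)
    text: str


@dataclass
class ClientAgent:
    """Hypothetical container; not the authors' data model."""
    base_info: dict                                # Step 1: extracted from a real case
    memories: list = field(default_factory=list)   # Step 2: filled by memory simulation


def tokenize(text):
    """Lowercased word set; a crude stand-in for an embedding model."""
    return {w.strip(".,?!").lower() for w in text.split()} - {""}


def retrieve(agent, utterance, k=2):
    """Step 3: return up to k memories sharing the most words with the utterance."""
    query = tokenize(utterance)
    scored = [(len(query & tokenize(m.text)), m) for m in agent.memories]
    scored = [pair for pair in scored if pair[0] > 0]   # drop unrelated memories
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest overlap first
    return [m for _, m in scored[:k]]


# Invented sample data, for illustration only.
agent = ClientAgent(base_info={"age": 28, "concern": "insomnia"})
agent.memories = [
    Memory("belief", "I believe I will fail at work if I cannot sleep"),
    Memory("memory", "Last month I argued with my sister about money"),
    Memory("cognitive_process", "When I lie awake I replay mistakes from work"),
]

relevant = retrieve(agent, "How has your sleep affected your work?")
for m in relevant:
    print(m.kind, "->", m.text)
```

Here the question about sleep and work surfaces the belief and the cognitive process while skipping the unrelated family memory, which is the kind of filtering the caption describes as supporting realism and consistency.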