An LLM-based Simulation Framework for Embodied Conversational Agents in Psychological Counseling
Lixiu Wu, Yuanrong Tang, Qisen Pan, Xianyang Zhan, Yucheng Han, Lanxi Xiao, Tianhong Wang, Chen Zhong, Jiangtao Gong
TL;DR
This work tackles privacy-induced data scarcity in psychological counseling by presenting ECAs, an LLM-based embodied agent framework grounded in CBT and counseling theory to synthesize authentic client-counselor dialogues. It builds a rich embodied memory space by expanding real case data into structured memory representations and uses high-frequency counseling questions to drive data generation. The authors formulate six simulation principles, validate the approach on the D$^4$ dataset with licensed counselors and two automated evaluation methods, and release a public ECAs dataset. Results indicate ECAs produce higher authenticity, necessity, and sufficiency than baselines, demonstrating practical value for training and research in mental health counseling.
Abstract
Due to privacy concerns, open dialogue datasets for mental health are primarily generated through human or AI synthesis methods. However, the inherently implicit nature of psychological processes, particularly those of clients, poses challenges to the authenticity and diversity of synthetic data. In this paper, we propose ECAs (short for Embodied Conversational Agents), a framework for embodied agent simulation based on Large Language Models (LLMs) that incorporates multiple psychological theoretical principles. Using simulation, we expand real counseling case data into a nuanced embodied cognitive memory space and generate dialogue data based on high-frequency counseling questions. We validated our framework using the D4 dataset. First, we created a public ECAs dataset through batch simulations based on D4. Licensed counselors evaluated our method, demonstrating that it significantly outperforms baselines in simulation authenticity and necessity. Additionally, two LLM-based automated evaluation methods were employed to confirm the higher quality of the generated dialogues compared to the baselines. The source code and dataset are available at https://github.com/AIR-DISCOVER/ECAs-Dataset.
