SMILE: Single-turn to Multi-turn Inclusive Language Expansion via ChatGPT for Mental Health Support
Huachuan Qiu, Hongliang He, Shuai Zhang, Anqi Li, Zhenzhong Lan
TL;DR
SMILE tackles the lack of large-scale, diverse multi-turn mental health dialogues by prompting ChatGPT to rewrite public single-turn question-answer pairs as multi-turn conversations. The approach yields SMILECHAT, a Chinese dataset of 55k dialogues, and a downstream chatbot, MeChat, built by parameter-efficient fine-tuning of ChatGLM2-6B. Through language transformation and diversity analyses, the authors show that the generated dialogues are lifelike and diverse, and they validate quality with automatic metrics and human evaluation on PsyTest, a real-life anonymized dataset. The data, code, and model are publicly released, and the method may be applicable to domains beyond mental health.
Abstract
Developing specialized dialogue systems for mental health support requires multi-turn conversation data, which has recently garnered increasing attention. However, gathering and releasing large-scale, real-life multi-turn conversations that could facilitate advancements in mental health support is difficult because of data privacy concerns and the time and cost of crowdsourcing. To address these challenges, we introduce SMILE, a single-turn to multi-turn inclusive language expansion technique that prompts ChatGPT to rewrite public single-turn dialogues into multi-turn ones. Our work begins by analyzing language transformation and validating the feasibility of our proposed method. We then study dialogue diversity in terms of lexical features, semantic features, and dialogue topics, demonstrating the effectiveness of our method. Further, we employ our method to generate a large-scale, lifelike, and diverse dialogue dataset named SMILECHAT, consisting of 55k dialogues. Finally, we use the collected corpus to develop a mental health chatbot, MeChat. To better assess the quality of SMILECHAT, we collect a small-scale, anonymized dataset of real-life counseling conversations. Both automatic and human evaluations demonstrate significant improvements in our dialogue system and confirm that SMILECHAT is high-quality. Code, data, and model are publicly available at https://github.com/qiuhuachuan/smile.
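
To make the single-turn to multi-turn rewriting step concrete, here is a minimal Python sketch of how such a pipeline might be wired up. It is an illustration under stated assumptions, not the authors' implementation: the prompt wording, the `Seeker:`/`Supporter:` speaker labels, and the helper names `build_prompt`, `parse_dialogue`, and `is_well_formed` are all hypothetical, and the call to ChatGPT itself is left out. The sketch only shows building a prompt from one question-answer pair and checking that a model reply parses into alternating turns.

```python
import re
from typing import List, Tuple

# Hypothetical prompt template. The wording used in the paper is not
# reproduced here; this one is an assumption for illustration only.
PROMPT_TEMPLATE = (
    "Rewrite the following single-turn mental health Q&A as a multi-turn "
    "conversation between a help-seeker and a supporter. "
    "Prefix each utterance with 'Seeker:' or 'Supporter:'.\n\n"
    "Question: {question}\nAnswer: {answer}\n"
)

# Matches one utterance line such as "Seeker: I can't sleep."
TURN_RE = re.compile(r"^(Seeker|Supporter):\s*(.+)$")


def build_prompt(question: str, answer: str) -> str:
    """Fill the (hypothetical) template with one single-turn QA pair."""
    return PROMPT_TEMPLATE.format(question=question.strip(), answer=answer.strip())


def parse_dialogue(text: str) -> List[Tuple[str, str]]:
    """Split a model reply into (speaker, utterance) pairs, skipping other lines."""
    turns = []
    for line in text.splitlines():
        match = TURN_RE.match(line.strip())
        if match:
            turns.append((match.group(1), match.group(2).strip()))
    return turns


def is_well_formed(turns: List[Tuple[str, str]], min_turns: int = 4) -> bool:
    """Require enough turns, a seeker opening, and strictly alternating speakers."""
    if len(turns) < min_turns or turns[0][0] != "Seeker":
        return False
    return all(prev[0] != cur[0] for prev, cur in zip(turns, turns[1:]))
```

In use, the string returned by `build_prompt` would be sent to the language model, and replies that fail `is_well_formed` could be discarded or regenerated. The `min_turns` threshold of 4 is an arbitrary choice for this sketch rather than a value taken from the paper.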
