Roleplay-doh: Enabling Domain-Experts to Create LLM-simulated Patients via Eliciting and Adhering to Principles
Ryan Louie, Ananjan Nandi, William Fang, Cheng Chang, Emma Brunskill, Diyi Yang
TL;DR
Roleplay-doh introduces a human-LLM collaboration framework that elicits qualitative feedback from domain experts and transforms it into natural-language principles guiding an LLM-prompted AI patient for counselor training. The authors add a principle-adherence pipeline that decomposes complex principles into yes/no criteria and tests whether each one applies, so that responses follow the principles more reliably; it yields a reported 30% improvement in response quality and principle-following. In a study with 25 counseling experts and third-party judges, AI patients created through Roleplay-doh were rated higher on authenticity and training readiness than scenario-only baselines, while the principle-adherence components reduced awkward dialogue and non-adherence. The work highlights a scalable approach to expert-guided simulation in sensitive domains and suggests applicability to other domain-specific roleplay scenarios, while acknowledging the limitations of text-based interaction and ethical considerations.
Abstract
Recent works leverage LLMs to roleplay realistic social scenarios, aiding novices in practicing their social skills. However, simulating sensitive interactions, such as in mental health, is challenging. Privacy concerns restrict data access, and collecting expert feedback, although vital, is laborious. To address this, we develop Roleplay-doh, a novel human-LLM collaboration pipeline that elicits qualitative feedback from a domain expert, which is transformed into a set of principles, or natural language rules, that govern an LLM-prompted roleplay. We apply this pipeline to enable senior mental health supporters to create customized AI patients as simulated practice partners for novice counselors. After uncovering cases where GPT-4 simulations fail to adhere to expert-defined principles, we also introduce a novel principle-adherence prompting pipeline, which shows 30% improvements in response quality and principle following for the downstream task. Via a user study with 25 counseling experts, we demonstrate that the pipeline makes it easy and effective to create AI patients that more faithfully resemble real patients, as judged by creators and third-party counselors. See our project website at https://roleplay-doh.github.io/ for code and data.
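
To make the principle-adherence idea concrete, here is a minimal sketch of its two checks as summarized above: break each principle into yes/no criteria, skip criteria that do not apply to the current turn, and report the ones a candidate response fails. This is an illustration under stated assumptions, not the authors' implementation. The names `Criterion`, `decompose`, `violated_criteria`, and the `Judge` callable type are hypothetical, and the string-splitting decomposition is a stand-in for what would presumably be an LLM prompt.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical judge signature: (criterion_text, dialogue_text) -> True for "yes".
# In a real system this would likely wrap an LLM call; here it is any callable.
Judge = Callable[[str, str], bool]


@dataclass(frozen=True)
class Criterion:
    principle: str  # the expert-written principle this criterion came from
    text: str       # one atomic requirement that can be answered yes/no


def decompose(principle: str) -> List[Criterion]:
    """Toy decomposition: split a compound principle on ';'.

    Assumption: a real pipeline would ask an LLM to rewrite the principle
    into separate yes/no questions instead of splitting on punctuation.
    """
    parts = [p.strip() for p in principle.split(";") if p.strip()]
    return [Criterion(principle, p) for p in parts]


def violated_criteria(
    response: str,
    context: str,
    principles: List[str],
    applies: Judge,
    satisfies: Judge,
) -> List[Criterion]:
    """Return the applicable criteria that the candidate response fails."""
    violations = []
    for principle in principles:
        for criterion in decompose(principle):
            # Applicability test: ignore criteria irrelevant to this turn.
            if not applies(criterion.text, context):
                continue
            # Adherence test: one yes/no judgment per atomic criterion.
            if not satisfies(criterion.text, response):
                violations.append(criterion)
    return violations
```

What to do with the returned violations (for example, regenerating the response) is left to the caller. Keeping the two judges as injected callables lets the control flow be tested with simple stand-in functions before any model is attached.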
