PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents
Jingoo Lee, Kyungho Lim, Young-Chul Jung, Byung-Hoon Kim
TL;DR
PSYCHE introduces a construct-grounded evaluation framework for psychiatric assessment conversational agents (PACAs). It simulates patients (PSYCHE-SP) from a multi-faceted construct (MFC) and scores each PACA's assessment against the ground-truth construct (Construct-SP) using the PSYCHE RUBRIC, which yields the PSYCHE SCORE. The approach emphasizes clinical relevance, ethical safety, cost efficiency, and quantitative measurability, and is validated with 10 board-certified psychiatrists across seven disorders. Results show high conformity of PSYCHE-SP utterances (85–97%, average 93%), a strong correlation between PSYCHE SCORE and expert scores (r = 0.8486), and moderate convergent validity with PIQSCA (r = 0.6367). The work also demonstrates the framework’s robustness to weight settings, supports safer, scalable PACA benchmarking, and offers a pathway to extend construct-grounded evaluation to other psychiatric or medical assessment domains.
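To make the reported agreement figures concrete, the following is a minimal sketch of how a correlation coefficient between automated scores and expert scores could be computed. It assumes the reported r values are Pearson correlation coefficients, which this summary does not state explicitly. The function name `pearson_r`, the variable names, and the numbers are hypothetical placeholders for illustration, not data or code from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only; NOT results from the study.
automated_scores = [0.2, 0.4, 0.5, 0.7, 0.9]   # hypothetical per-conversation scores
expert_scores = [0.3, 0.35, 0.6, 0.65, 0.95]   # hypothetical expert ratings
r = pearson_r(automated_scores, expert_scores)
print(f"r = {r:.4f}")
```

A value of r close to 1 would indicate that the automated score ranks and spaces conversations much as the experts do; the normalization constants cancel, so the unnormalized sums above are sufficient.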
Abstract
Recent advances in large language models (LLMs) have accelerated the development of conversational agents capable of generating human-like responses. Since psychiatric assessments typically involve complex conversational interactions between psychiatrists and patients, there is growing interest in developing LLM-based psychiatric assessment conversational agents (PACAs) that aim to simulate the role of psychiatrists in clinical evaluations. However, standardized methods for benchmarking the clinical appropriateness of PACAs' interactions with patients remain underexplored. Here, we propose PSYCHE, a novel framework designed to enable evaluation of PACAs that is 1) clinically relevant, 2) ethically safe, 3) cost-efficient, and 4) quantitative. This is achieved by simulating psychiatric patients based on a multi-faceted psychiatric construct that defines the simulated patients' profiles, histories, and behaviors, which PACAs are expected to assess. We validate the effectiveness of PSYCHE through a study with 10 board-certified psychiatrists, supported by an in-depth analysis of the simulated patient utterances.
