Conversational Medical AI: Ready for Practice
Antoine Lizée, Pierre-Auguste Beaucoté, James Whitbeck, Marion Doumeingts, Anaël Beaugnon, Isabelle Feldhaus
TL;DR
This study evaluates Mo, a physician-supervised LLM-based conversational agent integrated into Alan’s real-world medical chat service, as a response to physician shortages. In a randomized controlled experiment (926 cases), AI-assisted conversations received higher ratings for information clarity and overall satisfaction than standard care, with equivalent levels of trust and perceived empathy. Under physician oversight, general practitioners rated 95% of conversations as "good" or "excellent". Mo’s development relies on a multi-agent framework and a rigorous offline evaluation pipeline, comprising a French medical knowledge benchmark, real-world anonymized chats, and simulated patient dialogues, to optimize performance and end-to-end dialogue capabilities. The findings suggest that AI augmentation can enhance patient experience while preserving safety. They offer practical guidance for implementing AI in healthcare communication and inform future research on long-term outcomes, system integration, and privacy protections.
Abstract
The shortage of doctors is creating a critical squeeze in access to medical expertise. While conversational Artificial Intelligence (AI) holds promise in addressing this problem, its safe deployment in patient-facing roles remains largely unexplored in real-world medical settings. We present the first large-scale evaluation of a physician-supervised LLM-based conversational agent in a real-world medical setting. Our agent, Mo, was integrated into an existing medical advice chat service. Over a three-week period, we conducted a randomized controlled experiment with 926 cases to evaluate patient experience and satisfaction. Among these, Mo handled 298 complete patient interactions, for which we report physician-assessed measures of safety and medical accuracy. Patients reported higher clarity of information (3.73 vs 3.62 out of 4, p < 0.05) and overall satisfaction (4.58 vs 4.42 out of 5, p < 0.05) with AI-assisted conversations compared to standard care, while showing equivalent levels of trust and perceived empathy. The high opt-in rate (81% among respondents) exceeded previous benchmarks for AI acceptance in healthcare. Physician oversight ensured safety, with 95% of conversations rated as "good" or "excellent" by general practitioners experienced in operating a medical advice chat service. Our findings demonstrate that carefully implemented AI medical assistants can enhance patient experience while maintaining safety standards through physician supervision. This work provides empirical evidence for the feasibility of AI deployment in healthcare communication and insights into the requirements for successful integration into existing healthcare services.
