System X: A Mobile Voice-Based AI System for EMR Generation and Clinical Decision Support in Low-Resource Maternal Healthcare
Maryam Mustafa, Umme Ammara, Amna Shahnawaz, Moaiz Abrar, Bakhtawar Ahtisham, Fozia Umber Qurashi, Mostafa Shahin, Beena Ahmed
TL;DR
System X is a mobile, voice-based AI assistant that enables frontline maternal health workers in Pakistan to generate structured EMRs and real-time red-flag alerts from Urdu speech. It combines a fine-tuned Whisper ASR model with a prompt-engineered GPT-4 LLM, supported by a medical dictionary and Retrieval-Augmented Generation, to deliver consistent EMRs and actionable diagnostics. In a seven-month live deployment, the system produced over 500 EMRs along with validated red flags, and it achieved high EMR accuracy and usability despite infrastructural constraints, demonstrating practical feasibility. The work offers generalizable design principles for deploying voice-driven AI in linguistically and resource-constrained health systems and highlights the importance of structured outputs and clinician-in-the-loop validation.
Abstract
We present the design, implementation, and in-situ deployment of a smartphone-based, voice-enabled AI system for generating electronic medical records (EMRs) and clinical risk alerts in maternal healthcare settings. Targeted at low-resource environments such as Pakistan, the system integrates a fine-tuned, multilingual automatic speech recognition (ASR) model and a prompt-engineered large language model (LLM) so that healthcare workers can interact naturally in Urdu, their native language, regardless of literacy or technical background. Through speech-based input and localized understanding, the system generates structured EMRs and flags critical maternal health risks. Over a seven-month deployment in a not-for-profit hospital, the system supported the creation of over 500 EMRs and flagged over 300 potential clinical risks. We evaluate the system's performance in terms of speech recognition accuracy, EMR field-level correctness, and the clinical relevance of AI-generated red flags. Our results demonstrate that speech-based AI interfaces can be effectively adapted to real-world healthcare settings, especially low-resource ones, when combined with structured input design, contextual medical dictionaries, and clinician-in-the-loop feedback. We discuss generalizable design principles for deploying voice-based mobile healthcare AI support systems in linguistically and infrastructurally constrained settings.
