System X: A Mobile Voice-Based AI System for EMR Generation and Clinical Decision Support in Low-Resource Maternal Healthcare

Maryam Mustafa, Umme Ammara, Amna Shahnawaz, Moaiz Abrar, Bakhtawar Ahtisham, Fozia Umber Qurashi, Mostafa Shahin, Beena Ahmed

TL;DR

System X presents a mobile, voice-based AI assistant enabling frontline maternal health workers in Pakistan to generate structured EMRs and real-time red-flag alerts from Urdu speech. It combines a fine-tuned Whisper ASR with a prompt-engineered GPT-4 LLM, supported by a medical dictionary and Retrieval-Augmented Generation to deliver consistent EMRs and actionable diagnostics. In a seven-month live deployment, the system produced over 500 EMRs, validated red flags, and achieved high EMR accuracy and usability despite infrastructural constraints, demonstrating practical feasibility. The work offers generalizable design principles for deploying voice-driven AI in linguistically and resource-constrained health systems and highlights the importance of structured outputs and clinician-in-the-loop validation.

Abstract

We present the design, implementation, and in-situ deployment of a smartphone-based, voice-enabled AI system for generating electronic medical records (EMRs) and clinical risk alerts in maternal healthcare settings. Targeted at low-resource environments such as Pakistan, the system integrates a fine-tuned, multilingual automatic speech recognition (ASR) model and a prompt-engineered large language model (LLM) to enable healthcare workers to engage naturally in Urdu, their native language, regardless of literacy or technical background. Through speech-based input and localized understanding, the system generates structured EMRs and flags critical maternal health risks. Over a seven-month deployment in a not-for-profit hospital, the system supported the creation of over 500 EMRs and flagged over 300 potential clinical risks. We evaluate the system's performance across speech recognition accuracy, EMR field-level correctness, and clinical relevance of AI-generated red flags. Our results demonstrate that speech-based AI interfaces can be effectively adapted to real-world healthcare settings, especially low-resource ones, when combined with structured input design, contextual medical dictionaries, and clinician-in-the-loop feedback. We discuss generalizable design principles for deploying voice-based mobile healthcare AI support systems in linguistically and infrastructurally constrained settings.

Paper Structure

This paper contains 59 sections, 9 figures, 16 tables.

Figures (9)

  • Figure 1: An overview of our system
  • Figure 2: Expanded and scrollable "Medical History" screen shown in full for illustration. In the actual application, only a portion of the screen is visible at a time. Clinicians tap the microphone icon to record patient history in their preferred language.
  • Figure 3: Filled and scrollable "Medical History" screen displaying recorded allergies and past medical conditions. The combination of text fields and audio input reflects design decisions shaped by formative research and iterative testing.
  • Figure 4: Scrollable patient summary screen displaying vital signs and collapsible history categories. The MR No. at the top serves as the patient’s unique medical identifier.
  • Figure 5: Sample Clarification Questions Screen of a Patient's Proposed Plan
  • ...and 4 more figures