Evaluating Prompting Strategies with MedGemma for Medical Order Extraction
Abhinand Balachandran, Bavana Durgapraveen, Gowsikkan Sikkan Sudhagar, Vidhya Varshany J S, Sriram Rajkumar
TL;DR
This work tackles the problem of extracting structured medical orders from doctor-patient conversations to reduce documentation burden and improve patient safety. It systematically compares three prompting paradigms (1-Shot, ReAct, and Agentic Workflow) using MedGemma models (4B and 27B) on the MEDIQA-OE 2025 SIMORD-derived data. The findings show that, for manually annotated transcripts, a simple 1-Shot prompt consistently outperforms more complex reasoning frameworks, with larger models offering additional gains. The study highlights the importance of data characteristics in prompting strategy selection and suggests validating these approaches on noisier, real-world transcripts in future work. The results provide actionable guidance for clinical information extraction under different data conditions and prompting settings.
Abstract
The accurate extraction of medical orders from doctor-patient conversations is a critical task for reducing clinical documentation burdens and ensuring patient safety. This paper details our team submission to the MEDIQA-OE-2025 Shared Task. We investigate the performance of MedGemma, a new domain-specific open-source language model, for structured order extraction. We systematically evaluate three distinct prompting paradigms: a straightforward one-shot approach, a reasoning-focused ReAct framework, and a multi-step agentic workflow. Our experiments reveal that although more complex frameworks such as ReAct and agentic workflows are powerful, the simpler one-shot prompting method achieves the highest performance on the official validation set. We posit that on manually annotated transcripts, complex reasoning chains can lead to "overthinking" and introduce noise, making a direct approach more robust and efficient. Our work provides insights into selecting appropriate prompting strategies for clinical information extraction under varied data conditions.
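To make the one-shot paradigm concrete, the following is a minimal sketch of how such a prompt might be assembled and how the model's reply might be parsed. It is illustrative only and is not the prompt used in our submission: the example transcript, the field names (`order_type`, `description`, `reason`), and the helper names `build_one_shot_prompt` and `parse_orders` are all assumptions introduced here. The call to the model itself is deliberately left out.

```python
import json

# Hypothetical worked example shown to the model. The field names are
# illustrative assumptions, not the shared task's official schema.
EXAMPLE_TRANSCRIPT = "DOCTOR: Let's get a chest x-ray to check on that cough."
EXAMPLE_ORDERS = [
    {"order_type": "imaging", "description": "chest x-ray", "reason": "cough"}
]


def build_one_shot_prompt(transcript: str) -> str:
    """Assemble an instruction, one worked example, and the target transcript."""
    return (
        "Extract every medical order from the conversation as a JSON list.\n\n"
        f"Conversation:\n{EXAMPLE_TRANSCRIPT}\n"
        f"Orders:\n{json.dumps(EXAMPLE_ORDERS)}\n\n"
        f"Conversation:\n{transcript}\n"
        "Orders:\n"
    )


def parse_orders(model_output: str) -> list:
    """Parse the model's reply; return an empty list if it is not a JSON list."""
    try:
        parsed = json.loads(model_output)
    except json.JSONDecodeError:
        return []
    return parsed if isinstance(parsed, list) else []
```

In this sketch the prompt ends with an open `Orders:` slot so that the model completes it directly, without an intermediate reasoning trace. That single direct completion is what distinguishes the one-shot setting from the ReAct and agentic variants.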
