Overview of the MEDIQA-OE 2025 Shared Task on Medical Order Extraction from Doctor-Patient Consultations
Jean-Philippe Corbeil, Asma Ben Abacha, Jerome Tremblay, Phillip Swazinna, Akila Jeeson Daniel, Miguel Del-Agua, Francois Beaulieu
TL;DR
The paper introduces MEDIQA-OE 2025, the first shared task focused on extracting structured medical orders from doctor-patient conversations to populate EHRs, a setting that involves long, multi-speaker dialogues and output fields of mixed types. It benchmarks prompting-based approaches across closed- and open-weight LLMs using the ACI-Bench and PriMock57 datasets, with evaluation on four fields (description, order_type, reason, provenance) and a composite leaderboard metric. The top results come from GPT-4 with constrained decoding, while open-weight models show a strong correlation between model size and performance, highlighting the value of scale in few-shot settings. The study identifies remaining gaps in the description and provenance fields, points to dataset size as a limiting factor, and suggests future work on data augmentation, fine-tuning, and hybrid prompting strategies to further reduce the documentation burden and improve EHR accuracy.
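The four output fields named above can be pictured as a small record type. The sketch below is a minimal illustration only: it assumes that provenance is a list of dialogue-turn indices, that order_type is a single label, and that a composite score is the mean of per-field scores. The unigram F1 used for the free-text fields is a simple stand-in, and the names `MedicalOrder`, `token_f1`, `set_f1`, and `composite_score` are hypothetical, not the official task schema or scorer.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class MedicalOrder:
    """One extracted order. Field names follow the task overview; types are assumptions."""
    description: str
    order_type: str  # hypothetical single label, e.g. "medication"
    reason: str
    provenance: list[int] = field(default_factory=list)  # assumed: supporting turn indices


def token_f1(pred: str, gold: str) -> float:
    """Unigram-overlap F1 between two strings (stand-in for the official text metric)."""
    p, g = pred.lower().split(), gold.lower().split()
    if not p or not g:
        return float(p == g)  # both empty counts as a match
    remaining = list(g)
    common = 0
    for tok in p:  # count overlapping tokens with multiplicity
        if tok in remaining:
            remaining.remove(tok)
            common += 1
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)


def set_f1(pred: list[int], gold: list[int]) -> float:
    """F1 over sets of turn indices (assumed provenance representation)."""
    p, g = set(pred), set(gold)
    if not p and not g:
        return 1.0
    tp = len(p & g)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


def composite_score(pred: MedicalOrder, gold: MedicalOrder) -> float:
    """Hypothetical composite: unweighted mean of the four per-field scores."""
    scores = [
        token_f1(pred.description, gold.description),
        float(pred.order_type == gold.order_type),  # exact label match
        token_f1(pred.reason, gold.reason),
        set_f1(pred.provenance, gold.provenance),
    ]
    return mean(scores)


if __name__ == "__main__":
    gold = MedicalOrder("metformin 500 mg twice daily", "medication", "type 2 diabetes", [12, 13])
    pred = MedicalOrder("metformin 500 mg daily", "medication", "diabetes", [12])
    print(round(composite_score(pred, gold), 3))  # prints 0.764
```

This scores a single predicted order against a single gold order; a full scorer would also need to align multiple predicted orders with gold orders per conversation, which is omitted here.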
Abstract
Clinical documentation increasingly uses automatic speech recognition and summarization, yet converting conversations into actionable medical orders for electronic health records (EHRs) remains unexplored. A solution to this problem can significantly reduce the documentation burden on clinicians and directly impact downstream patient care. We introduce the MEDIQA-OE 2025 shared task, the first challenge on extracting medical orders from doctor-patient conversations. Six teams participated in the shared task and experimented with a broad range of approaches using both closed- and open-weight large language models (LLMs). In this paper, we describe the MEDIQA-OE task, dataset, final leaderboard ranking, and participants' solutions.
