WangLab at MEDIQA-CORR 2024: Optimized LLM-based Programs for Medical Error Detection and Correction

Augustin Toma, Ronald Xie, Steven Palayew, Patrick R. Lawler, Bo Wang

TL;DR

This work tackles the crucial problem of detecting and correcting medical errors in clinical notes through the MEDIQA-CORR 2024 shared task. It introduces two DSPy-based LLM programs: a retrieval-driven MS approach that leverages external QA data and a modular UW pipeline that sequentially detects, localizes, and corrects errors, with prompt optimization guiding each module. The approach achieves top performance across all three subtasks, reporting high Task 1 and Task 2 accuracies and a leading Task 3 aggregate score, while examining the impact of model choice and compilation. Limitations include focus on a subset of errors and generalizability challenges, motivating future work on broader error types, domain-specific fine-tuning, and richer evaluation frameworks to improve robustness in clinical settings.

Abstract

Medical errors in clinical text pose significant risks to patient safety. The MEDIQA-CORR 2024 shared task focuses on detecting and correcting these errors across three subtasks: identifying the presence of an error, extracting the erroneous sentence, and generating a corrected sentence. In this paper, we present our approach, which achieved top performance in all three subtasks. For the MS dataset, which contains subtle errors, we developed a retrieval-based system leveraging external medical question-answering datasets. For the UW dataset, which reflects more realistic clinical notes, we created a pipeline of modules to detect, localize, and correct errors. Both approaches used the DSPy framework to optimize prompts and few-shot examples in programs based on large language models (LLMs). Our results demonstrate the effectiveness of LLM-based programs for medical error correction. However, our approach has limitations in addressing the full diversity of potential errors in medical documentation. We discuss the implications of our work and highlight future research directions to advance the robustness and applicability of medical error detection and correction systems.
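
To make the three subtasks concrete, the following is a minimal sketch of how a detect, localize, correct pipeline could be wired together. It is an illustration only, not the implementation used in this work: the names `CorrectionResult` and `run_pipeline` are hypothetical, and the three toy stage functions are placeholders standing in for the LLM-based modules (they carry no medical logic).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CorrectionResult:
    """One output per note, mirroring the three subtasks."""
    has_error: bool                             # subtask 1: is an error present?
    error_sentence_index: Optional[int] = None  # subtask 2: which sentence?
    corrected_sentence: Optional[str] = None    # subtask 3: corrected text


def run_pipeline(
    sentences: List[str],
    detect: Callable[[List[str]], bool],
    localize: Callable[[List[str]], int],
    correct: Callable[[List[str], int], str],
) -> CorrectionResult:
    """Run the stages in sequence; later stages run only if an error is detected."""
    if not detect(sentences):
        return CorrectionResult(has_error=False)
    index = localize(sentences)
    return CorrectionResult(True, index, correct(sentences, index))


# Toy placeholder stages (assumptions for illustration, not real models).
def toy_detect(sentences: List[str]) -> bool:
    return any("WRONG" in s for s in sentences)


def toy_localize(sentences: List[str]) -> int:
    return next(i for i, s in enumerate(sentences) if "WRONG" in s)


def toy_correct(sentences: List[str], index: int) -> str:
    return sentences[index].replace("WRONG", "RIGHT")


if __name__ == "__main__":
    note = ["Sentence one.", "Sentence two is WRONG.", "Sentence three."]
    print(run_pipeline(note, toy_detect, toy_localize, toy_correct))
```

Passing the stages in as callables keeps the control flow separate from the models, so each stage could be swapped or optimized independently.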

Paper Structure (24 sections, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Predicting the presence of an error through a comparison to the retrieved question
  • Figure 2: Identifying the error sentence
  • Figure 3: Generating the corrected sentence
  • Figure 4: Overview of the UW dataset pipeline, consisting of three main stages: error detection, error localization, and error correction. Each stage is implemented using a DSPy module optimized with the MIPRO teleprompter (Khattab et al., 2023). The pipeline also includes a quality control step based on the ROUGE-L score between the original erroneous text and the corrected version.
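
The ROUGE-L quality control step mentioned in the Figure 4 caption can be sketched as follows. This is a minimal, dependency-free illustration that computes ROUGE-L as an F1 score over the longest common subsequence of whitespace tokens. The function names, the lowercase whitespace tokenization, the 0.7 threshold, and the rule of accepting a correction only when the score meets the threshold are all assumptions made for this example; the caption does not specify them.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 over lowercase whitespace tokens (simplified tokenization)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def passes_quality_control(original: str, corrected: str,
                           min_score: float = 0.7) -> bool:
    """Accept a correction only if it stays close to the original sentence.

    The threshold value and the accept/reject direction are assumptions.
    """
    return rouge_l_f1(original, corrected) >= min_score


if __name__ == "__main__":
    original = "patient was given aspirin daily"
    corrected = "patient was given warfarin daily"
    print(rouge_l_f1(original, corrected))
    print(passes_quality_control(original, corrected))
```

A check of this kind guards against a correction that rewrites the whole sentence rather than changing only the erroneous span, since a large rewrite shares few tokens in order with the original.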