IryoNLP at MEDIQA-CORR 2024: Tackling the Medical Error Detection & Correction Task On the Shoulders of Medical Agents

Jean-Philippe Corbeil

TL;DR

The paper tackles medical error detection and correction in clinical notes under data scarcity by proposing a multi‑agent, retrieval‑augmented framework (MedReAct'N'MedReFlex) that orchestrates four LLM agents to observe, search, reflect, and format corrections. Built on a RAG backbone with ClinicalCorp and a newly released MedWiki, the approach uses five GPT‑4 evaluators to quality‑gate proposed fixes before JSON formatting. Key contributions include the fixed‑schema multi‑agent design, open‑source clinical corpora (MedWiki, ClinicalCorp) and datasets, and an extensive analysis of retrieval and evaluation thresholds that improve performance over single‑agent baselines. The work shows the practical value of combining structured agentic reasoning with robust retrieval for knowledge‑intensive clinical NLP tasks: it ranked ninth on the MEDIQA‑CORR 2024 final leaderboard and provides reusable resources for future research.

Abstract

In natural language processing applied to the clinical domain, utilizing large language models has emerged as a promising avenue for error detection and correction on clinical notes, a knowledge-intensive task for which annotated data is scarce. This paper presents MedReAct'N'MedReFlex, which leverages a suite of four LLM-based medical agents. The MedReAct agent initiates the process by observing, analyzing, and taking action, generating trajectories to guide the search to target a potential error in the clinical notes. Subsequently, the MedEval agent employs five evaluators to assess the targeted error and the proposed correction. In cases where MedReAct's actions prove insufficient, the MedReFlex agent intervenes, engaging in reflective analysis and proposing alternative strategies. Finally, the MedFinalParser agent formats the final output, preserving the original style while ensuring the integrity of the error correction process. One core component of our method is our RAG pipeline based on our ClinicalCorp corpora. Among other well-known sources containing clinical guidelines and information, we preprocess and release the open-source MedWiki dataset for clinical RAG application. Our results demonstrate the central role of our RAG approach with ClinicalCorp leveraged through the MedReAct'N'MedReFlex framework. It achieved the ninth rank on the MEDIQA-CORR 2024 final leaderboard.
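As a rough illustration of the MedEval quality gate, the sketch below accepts a proposed correction only when the evaluator scores are high enough. The thresholds (average of at least 3.8, minimum of at least 3, scores from 1 to 5) are the ones stated in the Figure 1 caption; the function name `accept_correction` and its parameter names are hypothetical and not taken from the authors' code.

```python
from typing import Sequence


def accept_correction(
    scores: Sequence[float],
    min_average: float = 3.8,  # threshold from the Figure 1 caption
    min_score: float = 3,      # threshold from the Figure 1 caption
) -> bool:
    """Return True when the evaluator scores pass both thresholds.

    Hypothetical helper: the paper describes five evaluators, each giving
    a score between 1 and 5, but does not specify this interface.
    """
    if not scores:
        # No evaluation available: treat the proposal as not accepted.
        return False
    average = sum(scores) / len(scores)
    return average >= min_average and min(scores) >= min_score
```

Requiring both a high average and a floor on the lowest score means that a single strongly dissenting evaluator can block a correction even when the other four agree.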

Paper Structure (26 sections, 8 figures, 2 tables)

Introduction
Related Work
Medical Large Language Models
Agentic Methods
Retrieval-Augmented Generation
Methodology
MEDIQA-CORR Task
ClinicalCorp Corpora
Guidelines
MedCorp
MedWiki
Semantic Search
MedReAct'N'MedReFlex Framework
MedReAct Agent
MedEval Agent
...and 11 more sections

Figures (8)

Figure 1: Schema of MedReAct'N'MedReFlex, together with the context of the clinical error correction task, which is accessible to all medical agents: MedReAct, MedReFlex, MedEval and MedFinalParser. A) The MedReAct agent first provides an observation, a thought and an action. B) A search action triggers a semantic search over ClinicalCorp using MedReAct's query. The MedReAct agent then loops up to $N$ times (green inner loop) or until it provides a final_mistake action. C) After $N$ unsuccessful searches from MedReAct, the MedReFlex agent reflects on the current situation and suggests a solution (pink outer loop). MedReAct may then start again. D) Once MedReAct selects the final_mistake action, the five MedEval agents review the answer and each give a score between 1 and 5 (blue line). E) If the average score is at least 3.8 and the minimum score is at least 3, the MedFinalParser agent formats the final answer into a JSON object. If the answer is unsatisfactory, MedReFlex is triggered instead. If MedReFlex reaches the $M^{th}$ turn without success, MedFinalParser concludes that there is no error.
Figure 2: Performance across retrieval top-k values, with the reranking top-k set at 20, over 3 runs.
Figure 3: ReAct step average latency per retrieval top-k with a reranking top-k set at 20.
Figure 4: Average turns of MedReAct and MedReFlex according to various retrieval top-k with a reranking top-k set at 20.
Figure 5: Reranker top-k with a retrieval top-k set at 300.
...and 3 more figures
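
The control flow described in the Figure 1 caption can be sketched as two nested loops. This is a minimal sketch under stated assumptions, not the authors' implementation: the agents are passed in as plain callables, and the names `Step`, `run_agents`, `react`, `search`, `evaluate` and `reflect` are all hypothetical. It also assumes that each outer turn ends with one MedReFlex reflection and that reaching the $M^{th}$ turn without an accepted answer means "no error".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    action: str   # "search" or "final_mistake", as named in the caption
    payload: str  # search query, or the proposed correction


def run_agents(
    react: Callable[[List[str]], Step],                    # stands in for MedReAct
    search: Callable[[str], List[str]],                    # stands in for semantic search
    evaluate: Callable[[str], bool],                       # stands in for the MedEval gate
    reflect: Callable[[List[str], Optional[str]], str],    # stands in for MedReFlex
    n_searches: int,                                       # N in the caption
    m_turns: int,                                          # M in the caption
) -> Optional[str]:
    """Return an accepted correction, or None when no error is concluded."""
    context: List[str] = []
    for _ in range(m_turns):            # outer loop: MedReFlex turns
        proposal: Optional[str] = None
        for _ in range(n_searches):     # inner loop: MedReAct steps
            step = react(context)
            if step.action == "final_mistake":
                proposal = step.payload
                break
            # A search action adds retrieved passages to the shared context.
            context.extend(search(step.payload))
        if proposal is not None and evaluate(proposal):
            return proposal             # would be handed to MedFinalParser
        # Rejected proposal or N searches without an answer: reflect and retry.
        context.append(reflect(context, proposal))
    return None                         # M turns exhausted: no error reported
```

In this sketch the final JSON formatting step is left out; the returned string is simply what a MedFinalParser-like component would receive.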
