Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering

Saeel Sandeep Nachane, Ojas Gramopadhye, Prateek Chanda, Ganesh Ramakrishnan, Kshitij Sharad Jadhav, Yatin Nandwani, Dinesh Raghu, Sachindra Joshi

TL;DR

This work addresses open-ended medical question answering by proposing MEDQA-OPEN, an open-ended variant of MedQA, and CLINICR, a chain-of-thought prompt that mirrors incremental clinical reasoning. It compares two prompting strategies (ELIMINATIVE and CLINICR) and introduces a forward-backward workflow in which a reward-model verifier selects among candidate answers to improve reliability. Experiments show that CLINICR generally outperforms ELIMINATIVE in open-ended settings, especially when paired with verifier-based selection, and achieves high expert agreement on MedQA-Open and ClinicianCases. The approach advances clinically plausible, verifiable LLM reasoning and outlines paths toward broader generalization and integration with retrieval and knowledge-grounding techniques.

Abstract

In this paper, we propose a modified version of the MedQA-USMLE dataset, named MEDQA-OPEN, which contains open-ended medical questions without options to mimic clinical scenarios, along with clinician-approved reasoned answers. Additionally, we implement a prompt driven by Chain of Thought (CoT) reasoning, CLINICR, to mirror the prospective process of incremental reasoning, reaching a correct response to medical questions. We empirically demonstrate how CLINICR outperforms the state-of-the-art 5-shot CoT-based prompt (Liévin et al., 2022). We also present an approach that mirrors real-life clinical practice by first exploring multiple differential diagnoses through MCQ-CLINICR and subsequently narrowing down to a final diagnosis using MCQ-ELIMINATIVE. Finally, emphasizing the importance of response verification in medical settings, we utilize a reward model mechanism, replacing the elimination process performed by MCQ-ELIMINATIVE.
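
The two-stage idea described in the abstract (propose several candidate answers first, then let a verifier choose among them) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: `select_answer`, `toy_generate`, and `toy_score` are placeholder names, the candidate strings and scores are made up, and the real system would call an LLM prompted with CLINICR for the forward step and a trained reward model for the scoring step.

```python
from typing import Callable, List, Tuple


def select_answer(
    question: str,
    generate_candidates: Callable[[str], List[str]],
    score: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Forward step: propose candidate answers.
    Backward step: keep the candidate the verifier scores highest."""
    candidates = generate_candidates(question)
    if not candidates:
        raise ValueError("the generator returned no candidate answers")
    scored = [(cand, score(question, cand)) for cand in candidates]
    # max() returns the first highest-scoring pair if scores tie.
    return max(scored, key=lambda pair: pair[1])


# Hypothetical stand-in for an LLM that lists differential diagnoses.
def toy_generate(question: str) -> List[str]:
    return ["candidate A", "candidate B", "candidate C"]


# Hypothetical stand-in for a reward model; the scores are invented.
def toy_score(question: str, candidate: str) -> float:
    made_up_scores = {"candidate A": 0.2, "candidate B": 0.7, "candidate C": 0.1}
    return made_up_scores.get(candidate, 0.0)


best, best_score = select_answer("example question", toy_generate, toy_score)
print(best, best_score)
```

Passing the generator and the scorer as plain callables keeps the selection logic independent of any particular model, so the elimination-style second stage and a reward-model verifier could be swapped without changing `select_answer`.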

Paper Structure (41 sections, 10 figures, 24 tables)

This paper contains 41 sections, 10 figures, 24 tables.

Figures (10)

  • Figure 1: An overview of the two prompting strategies used for predicting the answer for both the MCQ version (MedQA-MCQ) and open-ended version (MedQA-Open) of the MedQA dataset.
  • Figure 2: MCQ-Eliminative uses an eliminative form of reasoning, which iterates over the options, accepting or discarding each according to its correctness. This often does not reflect how real-life clinical investigation proceeds, unlike the incremental reasoning of MCQ-ClinicR.
  • Figure 3: Illustrative example showing the two prompting strategies and their responses across two dataset variants: (a) MedQA-MCQ and (b) MedQA-Open. The context of the answer and the corresponding reasoning are highlighted.
  • Figure 4: Prompt to generate CoT reasoning for the Verifier training data.
  • Figure 5: Prompt to generate the reasoning for the Verifier dataset.
  • ...and 5 more figures