Ruling Out to Rule In: Contrastive Hypothesis Retrieval for Medical Question Answering

Byeolhee Kim, Min-Kyung Kim, Young-Hak Kim, Tae-Joon Jeon

Abstract

Retrieval-augmented generation (RAG) grounds large language models in external medical knowledge, yet standard retrievers frequently surface hard negatives that are semantically close to the query but describe clinically distinct conditions. While existing query-expansion methods improve query representation to mitigate ambiguity, they typically focus on enriching target-relevant semantics without an explicit mechanism to selectively suppress specific, clinically plausible hard negatives. This leaves the system prone to retrieving plausible mimics that overshadow the actual diagnosis, particularly when such mimics are dominant within the corpus. We propose Contrastive Hypothesis Retrieval (CHR), a framework inspired by the process of clinical differential diagnosis. CHR generates a target hypothesis $H^+$ for the likely correct answer and a mimic hypothesis $H^-$ for the most plausible incorrect alternative, then scores documents by promoting $H^+$-aligned evidence while penalizing $H^-$-aligned content. Across three medical QA benchmarks and three answer generators, CHR outperforms all five baselines in every configuration, with improvements of up to 10.4 percentage points over the next-best method. On the $n=587$ pooled cases where CHR answers correctly while embedded hypothetical-document query expansion does not, 85.2\% have no shared documents between the top-5 retrieval lists of CHR and of that baseline, consistent with substantive retrieval redirection rather than light re-ranking of the same candidates. By explicitly modeling what to avoid alongside what to find, CHR bridges clinical reasoning with retrieval mechanism design and offers a practical path to reducing hard-negative contamination in medical RAG systems.
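The excerpt above does not give the exact scoring function, but the description ("promoting $H^+$-aligned evidence while penalizing $H^-$-aligned content") together with the contrastive weight $\lambda$ mentioned in Figure 5 suggests a difference of similarities. The sketch below is a minimal illustration under that assumption, using toy embeddings in place of a real encoder; `chr_score`, the variable names, and the embedding values are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def chr_score(doc_emb, h_pos_emb, h_neg_emb, lam=1.0):
    """Assumed contrastive score: reward alignment with the target
    hypothesis H+, penalize alignment with the mimic hypothesis H-,
    weighted by lam (the lambda of Figure 5)."""
    return cosine(doc_emb, h_pos_emb) - lam * cosine(doc_emb, h_neg_emb)

# Toy 3-d embeddings standing in for encoder outputs.
h_pos = [1.0, 0.0, 0.0]  # embedding of the target hypothesis H+
h_neg = [0.0, 1.0, 0.0]  # embedding of the mimic hypothesis H-
docs = {
    "doc_target": [0.9, 0.1, 0.0],  # evidence aligned with H+
    "doc_mimic":  [0.1, 0.9, 0.0],  # hard negative aligned with H-
}
ranked = sorted(docs, key=lambda d: chr_score(docs[d], h_pos, h_neg),
                reverse=True)
```

With $\lambda = 1.0$ the hard negative `doc_mimic` is pushed below `doc_target` even though both are close to the query region, which is the redirection effect the abstract attributes to CHR.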

Paper Structure

This paper contains 23 sections, 2 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Overview of Contrastive Hypothesis Retrieval (CHR). Given a clinical question, CHR generates a contrastive hypothesis pair consisting of a target hypothesis ($H^+$) and a mimic hypothesis ($H^-$), retrieves documents using contrastive scoring that promotes target-aligned content while penalizing mimic-aligned content, and generates the final answer from the retrieved evidence.
  • Figure 2: Prompt template for contrastive hypothesis generation. The system prompt establishes the role of a medical specialist, and the user prompt instructs the model to generate a target hypothesis ($H^+$) for the likely correct diagnosis and a mimic hypothesis ($H^-$) for the most plausible incorrect alternative.
  • Figure 3: Case study showing why the negative (mimic) hypothesis $H^-$ is essential for discriminative retrieval. In the left box, bolded text highlights misleading co-occurrences of antivirals with parkinsonian symptoms (as side effects, not treatments), which led to the incorrect answer. In the right box, bolded text highlights the key sentences identifying amantadine as an antiviral repurposed for Parkinson's disease, directly leading to the correct answer.
  • Figure 4: Additional case study from MedQA (oncology domain). In the left box, bolded text highlights how retrieved documents focus on tamoxifen-related gynecologic bleeding, the dominant complication in the tamoxifen safety literature, which led to the incorrect answer. In the right box, bolded text highlights evidence describing tamoxifen's pro-coagulant effects via hepatic estrogen agonism, directly supporting deep venous thrombosis as the correct answer.
  • Figure 5: Sensitivity of CHR accuracy to the contrastive weight $\lambda$ on MedQA (Gemma-2-9B-It). The dashed red lines indicate the robust plateau region $\lambda \in [0.6, 1.2]$ where CHR consistently outperforms both baselines. Performance peaks at $\lambda = 1.0$.