Modeling Clinical Uncertainty in Radiology Reports: from Explicit Uncertainty Markers to Implicit Reasoning Pathways

Paloma Rabaey, Jong Hak Moon, Jung-Oh Lee, Min Gwan Kim, Hangyul Yoon, Thomas Demeester, Edward Choi

TL;DR

This work addresses the challenge of uncertainty in radiology reports by separating explicit hedging-based uncertainty from implicit diagnostic reasoning. It introduces a two-part framework: (i) explicit uncertainty is quantified by constructing an expert-validated, LLM-driven reference ranking of hedging phrases and mapping sentence-level hedging to a continuous probability for each finding; (ii) implicit uncertainty is modeled through a Pathway Expansion Framework that expands reports along expert-defined diagnostic pathways for 14 chest X-ray diagnoses, reconstructing omitted sub-findings. The authors release Lunguage++, an uncertainty-aware extension of the Lunguage dataset, which enables uncertainty-aware image classification, faithful diagnostic reasoning, and investigations into how diagnostic uncertainty impacts clinical outcomes. Methodologically, the paper combines hedging phrase extraction, TrueSkill-based ranking, LLM judgments, expert validation, and a rule-based expansion grounded in DAG-structured diagnostic pathways, producing a richer, more interpretable resource for uncertainty-aware radiology AI and evaluation. The work has practical implications for training and evaluating uncertainty-aware models, and its resources and code are available for community adoption via GitHub and PhysioNet.

Abstract

Radiology reports are invaluable for clinical decision-making and hold great potential for automated analysis when structured into machine-readable formats. These reports often contain uncertainty, which we categorize into two distinct types: (i) Explicit uncertainty reflects doubt about the presence or absence of findings, conveyed through hedging phrases. These vary in meaning depending on the context, making rule-based systems insufficient to quantify the level of uncertainty for specific findings; (ii) Implicit uncertainty arises when radiologists omit parts of their reasoning, recording only key findings or diagnoses. Here, it is often unclear whether omitted findings are truly absent or simply unmentioned for brevity. We address these challenges with a two-part framework. We quantify explicit uncertainty by creating an expert-validated, LLM-based reference ranking of common hedging phrases, and mapping each finding to a probability value based on this reference. In addition, we model implicit uncertainty through an expansion framework that systematically adds characteristic sub-findings derived from expert-defined diagnostic pathways for 14 common diagnoses. Using these methods, we release Lunguage++, an expanded, uncertainty-aware version of the Lunguage benchmark of fine-grained structured radiology reports. This enriched resource enables uncertainty-aware image classification, faithful diagnostic reasoning, and new investigations into the clinical impact of diagnostic uncertainty.
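The explicit-uncertainty idea (rank hedging phrases through pairwise comparisons, then map the ranking to a probability) can be illustrated with a minimal Python sketch. Several parts are assumptions rather than the authors' method: a plain Elo update stands in for TrueSkill, a hand-written toy ordering stands in for the LLM judge, the four phrases are illustrative and not taken from the paper's 42-phrase vocabulary, and min-max scaling is only one possible way to turn ratings into probabilities.

```python
from itertools import combinations

# Illustrative phrases, ordered from "finding probably absent" to
# "finding probably present". This ordering is a toy stand-in for LLM judgments.
TOY_ORDER = ["unlikely", "cannot exclude", "possibly", "likely"]


def toy_judge(phrase_a, phrase_b):
    """Return True if phrase_a suggests presence more strongly than phrase_b."""
    return TOY_ORDER.index(phrase_a) > TOY_ORDER.index(phrase_b)


def elo_update(r_winner, r_loser, k=32.0):
    """One Elo update (a simpler stand-in for TrueSkill, which also tracks sigma)."""
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta


def rank_phrases(phrases, judge, rounds=20):
    """Rate phrases by repeatedly comparing every pair with the given judge."""
    ratings = {p: 1000.0 for p in phrases}
    for _ in range(rounds):
        for a, b in combinations(phrases, 2):
            winner, loser = (a, b) if judge(a, b) else (b, a)
            ratings[winner], ratings[loser] = elo_update(
                ratings[winner], ratings[loser]
            )
    return ratings


def to_probability(ratings):
    """Min-max scale ratings to [0, 1] (an assumed mapping, for illustration)."""
    lo, hi = min(ratings.values()), max(ratings.values())
    if hi == lo:  # all phrases tied: no information, fall back to 0.5
        return {p: 0.5 for p in ratings}
    return {p: (r - lo) / (hi - lo) for p, r in ratings.items()}


probabilities = to_probability(rank_phrases(TOY_ORDER, toy_judge))
for phrase in sorted(probabilities, key=probabilities.get):
    print(f"{phrase:15s} {probabilities[phrase]:.2f}")
```

In this toy setup the judge compares phrases in isolation. The framework described above instead compares example sentences, so the probability assigned to a finding reflects its context and not only the phrase.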

Paper Structure

This paper contains 41 sections, 1 equation, 13 figures, 8 tables, 4 algorithms.

Figures (13)

  • Figure 1: Two types of uncertainty in radiology reports that we address during structuring, expanding the Lunguage dataset of structured CXR reports to form Lunguage++. Explicit uncertainty is conveyed by hedging phrases that indicate tentative findings, whose (un)certainty we quantify with probabilities. Implicit uncertainty stems from findings that are not explicitly mentioned; we mitigate this by applying expert-defined diagnostic pathways to expand stated diagnoses with their characteristic sub-findings.
  • Figure 2: Strategy for assigning probabilities to finding-sentence pairs with tentative certainty in the Lunguage dataset: We first build a vocabulary of common hedging phrases and the sentences in which these are used (Section \ref{sec:explicit_step1}). Next, we leverage LLMs to construct a reference ranking of these phrases by performing pairwise comparisons of example sentences (Section \ref{sec:explicit_step2}). Each finding-sentence pair is then compared against this reference (Section \ref{sec:explicit_step3}) and is finally mapped to a probability (Section \ref{sec:explicit_step4}). This approach ensures that the probability assigned to each finding reflects not only the hedging phrase itself but also the broader context in which it appears.
  • Figure 3: Reference ranking of the 42 common hedging phrases in our vocabulary. The mean skill level $\mu$ for each phrase is shown on the right, with the confidence $\sigma$ represented by the error bars. Phrases at the top of the ranking correspond to a high likelihood that the finding is present, while phrases at the bottom correspond to a high likelihood that the finding is absent.
  • Figure 4: Overview of the Pathway Expansion Framework. The framework expands structured findings from Lunguage along diagnostic pathways (Section \ref{sec:pathways}) to reconstruct omitted diagnostic evidence. It comprises four stages (finding deduplication, pathway matching, pathway expansion, and conflict resolution; Section \ref{sec:expansion_system}) that jointly ensure semantic coherence and clinical validity. The resulting representation connects high-level diagnoses with their underlying evidence, forming Lunguage++, which is further analyzed in Section \ref{sec:analyis_of_expanded_dset}.
  • Figure A1: Our vocabulary of 42 common hedging phrases, including how many times each phrase was extracted for tentative finding-sentence pairs in Lunguage.
  • ...and 8 more figures
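
The four stages named in the Figure 4 caption (finding deduplication, pathway matching, pathway expansion, conflict resolution) can be sketched in Python as a small rule-based expansion over a DAG. Everything specific here is hypothetical: the `PATHWAYS` dictionary is a toy graph and not one of the expert-defined pathways, the `(name, status)` input format is invented for illustration, and the conflict rule (an explicitly stated finding always overrides an inferred one) is an assumption rather than the paper's documented behavior.

```python
# Toy pathway DAG (hypothetical): each finding points to the sub-findings
# that would characteristically accompany it.
PATHWAYS = {
    "pneumonia": ["consolidation"],
    "consolidation": ["airspace opacity"],
    "pleural effusion": ["blunted costophrenic angle"],
}


def expand_report(findings):
    """Expand stated findings along PATHWAYS.

    findings: list of (name, status) tuples, status is "present" or "absent".
    Returns a dict mapping finding name -> (status, source), where source is
    "stated" (from the report) or "inferred" (added by expansion).
    """
    # Stage 1, deduplication: normalize case and keep the first mention.
    stated = {}
    for name, status in findings:
        stated.setdefault(name.lower(), status)

    expanded = {name: (status, "stated") for name, status in stated.items()}

    # Stage 2, pathway matching: present findings that have a pathway entry.
    stack = [n for n, s in stated.items() if s == "present" and n in PATHWAYS]

    # Stage 3, pathway expansion: walk the DAG from each matched finding.
    while stack:
        node = stack.pop()
        for child in PATHWAYS.get(node, []):
            # Stage 4, conflict resolution (assumed rule): never overwrite a
            # finding that is already recorded, stated or inferred.
            if child in expanded:
                continue
            expanded[child] = ("present", "inferred")
            stack.append(child)
    return expanded


report = [
    ("Pneumonia", "present"),
    ("pneumonia", "present"),          # duplicate mention
    ("airspace opacity", "absent"),    # explicit statement wins over inference
]
for name, (status, source) in expand_report(report).items():
    print(f"{name}: {status} ({source})")
```

Keeping a `source` tag on every entry lets downstream users separate reported evidence from reconstructed evidence, which matters when inferred sub-findings are later used as training labels.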