Iterative Tree Analysis for Medical Critics

Zenan Huang, Mingwei Li, Zheng Zhou, Youxin Jiang

TL;DR

This work tackles factuality verification for long-form medical text generated by large language models, where hallucinations arise from implicit, interconnected claims. It introduces Iterative Tree Analysis (ITA), a tree-based framework that (i) extracts self-contained atomic claims, (ii) expands verification sub-trees via adaptive retrieval, and (iii) consolidates evidence bottom-up to judge claim validity. A new Med-Critics benchmark enables fine-grained evaluation across six medical categories. ITA outperforms previous methods at factual verification (by 10%, per the abstract) and aligns closely with human judgments. The approach aims at safer, more reliable medical QA through mechanism-level reasoning and transparent evidence tracing. The authors plan a public data release, and future work targets claim extraction and retrieval integration.

Abstract

Large Language Models (LLMs) have been widely adopted across various domains, yet their application in the medical field poses unique challenges, particularly the generation of hallucinations. Hallucinations in open-ended long medical text manifest as misleading critical claims, which are difficult to verify for two reasons. First, critical claims are often deeply entangled within the text and cannot be extracted based solely on surface-level presentation. Second, surface-level token-based retrieval often fails to return precise or specific evidence, leaving the claims unverifiable without deeper mechanism-based analysis. In this paper, we introduce a novel method termed Iterative Tree Analysis (ITA) for medical critics. ITA is designed to extract implicit claims from long medical texts and verify each claim through an iterative and adaptive tree-like reasoning process. This process combines top-down task decomposition with bottom-up evidence consolidation, enabling precise verification of complex medical claims through detailed mechanism-level reasoning. Our extensive experiments demonstrate that ITA outperforms previous methods by 10% in detecting factual inaccuracies in complex medical text verification tasks. Additionally, we will release a comprehensive test set to the public to foster further research in this domain.

Paper Structure (25 sections, 6 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Overview of the ITA framework. During the tree-spanning stage, individual claims are extracted from the medical text, ensuring each claim is self-contained by incorporating information from other parts of the text. Verification sub-tasks are distributed recursively down to the leaf nodes. In the consolidating stage, deeper knowledge insights for each individual claim are used to determine whether to accept or reject its parent claim. The framework outputs the verification status of each individual claim along with its supporting references.
  • Figure 2: An example of verification
  • Figure 3: Human evaluation
  • Figure 4: Model performance on long-form medical text
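
The two stages named in the Figure 1 caption (tree spanning, then consolidation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `ClaimNode`, `span_tree`, `consolidate`, `decompose`, and `verify_leaf` are hypothetical, the depth limit is an assumption, and the rule "a parent is accepted only if every sub-claim is accepted" is a simplifying assumption standing in for whatever consolidation logic the paper uses. In a real system, `decompose` and `verify_leaf` would presumably be backed by an LLM and a retriever; here they are toy lookups over placeholder claims.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ClaimNode:
    """One node of a verification tree: a claim plus its sub-claims."""
    claim: str
    children: List["ClaimNode"] = field(default_factory=list)
    verdict: Optional[bool] = None  # filled in during consolidation
    references: List[str] = field(default_factory=list)


def span_tree(claim: str,
              decompose: Callable[[str], List[str]],
              max_depth: int = 2,
              depth: int = 0) -> ClaimNode:
    """Top-down stage: recursively split a claim into sub-claims.

    `decompose` is a hypothetical callable returning the sub-claims
    needed to verify `claim` (an empty list means "verify directly").
    """
    node = ClaimNode(claim)
    if depth < max_depth:
        for sub in decompose(claim):
            node.children.append(span_tree(sub, decompose, max_depth, depth + 1))
    return node


def consolidate(node: ClaimNode,
                verify_leaf: Callable[[str], Tuple[bool, List[str]]]) -> bool:
    """Bottom-up stage: judge leaves, then propagate verdicts upward.

    Assumption: a parent is accepted only if all of its children are.
    """
    if not node.children:
        node.verdict, node.references = verify_leaf(node.claim)
        return node.verdict
    verdicts = [consolidate(child, verify_leaf) for child in node.children]
    node.verdict = all(verdicts)
    node.references = [r for child in node.children for r in child.references]
    return node.verdict


# Toy stand-ins with placeholder claims; no real medical facts are encoded.
TOY_DECOMPOSITION = {
    "Drug A lowers blood pressure": [
        "Drug A blocks receptor R",
        "Blocking receptor R lowers blood pressure",
    ],
}
TOY_EVIDENCE = {
    "Drug A blocks receptor R": (True, ["ref-1"]),
    "Blocking receptor R lowers blood pressure": (True, ["ref-2"]),
}


def toy_decompose(claim: str) -> List[str]:
    return TOY_DECOMPOSITION.get(claim, [])


def toy_verify(claim: str) -> Tuple[bool, List[str]]:
    # Claims without evidence are rejected in this toy setup.
    return TOY_EVIDENCE.get(claim, (False, []))


root = span_tree("Drug A lowers blood pressure", toy_decompose)
consolidate(root, toy_verify)
```

After the call, `root.verdict` holds the status of the top-level claim and `root.references` collects the references gathered at the leaves, mirroring the caption's "verification status of each individual claim along with its supporting references".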