Causal Tree Extraction from Medical Case Reports: A Novel Task for Experts-like Text Comprehension

Sakiko Yahata, Zhen Wan, Fei Cheng, Sadao Kurohashi, Hisahiko Sato, Ryozo Nagai

TL;DR

This work introduces causal tree extraction (CTE), a task that models the diagnostic process in a medical case report as a hierarchical tree rooted at the primary disease. It builds J-Casemap, a Japanese internal medicine dataset annotated by clinicians, and proposes a generation-based CTE method that combines continual pretraining on Japanese medical data with supervised fine-tuning. In human evaluation the method scores 82.7, outperforming the relation extraction (RE) baseline (62.5) by 20.2 points. To align automatic metrics with clinician preferences, the authors propose a weighted triplet evaluation that correlates better with manual judgments. The results also show that J-Casemap improves performance on medical question answering and offers a resource for expert-like understanding of clinical cases. The authors acknowledge hallucination risks and data-sharing limitations, which point to directions for future work.

Abstract

Extracting causal relationships from a medical case report is essential for comprehending the case, particularly its diagnostic process. Since the diagnostic process is regarded as a bottom-up inference, causal relationships in a case naturally form a multi-layered tree structure. Existing tasks, such as medical relation extraction, are insufficient for capturing the causal relationships of an entire case, as they treat all relations equally without considering the hierarchical structure inherent in the diagnostic process. Thus, we propose a novel task, Causal Tree Extraction (CTE), which takes a case report as input and generates a causal tree with the primary disease as the root, providing an intuitive understanding of the case's diagnostic process. We then construct a Japanese case report CTE dataset, J-Casemap, propose a generation-based CTE method that outperforms the baseline by 20.2 points in the human evaluation, and introduce evaluation metrics that reflect clinician preferences. Further experiments also show that J-Casemap improves performance on other medical tasks, such as question answering.
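
To make the tree structure concrete, here is a minimal Python sketch of a causal tree rooted at the primary disease, flattened into triplets and scored with a triplet-based F1 (the figure captions for this paper mention both the decomposition into triplets and triplet-based F1). Everything specific here is an assumption for illustration: the `Node` class, the `caused_by` relation label, the `(child, relation, parent)` triplet order, exact-match comparison, and the toy clinical example are hypothetical and not taken from J-Casemap. The weighted variant that the paper proposes is not reproduced, since its weights are not given in this summary.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Triplet = Tuple[str, str, str]


@dataclass
class Node:
    """One disease or finding in a causal tree (hypothetical structure)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def tree_to_triplets(root: Node, relation: str = "caused_by") -> Set[Triplet]:
    """Flatten a tree into (child, relation, parent) triplets, one per edge."""
    triplets: Set[Triplet] = set()
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent.children:
            triplets.add((child.label, relation, parent.label))
            stack.append(child)
    return triplets


def triplet_f1(predicted: Set[Triplet], gold: Set[Triplet]) -> float:
    """Unweighted exact-match F1 between two triplet sets."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Invented toy example: the gold tree has an intermediate layer
# that the predicted tree skips.
gold_tree = Node("heart failure", [
    Node("pulmonary congestion", [Node("dyspnea")]),
    Node("leg edema"),
])
predicted_tree = Node("heart failure", [
    Node("dyspnea"),
    Node("leg edema"),
])

gold_triplets = tree_to_triplets(gold_tree)
predicted_triplets = tree_to_triplets(predicted_tree)
score = triplet_f1(predicted_triplets, gold_triplets)
```

In this toy case only the `leg edema` edge matches, so precision is 1/2, recall is 1/3, and the F1 is 0.4. Attaching `dyspnea` directly to the root is counted as a miss, which shows how an edge-level metric penalizes flattening the hierarchy.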

Paper Structure

This paper contains 28 sections, 3 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: A diagnostic bottom-up procedure.
  • Figure 2: A tree summary is decomposed into triplets.
  • Figure 3: The SFT data example.
  • Figure 4: Manual evaluation of the generation and RE models on the same 300 cases. The generation model achieved an average score of $82.7$, and the RE model achieved an average score of $62.5$.
  • Figure 5: Triplet-based F1 scores of fine-tuned models in settings with varying amounts of SFT data (100%, 75%, 50%, and 25%).
  • ...and 4 more figures