CliCARE: Grounding Large Language Models in Clinical Guidelines for Decision Support over Longitudinal Cancer Electronic Health Records

Dongchen Li, Jitao Liang, Wei Li, Xiaoyu Wang, Longbing Cao, Kun Yu

TL;DR

CliCARE tackles long-range temporal reasoning, hallucination, and evaluation challenges in using LLMs for longitudinal cancer EHRs by transforming unstructured records into Temporal Knowledge Graphs and grounding them to a guideline knowledge graph. The framework combines EHR-to-TKG transformation with trajectory-guideline alignment, employing semantic matching, LLM reranking, and bootstrapped expansion to fuse patient trajectories with normative guidelines. An Expert-Validated LLM-as-a-Judge protocol provides reliable, scalable evaluation that correlates strongly with oncologists (Spearman's ρ ≈ 0.7). Empirical results on private CancerEHR and public MIMIC-Cancer datasets show CliCARE substantially outperforms standard RAG and KG-enhanced baselines, with structured knowledge and long-context processing being key to effective decision support in oncology.
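To make the reported agreement figure concrete, the following is a minimal sketch of how a Spearman rank correlation between automated judge scores and oncologist scores could be computed. It is not the authors' evaluation code: the function names (`average_ranks`, `spearman_rho`) and the idea of passing two parallel score lists are assumptions made for illustration only.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed inputs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / math.sqrt(vx * vy)
```

Under this sketch, one list would hold the LLM judge's scores and the other the oncologists' scores for the same outputs; a value near 1 means the two rank the outputs in almost the same order.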

Abstract

Large Language Models (LLMs) hold significant promise for improving clinical decision support and reducing physician burnout by synthesizing complex, longitudinal cancer Electronic Health Records (EHRs). However, their implementation in this critical field faces three primary challenges: the inability to effectively process the extensive length and fragmented nature of patient records for accurate temporal analysis; a heightened risk of clinical hallucination, as conventional grounding techniques such as Retrieval-Augmented Generation (RAG) do not adequately incorporate process-oriented clinical guidelines; and unreliable evaluation metrics that hinder the validation of AI systems in oncology. To address these issues, we propose CliCARE, a framework for Grounding Large Language Models in Clinical Guidelines for Decision Support over Longitudinal Cancer Electronic Health Records. The framework operates by transforming unstructured, longitudinal EHRs into patient-specific Temporal Knowledge Graphs (TKGs) to capture long-range dependencies, and then grounding the decision support process by aligning these real-world patient trajectories with a normative guideline knowledge graph. This approach provides oncologists with evidence-grounded decision support by generating a high-fidelity clinical summary and an actionable recommendation. We validated our framework using large-scale, longitudinal data from a private Chinese cancer dataset and the public English MIMIC-IV dataset. In these settings, CliCARE significantly outperforms baselines, including leading long-context LLMs and Knowledge Graph-enhanced RAG methods. The clinical validity of our results is supported by a robust evaluation protocol, which demonstrates a high correlation with assessments made by oncologists.
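The two-step idea in the abstract (build a patient-specific temporal graph, then align the trajectory with guideline steps) can be illustrated with a small sketch. Everything here is hypothetical: the `TemporalFact` and `TemporalKG` classes, the example events, and the guideline strings are invented for illustration, and plain token overlap stands in for the semantic matching and LLM reranking the paper describes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TemporalFact:
    """One timestamped edge: (subject, relation, object) observed on a date."""
    subject: str
    relation: str
    obj: str
    when: date


class TemporalKG:
    """A toy temporal knowledge graph stored as a list of timestamped facts."""

    def __init__(self):
        self._facts = []

    def add(self, subject, relation, obj, when):
        self._facts.append(TemporalFact(subject, relation, obj, when))

    def trajectory(self, subject):
        """Return the subject's facts in chronological order."""
        return sorted(
            (f for f in self._facts if f.subject == subject),
            key=lambda f: f.when,
        )


def jaccard(a, b):
    """Token-overlap similarity; a crude stand-in for semantic matching."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def align(trajectory, guideline_steps):
    """Pair each event with the guideline step it overlaps with most."""
    return [
        (f, max(guideline_steps, key=lambda s: jaccard(f"{f.relation} {f.obj}", s)))
        for f in trajectory
    ]


# Hypothetical example data, inserted out of chronological order on purpose.
kg = TemporalKG()
kg.add("patient_1", "received", "chemotherapy cycle 1", date(2021, 3, 1))
kg.add("patient_1", "diagnosed_with", "breast cancer", date(2021, 1, 10))
kg.add("patient_1", "underwent", "surgery", date(2021, 2, 5))

guideline = [
    "confirm diagnosis of cancer",
    "perform surgery",
    "start adjuvant chemotherapy",
]

aligned = align(kg.trajectory("patient_1"), guideline)
```

Storing the timestamp on every fact is what lets the trajectory be recovered in order regardless of how the notes were ingested; a real system would replace `jaccard` with an embedding-based matcher and a reranking stage.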

Paper Structure

This paper contains 40 sections, 7 equations, 8 figures, 12 tables.

Figures (8)

  • Figure 1: The shared challenges for clinicians and LLMs in handling complex longitudinal EHRs.
  • Figure 2: A comparison of RAG approaches for long-form longitudinal clinical tasks. (a) Standard RAG suffers from missing key information and hallucinations. (b) KG-enhanced RAG struggles to model temporal dependencies in patient journeys. (c) In contrast, our CliCARE framework transforms EHRs into Temporal Knowledge Graphs, aligns patient trajectories with guidelines, and generates answers using a distilled specialist model, which are then assessed by our evaluation approach.
  • Figure 3: Trajectory-Guideline Alignment workflow. It fuses patient data with guidelines via semantic matching, LLM-based reranking, and iterative bootstrapped expansion to create a comprehensive, evidence-grounded mapping.
  • Figure 4: Distribution of hospitalizations in the CancerEHR dataset (a) and the MIMIC-Cancer dataset (b).
  • Figure 5: Distribution of text length in the CancerEHR dataset (a) and the MIMIC-Cancer dataset (b).
  • ...and 3 more figures