Retrieval-aligned Tabular Foundation Models Enable Robust Clinical Risk Prediction in Electronic Health Records Under Real-world Constraints

Minh-Khoi Pham, Thang-Long Nguyen Ho, Thao Thi Phuong Dao, Tai Tan Mai, Minh-Triet Tran, Marie E. Ward, Una Geary, Rob Brennan, Nick McDonald, Martin Crane, Marija Bezbradica

Abstract

Clinical prediction from structured electronic health records (EHRs) is challenging due to high dimensionality, heterogeneity, class imbalance, and distribution shift. While tabular in-context learning (TICL) and retrieval-augmented methods perform well on generic benchmarks, their behavior in clinical settings remains unclear. We present a multi-cohort EHR benchmark comparing classical, deep tabular, and TICL models across varying data scale, feature dimensionality, outcome rarity, and cross-cohort generalization. PFN-based TICL models are sample-efficient in low-data regimes but degrade under naive distance-based retrieval as heterogeneity and imbalance increase. We propose AWARE, a task-aligned retrieval framework using supervised embedding learning and lightweight adapters. AWARE improves AUPRC by up to 12.2% under extreme imbalance, with gains increasing with data complexity. Our results identify retrieval quality and retrieval-inference alignment as key bottlenecks for deploying tabular in-context learning in clinical prediction.

Paper Structure

This paper contains 54 sections, 17 equations, 10 figures, 7 tables.

Figures (10)

  • Figure 1: Conceptual comparison of major learning paradigms for EHR foundation models, illustrating differences in representation, adaptation strategy, and computational trade-offs across native tabular models, sequential event-based models, language-based abstractions, and tabular in-context learning. ✓ highlights advantages while ✗ highlights disadvantages.
  • Figure 2: Landscape of EHR benchmarks evaluated in this study. The datasets span wide variation in cohort size, feature dimensionality, outcome prevalence, and clinical domain, illustrating key challenges of (a) heterogeneity, (b) rarity, (c) cross-task, cross-institutional generalizability and (d) distribution shift faced by tabular learning methods in real-world EHR settings. See detailed descriptions of reported datasets in Table \ref{tab:apdx:dataset_stats}.
  • Figure 3: The challenges of raw EHR data for retrieval and the solution of task-aware alignment. (A) The raw feature vector is heterogeneous, sparse, and task-agnostic, leading to noisy geometry and multi-task misalignment. (B) A task-aligned projection reshapes the geometry for label-consistent, effective retrieval.
  • Figure 4: Overview of our task-aligned retrieval design for the retrieval-augmented in-context learning EHR model.
  • Figure 5: Visualization of patient neighborhood structure across the MIMIC-IV and eICU datasets, obtained by projecting the data into 2D space. (a) In raw feature space, proximity does not reliably correspond to predictive relevance across different clinical tasks, resulting in scattered positive examples. (b) In contrast, task-aligned embedding retrieval (AWARE) reshapes the representation space such that label-consistent examples form coherent local clusters while preserving smooth structure for downstream in-context learning. (c) Black bars illustrate task-specific feature reweighting induced by the attention-based encoder, highlighting how retrieval adapts to heterogeneous clinical variables.
  • ...and 5 more figures