Exploring Accurate and Transparent Domain Adaptation in Predictive Healthcare via Concept-Grounded Orthogonal Inference

Pengfei Hu; Chang Lu; Feifan Liu; Yue Ning

TL;DR

This work tackles the challenge of deploying predictive models on electronic health records across diverse clinical settings by addressing both performance and transparency under distribution shifts. It introduces ExtraCare, a framework that decomposes patient representations into invariant and covariant components using a sparse autoencoder-driven, dictionary-induced geometry, with explicit $M$-orthogonal residuals $z$ guided by a domain classifier. The method achieves robust predictive accuracy across spatial and temporal domain shifts on two real-world EHR datasets (eICU and OCHIN) while providing concept-grounded explanations through sparse latent dimensions mapped to ICD codes. Practically, ExtraCare offers clinicians interpretable insight into which concepts transfer across domains and which reflect cohort-specific variation, enhancing trust and safety in clinical deployment.

Abstract

Deep learning models for clinical event prediction on electronic health records (EHR) often suffer performance degradation when deployed under different data distributions. While domain adaptation (DA) methods can mitigate such shifts, their "black-box" nature prevents widespread adoption in clinical practice, where transparency is essential for trust and safety. We propose ExtraCare to decompose patient representations into invariant and covariant components. By supervising these two components and enforcing their orthogonality during training, our model preserves label information while exposing domain-specific variation, yielding more accurate predictions than most feature alignment models. More importantly, it offers human-understandable explanations by mapping sparse latent dimensions to medical concepts and quantifying their contributions via targeted ablations. ExtraCare is evaluated on two real-world EHR datasets across multiple domain partition settings, demonstrating superior performance along with enhanced transparency, as evidenced by its accurate predictions and explanations from extensive case studies.
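
As a worked illustration of what orthogonality under a dictionary metric can mean, the following is a generic sketch, not the paper's exact construction. It takes the metric $M = W_\theta^\top W_\theta$ named in the Figure 1 caption, and assumes that $M$ is positive definite (that is, $W_\theta$ has full column rank) and that a subspace is given by the columns of a hypothetical basis matrix $B$; neither assumption is stated on this page.

$$
\langle a, b \rangle_M = a^\top M b, \qquad
u = B\,(B^\top M B)^{-1} B^\top M\, v, \qquad
z = v - u,
$$
$$
B^\top M z = B^\top M v - (B^\top M B)(B^\top M B)^{-1} B^\top M v = 0 .
$$

Under these assumptions, $z$ is $M$-orthogonal to every vector in the span of $B$, so $\|v\|_M^2 = \|u\|_M^2 + \|z\|_M^2$; with $M = I$ this reduces to the ordinary orthogonal projection.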

Paper Structure (60 sections, 54 equations, 3 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Methodology
Problem Definition
Feature Extraction & Alignment
Reconstruction of Aligned Features
Orthogonal Covariates Inference
Training, Inference, and Interpretation
Experiments
Experimental Setup
Predictive Robustness in Adaptation (RQ1)
Model Interpretation (RQ2)
Ablation Study
Effectiveness of Factorized Subspaces
Adaptation From General to Specific Facilities
...and 45 more sections

Figures (3)

Figure 1: Overview of ExtraCare architecture for clinical domain adaptation problems. Inputs $x, x'$ are encoded by $f_\phi(\cdot)$ into $v, v'$, supervised for label prediction with $p_\zeta(\cdot)$. A sparse autoencoder $h_\theta(\cdot)$ induces a dictionary metric $M = W_\theta^\top W_\theta$ and enables orthogonal inference that factorizes representations into invariant features and domain-specific residuals $z$, supervised by a domain classifier $d_\omega(\cdot)$.
Figure 2: Clinical Concept Attribution via Sparse-Dimension Ablation. (a) We extract sparse concept activations for two patients and select the three dimensions with the highest activations. (b) We ablate each selected dimension and visualize the resulting absolute change in diagnosis probability, $\Delta \text{prob}$, with a threshold of $0.05$ (dashed line). (c) We categorize the mapped ICD-10-CM codes by label impact and domain sensitivity to distinguish transferable evidence from shift-sensitive variation.
Figure 3: Facility-specific ICD-10 code distribution shift and cross-domain diagnosis retrieval performance. The top row shows the five most frequent ICD-10 codes from the source (Primary Care/Family Practice) subsets, with patient proportions (bars) and rank frequency (line) across facility-specific cohorts; hatched bars indicate the source-subset frequencies for comparison. The bottom row reports R@5 for different models when adapted to a specific facility type, where the blue and red lines denote Base and Oracle, respectively.
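
The sparse-dimension ablation described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the logistic `predict_prob` is a toy stand-in for the label predictor, "ablating" a dimension is assumed to mean setting its activation to zero, and the activations and weights are made-up numbers. Only the top-3 selection, the absolute probability change, and the $0.05$ threshold come from the caption.

```python
import math

THRESHOLD = 0.05  # threshold on the absolute probability change (from the Figure 2 caption)


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def predict_prob(activations, weights, bias):
    """Toy stand-in for a label predictor: a logistic model on sparse activations."""
    logit = bias + sum(a * w for a, w in zip(activations, weights))
    return sigmoid(logit)


def top_k_active(activations, k=3):
    """Return the indices of the k largest activations, highest first."""
    order = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    return order[:k]


def ablation_deltas(activations, weights, bias, k=3):
    """Zero out each top-k dimension in turn and record |change in probability|."""
    base = predict_prob(activations, weights, bias)
    deltas = {}
    for i in top_k_active(activations, k):
        ablated = list(activations)
        ablated[i] = 0.0  # assumption: ablation = setting the activation to zero
        deltas[i] = abs(predict_prob(ablated, weights, bias) - base)
    return deltas


# Hypothetical sparse activations and predictor weights for one patient.
activations = [0.0, 2.1, 0.0, 0.9, 0.0, 1.4]
weights = [0.3, 1.2, -0.5, 0.01, 0.8, -0.7]
deltas = ablation_deltas(activations, weights, bias=-1.0)

for dim, delta in deltas.items():
    flag = "above" if delta > THRESHOLD else "below"
    print(f"dimension {dim}: delta_prob = {delta:.3f} ({flag} threshold)")
```

In this toy setup, a dimension whose ablation moves the probability by more than the threshold would be treated as influential for the label; mapping such dimensions to ICD-10-CM codes and assessing their domain sensitivity, as in panel (c), is outside the scope of the sketch.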
