Exploring Accurate and Transparent Domain Adaptation in Predictive Healthcare via Concept-Grounded Orthogonal Inference
Pengfei Hu, Chang Lu, Feifan Liu, Yue Ning
TL;DR
This work tackles the challenge of deploying predictive models on electronic health records across diverse clinical settings by addressing both performance and transparency under distribution shifts. It introduces ExtraCare, a framework that decomposes patient representations into invariant and covariant components using a sparse autoencoder-driven, dictionary-induced geometry, with explicit $M$-orthogonal residuals $z$ guided by a domain classifier. The method achieves robust predictive accuracy across spatial and temporal domain shifts on two real-world EHR datasets (eICU and OCHIN) while providing concept-grounded explanations through sparse latent dimensions mapped to ICD codes. Practically, ExtraCare offers clinicians interpretable insight into which concepts transfer across domains and which reflect cohort-specific variation, enhancing trust and safety in clinical deployment.
Abstract
Deep learning models for clinical event prediction on electronic health records (EHR) often suffer performance degradation when deployed under different data distributions. While domain adaptation (DA) methods can mitigate such shifts, their "black-box" nature prevents widespread adoption in clinical practice, where transparency is essential for trust and safety. We propose ExtraCare to decompose patient representations into invariant and covariant components. By supervising these two components and enforcing their orthogonality during training, our model preserves label information while simultaneously exposing domain-specific variation, yielding more accurate predictions than most feature alignment models. More importantly, it offers human-understandable explanations by mapping sparse latent dimensions to medical concepts and quantifying their contributions via targeted ablations. We evaluate ExtraCare on two real-world EHR datasets across multiple domain partition settings, where it demonstrates superior performance along with enhanced transparency, as evidenced by its accurate predictions and by explanations drawn from extensive case studies.
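To make the two core ideas concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the domain-sensitive direction is a single vector `w` (for example, the weight vector of a linear domain classifier), that `M` is a generic symmetric positive-definite matrix standing in for the dictionary-induced geometry, and that the prediction head is linear. The function names `m_orthogonal_split` and `ablation_contribution`, and all variable names, are hypothetical and chosen only for illustration.

```python
import numpy as np


def m_orthogonal_split(h, w, M):
    """Split h into a part along w and a residual z with z^T M w = 0.

    h: (n, d) patient representations
    w: (d,)   assumed domain-sensitive direction (hypothetical)
    M: (d, d) symmetric positive-definite matrix defining the inner product
    """
    Mw = M @ w
    coef = (h @ Mw) / (w @ Mw)     # (n,) coefficients under the M-inner product
    covariant = np.outer(coef, w)  # component of h aligned with w
    invariant = h - covariant      # residual z, M-orthogonal to w
    return invariant, covariant


def ablation_contribution(codes, decoder, head, k):
    """Drop in a linear prediction score when sparse dimension k is zeroed."""
    ablated = codes.copy()
    ablated[:, k] = 0.0
    return (codes @ decoder) @ head - (ablated @ decoder) @ head


# Toy data (random, for illustration only).
rng = np.random.default_rng(0)
n, d, m = 5, 8, 12
h = rng.normal(size=(n, d))
w = rng.normal(size=d)
A = rng.normal(size=(d, d))
M = A.T @ A + np.eye(d)            # symmetric positive-definite by construction

z, c = m_orthogonal_split(h, w, M)

codes = np.abs(rng.normal(size=(n, m)))  # stand-in for sparse latent activations
decoder = rng.normal(size=(m, d))        # stand-in for dictionary rows
head = rng.normal(size=d)                # stand-in for a linear prediction head
delta = ablation_contribution(codes, decoder, head, k=3)
```

In this sketch the residual `z` plays the role of the invariant component and `c` the covariant one, while `delta` gives the per-patient score change attributable to one latent dimension. With a nonlinear head the ablation effect would no longer reduce to a closed-form product and would have to be measured by re-running the model.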
