WellFactor: Patient Profiling using Integrative Embedding of Healthcare Data

Dongjin Choi, Andy Xiang, Ozgur Ozturk, Deep Shrestha, Barry Drake, Hamid Haidarian, Faizan Javed, Haesun Park

TL;DR

WellFactor is a unified patient-profiling framework that integrates heterogeneous healthcare data through a constrained, nonnegative, multi-view low-rank factorization. It learns a shared embedding $H$ from three data domains (search, browsing, and diagnoses) with domain-specific bases $(W_s, W_b, W_d)$ and optional semi-supervised labels. Sparsity and missing data are handled by a masking mechanism, with an open-world or closed-world assumption applied depending on the view. The approach embeds unseen patients and unseen data items without reprocessing the full dataset, enabling fast personalization, clustering, and similarity queries. Empirical evaluation on Kaiser Permanente web-portal data shows better classification performance, more interpretable clusters, and stronger disease-prediction and similarity-search results than several baselines. An open-source implementation is provided to support reproducibility and extension in healthcare analytics.
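To make the idea of a shared embedding with per-view bases and masks concrete, the following is a minimal sketch of a generic masked multi-view nonnegative factorization, not the paper's actual algorithm. It assumes each view is stored as a features-by-patients matrix, that a binary mask marks which entries count toward the loss, and that standard multiplicative updates are used; the function names, the update rule, and the choice of which view is masked are all illustrative assumptions. The semi-supervised label term is omitted.

```python
import numpy as np

def masked_loss(views, masks, bases, H):
    """Sum over views of the squared error on entries where the mask is 1."""
    return sum(float(np.sum((M * (X - W @ H)) ** 2))
               for X, M, W in zip(views, masks, bases))

def fit_shared_embedding(views, masks, rank, n_iter=200, eps=1e-9, seed=0):
    """Hypothetical helper: fit X_v ~ W_v @ H with one shared H for all views.

    views: list of nonnegative arrays, each (n_features_v, n_patients)
    masks: list of 0/1 arrays with the same shapes as the views
    """
    rng = np.random.default_rng(seed)
    n_patients = views[0].shape[1]
    bases = [rng.random((X.shape[0], rank)) for X in views]
    H = rng.random((rank, n_patients))
    for _ in range(n_iter):
        # Update each view-specific basis with the shared embedding held fixed.
        for i, (X, M) in enumerate(zip(views, masks)):
            W = bases[i]
            bases[i] = W * ((M * X) @ H.T) / ((M * (W @ H)) @ H.T + eps)
        # Update the shared embedding using evidence pooled from every view.
        num = sum(W.T @ (M * X) for X, M, W in zip(views, masks, bases))
        den = sum(W.T @ (M * (W @ H)) for M, W in zip(masks, bases))
        H = H * num / (den + eps)
    return bases, H

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    H_true = rng.random((3, 20))
    # Three synthetic views standing in for search, browsing, and diagnoses.
    views = [rng.random((d, 3)) @ H_true for d in (8, 6, 5)]
    # An all-ones mask treats every entry as observed; a partial mask
    # ignores unobserved entries instead of treating them as zeros.
    masks = [np.ones_like(views[0]), np.ones_like(views[1]),
             (rng.random(views[2].shape) > 0.3).astype(float)]
    bases, H = fit_shared_embedding(views, masks, rank=3)
    print("masked loss:", masked_loss(views, masks, bases, H))
```

In this sketch, an all-ones mask plays the role of a closed-world view (a zero means "did not happen"), while a partial mask plays the role of an open-world view (an unobserved entry carries no information). Which of the three domains falls under which assumption is not stated in this summary, so the assignment above is arbitrary.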

Abstract

In the rapidly evolving healthcare industry, platforms now have access not only to traditional medical records but also to diverse data sets covering other patient interactions, such as those from healthcare web portals. To address this diversity of data, we introduce WellFactor: a method that derives patient profiles by integrating information from these sources. Central to our approach is constrained low-rank approximation. WellFactor is optimized to handle the sparsity that is often inherent in healthcare data. Moreover, by incorporating task-specific label information, our method refines the embedding results, offering a more informed perspective on patients. One important feature of WellFactor is its ability to compute embeddings for new, previously unobserved patient data instantaneously, eliminating the need to revisit the entire data set or recompute the embedding. Comprehensive evaluations on real-world healthcare data demonstrate WellFactor's effectiveness. It outperforms existing methods in classification, yields meaningful clustering of patients, and delivers consistent results in patient similarity searches and predictions.
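One common way to embed a new patient without refitting a factorization model is to hold the learned bases fixed and solve a small nonnegative least-squares problem for that patient alone. The sketch below illustrates this "fold-in" idea under stated assumptions: the bases are given as features-by-rank arrays, the patient's data and masks are one vector per view, and multiplicative updates are used as the solver. The function name and solver are hypothetical and are not taken from the paper.

```python
import numpy as np

def embed_new_patient(new_views, new_masks, bases, n_iter=500, eps=1e-9):
    """Hypothetical fold-in: find h >= 0 so that W_v @ h ~ x_v on observed entries.

    new_views: list of 1-D nonnegative arrays, one per view
    new_masks: list of 0/1 arrays matching new_views
    bases:     list of fixed (n_features_v, rank) arrays from an earlier fit
    """
    rank = bases[0].shape[1]
    h = np.full(rank, 1.0 / rank)  # strictly positive starting point
    for _ in range(n_iter):
        # Only h changes; the bases are never touched, so the cost per
        # patient depends on the feature counts and rank, not on cohort size.
        num = sum(W.T @ (m * x) for x, m, W in zip(new_views, new_masks, bases))
        den = sum(W.T @ (m * (W @ h)) for m, W in zip(new_masks, bases))
        h = h * num / (den + eps)
    return h

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bases = [rng.random((d, 3)) for d in (8, 6, 5)]
    h_true = rng.random(3)
    xs = [W @ h_true for W in bases]
    ms = [np.ones_like(x) for x in xs]
    h = embed_new_patient(xs, ms, bases)
    print("recovered embedding:", h)
```

The resulting vector can then be compared with existing patient embeddings, for example by cosine similarity, to support the similarity queries described above.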

Paper Structure

This paper contains 31 sections, 12 equations, 2 figures, and 6 tables.

Figures (2)

  • Figure 1: Illustration of the diverse patient data sources collected from interactions on the web portal and with medical professionals.
  • Figure 2: Graphical overview of the proposed WellFactor patient profiling framework.