Sampling-guided Heterogeneous Graph Neural Network with Temporal Smoothing for Scalable Longitudinal Data Imputation

Zhaoyang Zhang, Ziqi Chen, Qiao Liu, Jinhan Xie, Hongtu Zhu

TL;DR

A novel framework, the Sampling-guided Heterogeneous Graph Neural Network (SHT-GNN), that tackles missing data imputation in longitudinal studies and significantly outperforms existing imputation methods, even at high missing data rates.

Abstract

In this paper, we propose a novel framework, the Sampling-guided Heterogeneous Graph Neural Network (SHT-GNN), to effectively tackle the challenge of missing data imputation in longitudinal studies. Unlike traditional methods, which often require extensive preprocessing to handle irregular or inconsistent missing data, our approach accommodates arbitrary missing data patterns while maintaining computational efficiency. SHT-GNN models both observations and covariates as distinct node types, connecting observation nodes at successive time points through subject-specific longitudinal subnetworks, while covariate-observation interactions are represented by attributed edges within bipartite graphs. By leveraging subject-wise mini-batch sampling and a multi-layer temporal smoothing mechanism, SHT-GNN efficiently scales to large datasets, while effectively learning node representations and imputing missing data. Extensive experiments on both synthetic and real-world datasets, including the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, demonstrate that SHT-GNN significantly outperforms existing imputation methods, even with high missing data rates. The empirical results highlight SHT-GNN's robust imputation capabilities and superior performance, particularly in the context of complex, large-scale longitudinal data.
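
The graph construction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it only assumes a long-format table with one row per observation, a subject identifier and visit time for each row, and `NaN` marking missing covariate values. The function names `build_graph` and `sample_subject_batch` and the toy data are hypothetical. The sketch creates one attributed edge per observed entry (the bipartite part), links each observation to the same subject's next observation (the longitudinal subnetwork), and draws a mini-batch by sampling whole subjects. The temporal smoothing mechanism is not covered here.

```python
import numpy as np


def build_graph(subject_ids, times, X):
    """Build bipartite and longitudinal edges from a long-format table (illustrative only)."""
    subject_ids = np.asarray(subject_ids)
    times = np.asarray(times, dtype=float)
    X = np.asarray(X, dtype=float)

    # Bipartite part: one attributed edge per observed (non-NaN) entry,
    # connecting observation node i to covariate node j with attribute X[i, j].
    obs_idx, cov_idx = np.nonzero(~np.isnan(X))
    edge_attr = X[obs_idx, cov_idx]

    # Longitudinal part: within each subject, sort observations by time and
    # connect every observation to the next one.
    long_edges = []
    for s in np.unique(subject_ids):
        rows = np.flatnonzero(subject_ids == s)
        rows = rows[np.argsort(times[rows], kind="stable")]
        long_edges.extend(zip(rows[:-1].tolist(), rows[1:].tolist()))
    return obs_idx, cov_idx, edge_attr, long_edges


def sample_subject_batch(subject_ids, n_subjects, rng):
    """Sample whole subjects and return the row indices of all their observations."""
    subject_ids = np.asarray(subject_ids)
    chosen = rng.choice(np.unique(subject_ids), size=n_subjects, replace=False)
    return np.flatnonzero(np.isin(subject_ids, chosen))


# Toy data (made up for illustration): 2 subjects, 5 observations, 2 covariates.
subject_ids = [0, 0, 0, 1, 1]
times = [0, 6, 12, 0, 12]  # irregular visit schedule
X = [
    [1.0, np.nan],
    [np.nan, 2.0],
    [3.0, 4.0],
    [np.nan, np.nan],
    [5.0, np.nan],
]

obs_idx, cov_idx, edge_attr, long_edges = build_graph(subject_ids, times, X)
batch_rows = sample_subject_batch(subject_ids, 1, np.random.default_rng(0))
print(len(edge_attr), long_edges, batch_rows)
```

Because edges exist only for observed entries, arbitrary missingness patterns need no special preprocessing in this sketch, and sampling by subject keeps each longitudinal subnetwork intact within a batch.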

Paper Structure

This paper contains 29 sections, 17 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: The ideal format of longitudinal data without missing values (left); regularly and consistently observed longitudinal data with missing values in the covariates and the response variable (middle); and an irregular, inconsistent observation schedule in longitudinal data caused by missing data (right).
  • Figure 2: Flow chart of the Sampling-guided Heterogeneous Graph Neural Network (SHT-GNN).
  • Figure 3: The variation of information in the representation of observation nodes during multi-layer message passing and representation updates for temporal smoothing within longitudinal subnetworks.
  • Figure 4: The comparison of scalability under different observation sizes across GNN-based methods.