Bayes-CATSI: A variational Bayesian deep learning framework for medical time series data imputation

Omkar Kulkarni, Rohitash Chandra

TL;DR

Bayes-CATSI addresses the need for uncertainty-aware imputation in medical time-series data by integrating variational Bayesian layers into the CATSI framework, enabling probabilistic imputation and uncertainty quantification. The approach replaces deterministic deep-learning components with Bayesian counterparts (or partial Bayesian variants) across context-aware recurrent imputation, cross-feature imputation, and the fusion layer, trained via Bayes-by-backprop with an ELBO objective. Empirical results on multi-modal ICU data show Bayes-CATSI achieving a $9.57\%$ improvement in RMSE over CATSI for individual missing values, along with reduced prediction uncertainty; partial Bayes-CATSI offers a trade-off between performance gains and computational cost, performing variably across missingness patterns. The work contributes open-source code and highlights how uncertainty quantification can enhance reliability and decision-making in clinical data imputation, while outlining future work on larger datasets, MCMC sampling, and Transformer-based enhancements.

Abstract

Medical time series datasets often contain missing values that must be imputed; however, conventional machine learning models fall short because their predictions lack uncertainty quantification. Among these models, CATSI (Context-Aware Time Series Imputation) stands out for its effectiveness: it incorporates a context vector into the imputation process, capturing the global dependencies of each patient. In this paper, we propose a Bayesian Context-Aware Time Series Imputation (Bayes-CATSI) framework which leverages the uncertainty quantification offered by variational inference. We consider time series derived from electroencephalography (EEG), electrooculography (EOG), electromyography (EMG), and electrocardiography (EKG). Variational inference assumes the shape of the posterior distribution and, by minimizing the Kullback-Leibler (KL) divergence, finds the variational density closest to the true posterior distribution. Thus, we integrate variational Bayesian deep learning layers into the CATSI model. Our results show that Bayes-CATSI not only provides uncertainty quantification but also achieves superior imputation performance compared to the CATSI model. Specifically, an instance of Bayes-CATSI outperforms CATSI by 9.57%. We provide an open-source code implementation for applying Bayes-CATSI to other medical data imputation problems.
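
The link between the KL minimization described in the abstract and the ELBO objective named in the TL;DR can be written as a short identity. The notation is an assumption made for illustration and is not taken from the paper: $\mathcal{D}$ is the observed data, $w$ the network weights, $p(w)$ the prior, and $q(w \mid \theta)$ the variational density with parameters $\theta$.

$$
\log p(\mathcal{D}) = \underbrace{\mathbb{E}_{q(w \mid \theta)}\big[\log p(\mathcal{D} \mid w)\big] - \mathrm{KL}\big(q(w \mid \theta)\,\|\,p(w)\big)}_{\mathrm{ELBO}(\theta)} + \mathrm{KL}\big(q(w \mid \theta)\,\|\,p(w \mid \mathcal{D})\big)
$$

Because $\log p(\mathcal{D})$ does not depend on $\theta$, maximizing $\mathrm{ELBO}(\theta)$ is equivalent to minimizing the KL divergence between the variational density and the true posterior, so the intractable posterior never has to be evaluated directly.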

Paper Structure (23 sections, 25 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: A graphical overview of the imputation process, showing the masking procedure through which the observation mask and evaluation mask are generated from the raw input data and fed into the imputation model. We show the data loaded into the matrices on the left side, wherein $N$ time steps on the x-axis correspond to the $M$ rows in the matrix, while F1, F2, F3 correspond to the first 3 features/columns in the matrix. We mark the missing values in the input data as crosses in the matrix at the corresponding feature and time step, and the 'gear box' depicts the imputation model. We present the final output to give a general idea of the input and output of the imputation model. All matrices depicted have dimensions $M$ by $F$, where $F$ represents the features of the multivariate time series. The red 'X' indicates an original missing value in the input dataset. The green '+' indicates a missing value deliberately added to evaluate the imputations made by the model. The observation mask stores the locations of values that are present by labelling them as 1 and values that are absent by labelling them as 0. Similarly, the evaluation mask stores the locations of missing values that have been deliberately added by labelling them as 1 and all other values as 0. These masks help the model differentiate between original missing values, deliberately added missing values, and non-missing values.
  • Figure 2: Overview of the internal processing of the imputation model showing the generation of the pre-completed input from the raw input.
  • Figure 3: Detailed architecture of the Bayes-CATSI model
  • Figure 4: Detailed architecture of the Partial Bayes-CATSI model
  • Figure 5: Visualisation of estimations (prediction and confidence interval) by Bayes-CATSI for imputing the missing values in the given patient sample from the dataset. Panel A corresponds to the original data with missing values, while Panel B corresponds to the data featuring the imputations as predicted by Bayes-CATSI.
  • ...and 1 more figures
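
The masking procedure described in the Figure 1 caption can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `make_masks`, the `eval_fraction` parameter, and the toy data are hypothetical. It also assumes that deliberately hidden entries are labelled 0 in the observation mask, which the caption does not state explicitly.

```python
import numpy as np

def make_masks(raw, eval_fraction=0.1, seed=0):
    """Build masks for an M-by-F array whose missing entries are NaN.

    Returns (model_input, obs_mask, eval_mask). Hypothetical helper,
    written only to illustrate the Figure 1 caption.
    """
    rng = np.random.default_rng(seed)
    originally_present = ~np.isnan(raw)

    # Choose a random subset of observed entries to hide for evaluation.
    present_idx = np.flatnonzero(originally_present)
    n_hide = int(round(eval_fraction * present_idx.size))
    hidden_idx = rng.choice(present_idx, size=n_hide, replace=False)

    # Evaluation mask: 1 where a value was deliberately removed, else 0.
    eval_mask = np.zeros(raw.shape, dtype=np.int8)
    eval_mask.flat[hidden_idx] = 1

    # Observation mask: 1 where a value is visible to the model, else 0.
    # Assumption: deliberately hidden entries count as absent here.
    obs_mask = (originally_present & (eval_mask == 0)).astype(np.int8)

    # The model only sees entries flagged in the observation mask.
    model_input = np.where(obs_mask == 1, raw, np.nan)
    return model_input, obs_mask, eval_mask

# Toy example: 6 time steps (rows), 3 features (columns), 2 original gaps.
raw = np.array([
    [1.0, np.nan, 3.0],
    [1.1, 2.1, 3.1],
    [1.2, 2.2, np.nan],
    [1.3, 2.3, 3.3],
    [1.4, 2.4, 3.4],
    [1.5, 2.5, 3.5],
])
model_input, obs_mask, eval_mask = make_masks(raw, eval_fraction=0.25)
```

Under this convention the three cases in the caption are separable: an entry with `obs_mask == 0` and `eval_mask == 0` was originally missing, `eval_mask == 1` marks a deliberately removed value whose ground truth is still available in `raw`, and `obs_mask == 1` marks a value the model can see.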