Missing data imputation for noisy time-series data and applications in healthcare

Lien P. Le, Xuan-Hien Nguyen Thi, Thu Nguyen, Michael A. Riegler, Pål Halvorsen, Binh T. Nguyen

TL;DR

This study addresses missing data in healthcare time series by comparing MICE-RF with state-of-the-art deep-learning imputers (SAITS, BRITS, Transformer) across missing-data rates from $10\%$ to $80\%$. It evaluates not only imputation accuracy via MAE but also downstream classification performance (F1-score, AUC, MCC) to capture denoising effects. The results show MICE-RF often yields the best MAE for univariate data at moderate missingness, while multivariate data without periodicity may benefit more from deep-learning approaches; importantly, imputation generally enhances downstream classification, illustrating denoising alongside filling gaps. The findings provide guidance on method selection based on data characteristics and highlight the practical impact of imputation on real-world healthcare analytics.

Abstract

Healthcare time series data are vital for monitoring patient activity but often contain noise and missing values caused by, for example, sensor errors or data interruptions. Imputation, i.e., filling in the missing values, is a common way to deal with this issue. In this study, we compare imputation methods, including Multiple Imputation by Chained Equations with Random Forest (MICE-RF) and advanced deep learning approaches (SAITS, BRITS, Transformer), on noisy time series with missing values. The comparison uses MAE, F1-score, AUC, and MCC across missing-data rates from 10% to 80%. Our results show that MICE-RF imputes missing data effectively relative to the deep learning methods, and that classification performance improves on the imputed data. This improvement indicates that applying an imputation algorithm to time series with missing data can fill the gaps and reduce noise at the same time.
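
The evaluation described above (mask part of the data, impute it, score the imputed values with MAE) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn's `IterativeImputer` with a `RandomForestRegressor` as a stand-in for MICE-RF, assumes values are removed completely at random, and uses a small synthetic signal rather than the Psykose, Depresjon, or HTAD data. The helper names `mask_at_random` and `masked_mae` are hypothetical.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor


def mask_at_random(X, rate, rng):
    """Hypothetical helper: set roughly `rate` of the entries to NaN."""
    mask = rng.random(X.shape) < rate
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing, mask


def masked_mae(X_true, X_imputed, mask):
    """Mean absolute error computed only on the artificially removed entries."""
    return float(np.mean(np.abs(X_true[mask] - X_imputed[mask])))


rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption): three correlated, noisy periodic channels.
t = np.arange(300)
base = np.sin(2 * np.pi * t / 24)
X = np.column_stack([
    base + 0.1 * rng.standard_normal(t.size),
    0.5 * base + 0.1 * rng.standard_normal(t.size),
    -base + 0.1 * rng.standard_normal(t.size),
])

results = {}
for rate in (0.1, 0.4, 0.8):
    X_missing, mask = mask_at_random(X, rate, rng)
    # Chained-equation imputation with a random forest as the per-feature model.
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=20, random_state=0),
        max_iter=5,
        random_state=0,
    )
    X_imputed = imputer.fit_transform(X_missing)
    results[rate] = masked_mae(X, X_imputed, mask)

for rate, mae in results.items():
    print(f"missing rate {rate:.0%}: MAE = {mae:.3f}")
```

Scoring only the masked entries keeps the observed values from flattering the error. The same loop could be repeated with a deep-learning imputer in place of `IterativeImputer` to compare methods at each missing rate.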

Paper Structure

This paper contains 12 sections, 2 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: The performance of methods on Psykose Dataset for different layers
  • Figure 2: The performance of methods on Depresjon Dataset for different layers
  • Figure 3: The performance of methods on HTAD Dataset for different layers