VAE-IF: Deep feature extraction with averaging for fully unsupervised artifact detection in routinely acquired ICU time-series
Hollan Haule, Ian Piper, Patricia Jones, Chen Qin, Tsz-Yan Milly Lo, Javier Escudero
TL;DR
Artifact contamination in ICU time series undermines both research and clinical decisions. The authors present VAE-IF, a fully unsupervised framework that combines a $\beta$-VAE encoder with an attention mechanism and an isolation forest to detect artifacts at minute resolution. Evaluations on the KidsBrainIT dataset show that VAE-IF achieves sensitivity comparable to supervised baselines while maintaining high specificity, without requiring labeled artifacts; external validation on MIMIC-IV supports its generalizability. Latent-space analyses with t-SNE indicate that clean and noisy samples are effectively disentangled. Overall, the approach offers a practical, label-free solution for cleaning ICU data for clinical research and practice.
Abstract
Artifacts are a common problem in physiological time series collected from intensive care units (ICUs) and other settings. They affect the quality and reliability of clinical research and patient care. Manual annotation of artifacts is costly and time-consuming, which makes it impractical, so automated methods are needed. Here, we propose a novel fully unsupervised approach to detect artifacts in clinical-standard, minute-by-minute resolution ICU data without any prior labeling or signal-specific knowledge. Our approach combines a variational autoencoder (VAE) and an isolation forest (IF) into a hybrid model that learns features and identifies anomalies in different types of vital signs, such as blood pressure, heart rate, and intracranial pressure. We evaluate our approach on a real-world ICU dataset and compare it with supervised benchmark models based on long short-term memory (LSTM) and XGBoost, as well as statistical methods such as ARIMA. We show that our unsupervised approach achieves sensitivity comparable to that of fully supervised methods and generalizes well to an external dataset. We also visualize the latent space learned by the VAE and demonstrate its ability to disentangle clean and noisy samples. Our approach offers a promising solution for cleaning ICU data in clinical research and practice without the need for any labels.
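To make the two-stage idea concrete, the following is a minimal sketch, not the authors' implementation, of how a VAE and an isolation forest might be chained: a small VAE is trained on windows of a signal, and the encoder's latent means are then passed to an isolation forest that flags anomalous windows. Everything specific here is an assumption made for illustration: the synthetic signal, the window length of 10 minutes, the `TinyVAE` architecture, the value `beta=4.0`, and the training settings. The averaging step named in the title is not reproduced, since the abstract does not specify it.

```python
import numpy as np
import torch
from torch import nn
from sklearn.ensemble import IsolationForest

torch.manual_seed(0)
rng = np.random.default_rng(0)

# Hypothetical minute-by-minute vital sign with dropout-like artifacts.
n_minutes, win = 3000, 10
signal = 80 + 5 * np.sin(np.arange(n_minutes) / 30) + rng.normal(0, 1, n_minutes)
signal[rng.choice(n_minutes, 30, replace=False)] = 0.0

# Non-overlapping windows, standardized with global statistics.
windows = signal.reshape(-1, win)
windows = (windows - windows.mean()) / windows.std()
x = torch.tensor(windows, dtype=torch.float32)


class TinyVAE(nn.Module):
    """Illustrative fully connected VAE (assumed architecture)."""

    def __init__(self, n_in, n_latent=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_in, 16), nn.ReLU())
        self.mu = nn.Linear(16, n_latent)
        self.logvar = nn.Linear(16, n_latent)
        self.dec = nn.Sequential(
            nn.Linear(n_latent, 16), nn.ReLU(), nn.Linear(16, n_in)
        )

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def forward(self, x):
        mu, logvar = self.encode(x)
        # Reparameterization trick: z = mu + sigma * eps.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar


def beta_vae_loss(x, x_hat, mu, logvar, beta=4.0):
    # Squared-error reconstruction term plus beta-weighted KL divergence
    # between the approximate posterior and a standard normal prior.
    recon = ((x_hat - x) ** 2).sum(dim=1).mean()
    kl = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)).mean()
    return recon + beta * kl


# Stage 1: train the VAE without any labels.
model = TinyVAE(win)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    x_hat, mu, logvar = model(x)
    loss = beta_vae_loss(x, x_hat, mu, logvar)
    loss.backward()
    opt.step()

# Stage 2: use the latent means as features for an isolation forest.
model.eval()
with torch.no_grad():
    latent_mean, _ = model.encode(x)
features = latent_mean.numpy()

forest = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
flags = forest.fit_predict(features)  # -1 = flagged as anomalous, 1 = not flagged
```

In this sketch, `flags` holds one decision per window. No labels are used at any point, which is the property the abstract emphasizes; how well such a toy model separates artifacts from clean data would depend on choices the sketch does not attempt to tune.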
