How Deep is your Guess? A Fresh Perspective on Deep Learning for Medical Time-Series Imputation
Linglong Qian, Tao Wang, Jun Wang, Hugh Logan Ellis, Robin Mitra, Richard Dobson, Zina Ibrahim
TL;DR
This paper analyzes deep learning imputers for electronic health record (EHR) time series, showing that the interplay of architectural and generative biases strongly shapes imputation performance and that larger models do not guarantee better results. It provides a theoretically grounded taxonomy of imputers and an open, controlled benchmarking study of eight models on PhysioNet 2012 using PyPOTS, revealing that design choices and alignment with EHR characteristics drive performance as much as, or more than, model size. The work highlights critical gaps in evaluation practices, especially in masking strategies and uncertainty quantification, and identifies open questions at the interface of clinical domain knowledge and deep learning. The findings advocate for standardized, data-driven benchmarking and for integrating clinical insights to develop more reliable and clinically meaningful imputation methods for healthcare applications.
Abstract
We present a comprehensive analysis of deep learning approaches for Electronic Health Record (EHR) time-series imputation, examining how architectural and framework biases combine to influence model performance. Our investigation reveals varying capabilities of deep imputers in capturing complex spatiotemporal dependencies within EHRs, and that model effectiveness depends on how its combined biases align with medical time-series characteristics. Our experimental evaluation challenges common assumptions about model complexity, demonstrating that larger models do not necessarily improve performance. Rather, carefully designed architectures can better capture the complex patterns inherent in clinical data. The study highlights the need for imputation approaches that prioritise clinically meaningful data reconstruction over statistical accuracy. Our experiments show imputation performance variations of up to 20% based on preprocessing and implementation choices, emphasising the need for standardised benchmarking methodologies. Finally, we identify critical gaps between current deep imputation methods and medical requirements, highlighting the importance of integrating clinical insights to achieve more reliable imputation approaches for healthcare applications.
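To make concrete why masking and preprocessing choices matter when benchmarking imputers, the following is a minimal sketch of a masked evaluation loop. It is an illustration under stated assumptions, not the paper's protocol: the data are synthetic rather than PhysioNet 2012, mean imputation stands in for the deep models studied, and the helper names (mask_observed, mean_impute, masked_mae) are hypothetical and do not come from PyPOTS.

```python
import numpy as np

def mask_observed(x, rate, rng):
    """Hide a random fraction `rate` of the observed (non-NaN) entries of x.

    Returns the masked copy and a boolean array marking the held-out entries.
    """
    observed = ~np.isnan(x)
    holdout = observed & (rng.random(x.shape) < rate)
    x_masked = x.copy()
    x_masked[holdout] = np.nan
    return x_masked, holdout

def mean_impute(x):
    """Fill NaNs with the per-feature mean over all patients and time steps."""
    feature_mean = np.nanmean(x, axis=(0, 1), keepdims=True)
    return np.where(np.isnan(x), feature_mean, x)

def masked_mae(x_imputed, x_true, holdout):
    """Mean absolute error computed only on the held-out entries."""
    return np.abs(x_imputed[holdout] - x_true[holdout]).mean()

rng = np.random.default_rng(0)

# Synthetic stand-in for EHR data, shaped (patients, time steps, features).
x_true = rng.normal(size=(32, 48, 5))
# Assumed 30% natural missingness, chosen only for illustration.
x_true[rng.random(x_true.shape) < 0.3] = np.nan

# Hold out 10% of the observed values as ground truth for evaluation.
x_masked, holdout = mask_observed(x_true, rate=0.1, rng=rng)
x_imputed = mean_impute(x_masked)
mae = masked_mae(x_imputed, x_true, holdout)
print(f"MAE on held-out entries: {mae:.3f}")
```

Changing the masking rate, the masking pattern, or whether statistics are computed before or after masking changes which entries are scored and what the imputer sees, which is one way that evaluation choices alone can shift reported error.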
