Beyond Accuracy: An Empirical Study of Uncertainty Estimation in Imputation

Zarin Tahia Hossain; Mostafa Milani

TL;DR

This study examines uncertainty estimation in data imputation by empirically evaluating six representative imputers (MICE, SoftImpute, OT-Impute, GAIN, MIWAE, TabCSDI) under MCAR, MAR, and MNAR missingness. Uncertainty is extracted via multi-run variability, conditional sampling, and predictive-distribution modeling, and is evaluated with calibration curves and the Expected Calibration Error (ECE) in addition to the mean absolute error (MAE). The findings reveal a persistent misalignment between reconstruction accuracy and uncertainty calibration: some high-accuracy methods yield poorly calibrated uncertainty, while others provide better-calibrated predictions at a cost to point accuracy. The results offer practical guidance on selecting uncertainty-aware imputers depending on whether the application prioritizes reliability or speed, and highlight the need for calibrated uncertainty in data cleaning and downstream ML pipelines. The study also provides a unified framework for comparing uncertainty in imputation across diverse methodological families and missingness regimes.

Abstract

Handling missing data is a central challenge in data-driven analysis. Modern imputation methods not only aim for accurate reconstruction but also differ in how they represent and quantify uncertainty. Yet the reliability and calibration of these uncertainty estimates remain poorly understood. This paper presents a systematic empirical study of uncertainty in imputation, comparing representative methods from three major families: statistical (MICE, SoftImpute), distribution alignment (OT-Impute), and deep generative (GAIN, MIWAE, TabCSDI). Experiments span multiple datasets, missingness mechanisms (MCAR, MAR, MNAR), and missingness rates. Uncertainty is estimated through three complementary routes (multi-run variability, conditional sampling, and predictive-distribution modeling) and evaluated using calibration curves and the Expected Calibration Error (ECE). Results show that accuracy and calibration are often misaligned: models with high reconstruction accuracy do not necessarily yield reliable uncertainty. We analyze method-specific trade-offs among accuracy, calibration, and runtime, identify stable configurations, and offer guidelines for selecting uncertainty-aware imputers in data cleaning and downstream machine learning pipelines.
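
To make the multi-run route and the calibration metric concrete, the following is a minimal sketch, not the paper's implementation. It assumes that multi-run uncertainty is the per-cell standard deviation across repeated stochastic imputations, that prediction intervals are central Gaussian intervals, and that ECE for continuous values is the mean absolute gap between nominal and empirical interval coverage. The abstract does not spell out the exact definitions, and all function names and the synthetic data are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def multi_run_uncertainty(imputations):
    """Per-cell mean and spread over repeated runs.

    imputations: array of shape (n_runs, n_cells), one row per stochastic run.
    """
    imputations = np.asarray(imputations, dtype=float)
    mean = imputations.mean(axis=0)
    std = imputations.std(axis=0, ddof=1)  # sample standard deviation across runs
    return mean, std


def calibration_curve(y_true, mean, std, levels):
    """Empirical coverage of central Gaussian intervals at each nominal level."""
    y_true, mean, std = (np.asarray(a, dtype=float) for a in (y_true, mean, std))
    z = np.abs(y_true - mean) / np.maximum(std, 1e-12)  # guard against zero spread
    coverage = []
    for p in levels:
        half_width = NormalDist().inv_cdf(0.5 + p / 2.0)  # interval half-width in std units
        coverage.append(float(np.mean(z <= half_width)))
    return np.array(coverage)


def expected_calibration_error(levels, coverage):
    """Mean absolute gap between nominal and empirical coverage (assumed definition)."""
    return float(np.mean(np.abs(np.asarray(coverage) - np.asarray(levels))))


# Synthetic stand-in for imputed cells: each held-out true value and each run
# are independent draws around the same hidden conditional mean.
rng = np.random.default_rng(0)
n_cells, n_runs = 2000, 50
hidden_mean = rng.normal(size=n_cells)
truth = hidden_mean + rng.normal(scale=0.5, size=n_cells)
runs = hidden_mean + rng.normal(scale=0.5, size=(n_runs, n_cells))

mean, std = multi_run_uncertainty(runs)
levels = np.linspace(0.1, 0.9, 9)

ece_calibrated = expected_calibration_error(
    levels, calibration_curve(truth, mean, std, levels))
# Shrinking the reported spread mimics an overconfident imputer with the same point estimates.
ece_overconfident = expected_calibration_error(
    levels, calibration_curve(truth, mean, 0.3 * std, levels))
mae = float(np.mean(np.abs(truth - mean)))

print(f"MAE: {mae:.3f}")
print(f"ECE (spread as estimated): {ece_calibrated:.3f}")
print(f"ECE (spread shrunk to 30%): {ece_overconfident:.3f}")
```

Both variants share the same point estimates and therefore the same MAE, yet their ECE values differ, which illustrates how accuracy and calibration can diverge.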
