Evaluation of Missing Data Imputation for Time Series Without Ground Truth
Rania Farjallah, Bassant Selim, Brigitte Jaumard, Samr Ali, Georges Kaddoum
TL;DR
This paper tackles missing data in time series for 5G network management, where ground-truth values for the missing entries are unavailable. It introduces the Wasserstein distance (WD) and the Jensen-Shannon divergence (JSD) as distribution-based validation metrics that do not require ground truth. The metrics are validated by introducing artificial gaps into the complete Telraam and Madrid datasets and comparing WD and JSD against RMSE and MAE for several imputers (interpolation, ARIMA, SARIMA, XGBoost, LSTM). Results show that WD and JSD agree with the traditional metrics and reveal the relative strengths of the tested imputers (e.g., XGBoost and LSTM), supporting their use for real-world validation when ground truth is scarce. The findings are of practical relevance for deploying robust imputation in 5G network analytics.
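The validation protocol summarized above (hide points in a complete series, impute them, then score the imputations against the hidden values) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the gaps are single points chosen uniformly at random (the actual gap pattern is not specified in this summary), linear interpolation stands in for the full set of imputers, the series is synthetic rather than Telraam or Madrid data, and all function names are hypothetical.

```python
import numpy as np


def make_artificial_gaps(series, gap_fraction=0.1, seed=0):
    """Hide a random fraction of points in a complete series (hypothetical helper).

    Returns the gapped copy (hidden points set to NaN) and a boolean mask of
    the hidden positions. The first and last points are always kept so that
    interpolation never has to extrapolate.
    """
    rng = np.random.default_rng(seed)
    n = len(series)
    n_missing = int(round(gap_fraction * n))
    idx = rng.choice(np.arange(1, n - 1), size=n_missing, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    gapped = np.asarray(series, dtype=float).copy()
    gapped[mask] = np.nan
    return gapped, mask


def linear_interpolation_impute(gapped):
    """Fill NaNs by linear interpolation over the integer time index."""
    t = np.arange(len(gapped))
    observed = ~np.isnan(gapped)
    filled = gapped.copy()
    filled[~observed] = np.interp(t[~observed], t[observed], gapped[observed])
    return filled


def rmse_mae(truth, imputed, mask):
    """Ground-truth metrics, computed only on the artificially hidden points."""
    err = imputed[mask] - truth[mask]
    return float(np.sqrt(np.mean(err ** 2))), float(np.mean(np.abs(err)))


if __name__ == "__main__":
    # Synthetic periodic series (period 24) standing in for a complete dataset.
    t = np.arange(500)
    noise = np.random.default_rng(1).normal(0.0, 2.0, size=t.size)
    complete = 100 + 20 * np.sin(2 * np.pi * t / 24) + noise
    gapped, mask = make_artificial_gaps(complete, gap_fraction=0.1, seed=0)
    imputed = linear_interpolation_impute(gapped)
    rmse, mae = rmse_mae(complete, imputed, mask)
    print(f"RMSE={rmse:.3f}  MAE={mae:.3f}")
```

Because the hidden values are known in this setup, RMSE and MAE can be computed directly; on genuinely missing data they cannot, which is the situation WD and JSD are intended to address.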
Abstract
Handling missing data in time series is critical for maintaining the accuracy and reliability of machine learning (ML) models in applications such as fifth-generation (5G) mobile network management. Traditional methods for validating imputation rely on ground-truth data, which, for genuinely missing values, is inherently unavailable. This paper addresses this limitation by introducing two statistical metrics, the Wasserstein distance (WD) and the Jensen-Shannon divergence (JSD), to evaluate imputation quality without requiring ground truth. These metrics measure how closely the distribution of the imputed data matches that of the original data, so that imputation performance can be evaluated from the internal structure and consistency of the data itself. We apply and test these metrics across several imputation techniques. Results demonstrate that WD and JSD are effective metrics for assessing the quality of missing-data imputation, particularly when ground-truth data is unavailable.
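The two ground-truth-free metrics can be sketched in NumPy as below. This is a minimal illustration, not the authors' implementation. WD is taken here as the empirical one-dimensional Wasserstein-1 distance, computed as the integral of the absolute difference between the two empirical CDFs; JSD is computed in base 2 (so it lies in [0, 1]) on histograms built over shared bin edges, with the bin count (`bins=30`) an arbitrary assumption. We also assume that "original data" means the observed values of the series and compare them with the imputed values only; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def wasserstein_1d(u, v):
    """Empirical 1-D Wasserstein-1 distance: integral of |F_u(x) - F_v(x)| dx."""
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    grid = np.sort(np.concatenate([u, v]))
    widths = np.diff(grid)
    # Both empirical CDFs are constant on each interval [grid[i], grid[i+1]).
    cdf_u = np.searchsorted(u, grid[:-1], side="right") / u.size
    cdf_v = np.searchsorted(v, grid[:-1], side="right") / v.size
    return float(np.sum(np.abs(cdf_u - cdf_v) * widths))


def _kl_bits(p, q):
    """KL(p || q) in bits, skipping bins where p is zero (0 * log 0 = 0)."""
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / q[nz])))


def jensen_shannon_divergence(u, v, bins=30):
    """Base-2 JSD between histograms of u and v built on shared bin edges."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    edges = np.histogram_bin_edges(np.concatenate([u, v]), bins=bins)
    p = np.histogram(u, bins=edges)[0].astype(float)
    q = np.histogram(v, bins=edges)[0].astype(float)
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)  # mixture; m > 0 wherever p > 0 or q > 0
    return 0.5 * _kl_bits(p, m) + 0.5 * _kl_bits(q, m)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    observed = rng.normal(0.0, 1.0, size=1000)     # synthetic observed values
    good_fill = rng.normal(0.0, 1.0, size=200)     # imputed, same distribution
    biased_fill = rng.normal(3.0, 1.0, size=200)   # imputed, shifted mean
    for name, fill in [("good", good_fill), ("biased", biased_fill)]:
        print(name,
              f"WD={wasserstein_1d(observed, fill):.3f}",
              f"JSD={jensen_shannon_divergence(observed, fill):.3f}")
```

Lower values indicate closer agreement, so the biased imputer should score noticeably higher on both metrics. Both metrics ignore temporal order, so they assess distributional plausibility rather than point-wise accuracy.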
