ImputeGAP: A Comprehensive Library for Time Series Imputation

Quentin Nater, Mourad Khayati

TL;DR

The paper tackles missing-value imputation for IoT-derived time series, emphasizing the need to model realistic missingness and its effect on downstream tasks. It introduces ImputeGAP, an end-to-end library that unifies multiple imputation families with a configurable contaminator, enabling realistic mono-block and multi-block gaps. It also provides benchmarking, explainability via SHAP, and downstream evaluation tools to assess impact on forecasting and other analyses. The work offers a practical, extensible platform for pre-processing time series and evaluating imputation strategies in real-world pipelines.

Abstract

With the prevalence of sensor failures, imputation, the process of estimating missing values, has emerged as the cornerstone of time series data pre-processing. While numerous imputation algorithms have been developed to repair these data gaps, existing time series libraries provide limited imputation support. Furthermore, they often lack the ability to simulate realistic time series missingness patterns and fail to account for the impact of the imputed data on subsequent downstream analysis. This paper introduces ImputeGAP, a comprehensive library for time series imputation that supports a diverse range of imputation methods and modular missing data simulation, catering to datasets with varying characteristics. The library includes extensive customization options, such as automated hyperparameter tuning, benchmarking, explainability, downstream evaluation, and compatibility with popular time series frameworks.
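To make the idea of modular missing-data simulation concrete, the sketch below contaminates a series with either one contiguous gap (mono-block) or several gaps (multi-block). It is illustrative only and uses the Python standard library: the function names `mono_block` and `multi_block`, their parameters, and the slot-based block placement are assumptions made for this example and are not ImputeGAP's actual API or its contamination patterns.

```python
import math
import random


def mono_block(series, rate, seed=0):
    """Return a copy of `series` with one contiguous block replaced by NaN.

    Hypothetical helper: `rate` is the fraction of points to remove.
    """
    rng = random.Random(seed)
    n = len(series)
    block = min(n, max(1, round(n * rate)))
    start = rng.randrange(0, n - block + 1)  # block always fits in the series
    out = list(series)
    for i in range(start, start + block):
        out[i] = math.nan
    return out


def multi_block(series, rate, block_size, seed=0):
    """Return a copy of `series` with several fixed-size blocks replaced by NaN.

    Hypothetical helper: blocks are placed on non-overlapping slots of
    length `block_size`; two adjacent slots may merge into a longer gap.
    """
    rng = random.Random(seed)
    n = len(series)
    n_slots = n // block_size
    n_blocks = min(n_slots, max(1, round(n * rate / block_size)))
    out = list(series)
    for slot in rng.sample(range(n_slots), n_blocks):
        for i in range(slot * block_size, (slot + 1) * block_size):
            out[i] = math.nan
    return out


if __name__ == "__main__":
    clean = [float(i) for i in range(100)]
    one_gap = mono_block(clean, rate=0.2)
    many_gaps = multi_block(clean, rate=0.2, block_size=5)
    print(sum(math.isnan(v) for v in one_gap), "missing values in one block")
    print(sum(math.isnan(v) for v in many_gaps), "missing values across blocks")
```

Both helpers leave the input untouched and return a contaminated copy, so the same clean series can be reused as ground truth when scoring an imputation method.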

Paper Structure

This paper contains 5 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: The ImputeGAP Framework.