Modular Deep Learning for Multivariate Time-Series: Decoupling Imputation and Downstream Tasks

Joseph Arul Raj, Linglong Qian, Zina Ibrahim

TL;DR

The paper tackles missing values in multivariate time-series by advocating a modular design that decouples imputation from downstream tasks, enabling reuse of pretrained imputers and lightweight task heads. Using PyPOTS, it benchmarks six backbones across diverse datasets, showing modular pipelines can match or exceed end-to-end baselines, and substantially outperform them in low-label regimes. The study also demonstrates transferability by reusing backbones across datasets, and highlights reductions in retraining costs and improved robustness as key practical benefits. Overall, the modular approach provides a flexible, data-efficient, and scalable alternative for real-world time-series analysis without sacrificing accuracy.

Abstract

Missing values are pervasive in large-scale time-series data, posing challenges for reliable analysis and decision-making. Many neural architectures have been designed to model and impute the complex and heterogeneous missingness patterns of such data. Most existing methods are end-to-end, rendering imputation tightly coupled with downstream predictive tasks and leading to limited reusability of the trained model, reduced interpretability, and challenges in assessing model quality. In this paper, we call for a modular approach that decouples imputation and downstream tasks, enabling independent optimisation and greater adaptability. Using the largest open-source Python library for deep learning-based time-series analysis, PyPOTS, we evaluate a modular pipeline across six state-of-the-art models that perform imputation and prediction on seven datasets spanning multiple domains. Our results show that a modular approach maintains high performance while prioritising flexibility and reusability, qualities that are crucial for real-world applications. Through this work, we aim to demonstrate how modularity can benefit multivariate time-series analysis, achieving a balance between performance and adaptability.
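To make the decoupling concrete, the following is a minimal, self-contained sketch of the idea and not the authors' pipeline or the PyPOTS API. It assumes a per-feature mean imputer as a stand-in for a pretrained imputation backbone and a tiny logistic-regression head as a stand-in for the lightweight task head. All names (`MeanImputer`, `LogisticHead`, `make_toy_data`, `run_pipeline`) and the toy data are hypothetical. The imputer is fitted once without labels, and the head is then trained only on the imputer's outputs.

```python
import math
import random

MISSING = None  # hypothetical marker for a missing observation


class MeanImputer:
    """Stand-in for a pretrained imputation backbone (assumption: per-feature mean)."""

    def fit(self, batch):
        n_features = len(batch[0][0])
        sums, counts = [0.0] * n_features, [0] * n_features
        for series in batch:
            for step in series:
                for j, value in enumerate(step):
                    if value is not MISSING:
                        sums[j] += value
                        counts[j] += 1
        self.means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
        return self

    def impute(self, batch):
        # Observed values pass through unchanged; gaps take the feature mean.
        return [[[self.means[j] if v is MISSING else v for j, v in enumerate(step)]
                 for step in series] for series in batch]


class LogisticHead:
    """Lightweight downstream head; it never sees missing values or the imputer's internals."""

    def __init__(self, n_inputs, lr=0.1, epochs=300):
        self.w, self.b = [0.0] * n_inputs, 0.0
        self.lr, self.epochs = lr, epochs

    def predict_proba(self, series):
        x = [v for step in series for v in step]  # flatten (time, feature)
        z = self.b + sum(w * v for w, v in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, batch, labels):
        # Plain stochastic gradient descent on the logistic loss.
        for _ in range(self.epochs):
            for series, y in zip(batch, labels):
                x = [v for step in series for v in step]
                err = self.predict_proba(series) - y
                self.w = [w - self.lr * err * v for w, v in zip(self.w, x)]
                self.b -= self.lr * err
        return self


def make_toy_data(n=40, n_steps=3, n_features=2, missing_rate=0.2, seed=0):
    """Toy data: class-0 values lie near 0, class-1 values near 1, with random gaps."""
    rng = random.Random(seed)
    batch, labels = [], []
    for i in range(n):
        y = i % 2
        series = [[MISSING if rng.random() < missing_rate else y + rng.gauss(0.0, 0.1)
                   for _ in range(n_features)] for _ in range(n_steps)]
        batch.append(series)
        labels.append(y)
    return batch, labels


def run_pipeline():
    batch, labels = make_toy_data()
    imputer = MeanImputer().fit(batch)   # phase 1: fit the imputer, no labels needed
    imputed = imputer.impute(batch)      # phase 2: reuse the frozen imputer
    head = LogisticHead(n_inputs=3 * 2).fit(imputed, labels)  # phase 3: train the head only
    preds = [int(head.predict_proba(s) >= 0.5) for s in imputed]
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    return imputed, accuracy
```

Because the head depends only on the imputed array, either component in this sketch can be swapped or retrained without touching the other, which is the property the modular design is meant to provide.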

Paper Structure

This paper contains 19 sections, 2 figures, 6 tables.

Figures (2)

  • Figure 1: The three phases of our modular pipeline.
  • Figure 2: Normalised inference-time overhead of end-to-end (E2E) architectures compared to modular baselines. The y-axis represents the ratio of E2E inference time to that of a single-layer modular MLP, with the dashed line at $1.0$ indicating parity.
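The ratio plotted in Figure 2 can be illustrated with a short sketch. The helper names (`median_runtime`, `normalised_overhead`) are hypothetical and the timings in the usage note are made up for illustration. The sketch only encodes the definition given in the caption: E2E inference time divided by the inference time of the single-layer modular MLP, with 1.0 meaning parity.

```python
import statistics
import time


def median_runtime(fn, repeats=5):
    """Median wall-clock time of fn() over several runs, in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def normalised_overhead(e2e_seconds, modular_mlp_seconds):
    """Ratio of E2E inference time to the modular MLP's; 1.0 indicates parity."""
    if modular_mlp_seconds <= 0:
        raise ValueError("modular baseline time must be positive")
    return e2e_seconds / modular_mlp_seconds
```

For example, with hypothetical timings of 3.0 s for an E2E model and 1.5 s for the modular MLP, `normalised_overhead(3.0, 1.5)` gives 2.0, which would sit above the dashed parity line.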