Diffusion-based Time Series Imputation and Forecasting with Structured State Space Models

Juan Miguel Lopez Alcaraz, Nils Strodthoff

TL;DR

Missing data in time series is a major barrier to reliable analysis and downstream tasks. The authors propose SSSD, a framework that combines diffusion-based generative modeling with structured state space models (notably the S4 layer) to capture long-range temporal dependencies for imputation and forecasting. They introduce several model variants (SSSD^S4, SSSD^SA, CSDI^S4) and a training regime (D1) in which diffusion noise is applied only to the regions to be imputed, while the observed values serve as conditional information. Extensive experiments on ECG, aviation, electricity, and traffic datasets show strong performance, especially under blackout missingness. In these challenging scenarios, the probabilistic imputations produced by SSSD are superior both qualitatively and quantitatively. The framework also offers a scalable way to condition on global or local signals, and open-source code is available for replication and extension.
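The D1 training regime can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the linear beta schedule, T = 200, the array shapes, and the helper names `d1_forward_noise` and `d1_loss` are all hypothetical. It applies the standard DDPM forward process x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps only to the entries marked for imputation, keeps the observed entries at their ground-truth values, and restricts the noise-prediction loss to the imputation region.

```python
import numpy as np

# Assumption: a linear beta schedule with T = 200 steps; the paper's exact schedule may differ.
T = 200
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)  # alpha_bar_t = prod_{s<=t} (1 - beta_s)


def d1_forward_noise(x0, mask, t, alpha_bar, rng):
    """Diffuse only the entries to be imputed (hypothetical helper).

    x0:   clean series, shape (channels, length)
    mask: 1 = observed (conditioning), 0 = to be imputed
    """
    eps = rng.standard_normal(x0.shape)
    x_noisy = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    # Observed entries stay at their ground-truth values; only the imputation region is noised.
    x_t = mask * x0 + (1.0 - mask) * x_noisy
    return x_t, eps


def d1_loss(eps_pred, eps, mask):
    """Mean squared noise-prediction error, restricted to the imputation region (hypothetical)."""
    target = 1.0 - mask
    return np.sum(((eps_pred - eps) * target) ** 2) / max(target.sum(), 1.0)


rng = np.random.default_rng(0)
x0 = rng.standard_normal((12, 250))                 # e.g. 12 channels, 250 time steps
mask = (rng.random(x0.shape) > 0.2).astype(float)    # roughly 20% of entries to impute
x_t, eps = d1_forward_noise(x0, mask, t=100, alpha_bar=alpha_bar, rng=rng)
# A denoising network (not shown) would receive x_t, the conditioning x0 * mask,
# and the mask, and predict eps; d1_loss then ignores errors on observed entries.
```

Restricting both the noise and the loss to the missing entries means the network never has to reconstruct values it is already given as conditioning.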

Abstract

The imputation of missing values represents a significant obstacle for many real-world data analysis pipelines. Here, we focus on time series data and put forward SSSD, an imputation model that relies on two emerging technologies: (conditional) diffusion models as state-of-the-art generative models, and structured state space models as the internal model architecture, which are particularly suited to capturing long-term dependencies in time series data. We demonstrate that SSSD matches or even exceeds state-of-the-art probabilistic imputation and forecasting performance on a broad range of data sets and different missingness scenarios, including the challenging blackout-missing scenarios, where prior approaches failed to provide meaningful results.
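Structured state space models such as S4 build on a linear state space model x'(t) = A x(t) + B u(t), y(t) = C x(t), discretized with a step size dt. The sketch below is a simplification under stated assumptions: it uses a plain diagonal state matrix instead of the HiPPO-based parameterization and efficient kernel computation of the actual S4 layer, and the function names are hypothetical. It shows the bilinear discretization and checks that the resulting recurrence equals a causal convolution with kernel K_j = C A_bar^j B_bar, the property that lets such layers be trained in parallel over long sequences.

```python
import numpy as np


def discretize_bilinear(A, B, dt):
    """Bilinear (Tustin) discretization of x'(t) = A x(t) + B u(t)."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - (dt / 2.0) * A)
    A_bar = inv @ (I + (dt / 2.0) * A)
    B_bar = (dt * inv) @ B
    return A_bar, B_bar


def ssm_scan(A_bar, B_bar, C, u):
    """Recurrent view: x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k, with x_{-1} = 0."""
    x = np.zeros(A_bar.shape[0])
    ys = []
    for u_k in u:
        x = A_bar @ x + B_bar[:, 0] * u_k
        ys.append(C @ x)
    return np.array(ys)


def ssm_kernel(A_bar, B_bar, C, L):
    """Convolutional view: kernel K_j = C A_bar^j B_bar for j = 0, ..., L - 1."""
    K = []
    A_pow = np.eye(A_bar.shape[0])
    for _ in range(L):
        K.append(C @ A_pow @ B_bar[:, 0])
        A_pow = A_bar @ A_pow
    return np.array(K)


rng = np.random.default_rng(0)
N, L = 4, 64
A = -np.diag(np.arange(1, N + 1, dtype=float))  # stable diagonal A (illustrative, not HiPPO)
B = np.ones((N, 1))
C = rng.standard_normal(N)
A_bar, B_bar = discretize_bilinear(A, B, dt=0.1)

u = rng.standard_normal(L)
y_rec = ssm_scan(A_bar, B_bar, C, u)             # sequential, O(L) steps
y_conv = np.convolve(u, ssm_kernel(A_bar, B_bar, C, L))[:L]  # same output as one causal convolution
```

The recurrent form is convenient for step-by-step generation, while the convolutional form is what makes training on long sequences efficient.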

Paper Structure (52 sections, 12 equations, 13 figures, 29 tables, 2 algorithms)

Figures (13)

  • Figure 1: Color scheme introduction. The proposed model $\text{SSSD}^{\text{S4}}$ provides imputations that are not only quantitatively but also qualitatively superior (see the following figures) across different data sets and missingness scenarios (RM: random missing, RBM: random block missing, BM: blackout missing, TF: time series forecasting). The signal is shown in blue; the white background marks the conditioned ground truth, whereas the gray background marks the time steps in specific channels to be imputed. Prediction bands derived from 100 imputations show the quantiles from 0.05 to 0.95 in light green and from 0.25 to 0.75 in dark green. Because these bands do not allow a visual assessment of individual imputations, a randomly selected single sample is always shown in orange as well.
  • Figure 2: Proposed $\text{SSSD}^{\text{S4}}$ model architecture.
  • Figure 3: PTB-XL BM imputations for the V5 lead of an ECG from a healthy patient.
  • Figure 4: PTB-XL TF for the V1 lead of an ECG from a patient with a complete left bundle branch block (CLBBB).
  • Figure 5: PTB-XL BM imputations for the leads I, V1, V4, and aVF of an ECG from a patient with sinus arrhythmia obtained from unconditional training (setting D).
  • ...and 8 more figures
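The four missingness scenarios and the prediction bands described in the Figure 1 caption can be sketched in NumPy as follows. The mask constructions are simplified interpretations (1 = observed, 0 = to be imputed) and may differ from the paper's exact protocol, for example in missing ratios or in how blocks are placed; all function names are hypothetical, and the random `samples` array only stands in for 100 model imputations.

```python
import numpy as np


def random_missing(K, L, ratio, rng):
    """RM: individual time steps dropped independently in each channel."""
    mask = np.ones((K, L))
    for k in range(K):
        idx = rng.choice(L, size=int(ratio * L), replace=False)
        mask[k, idx] = 0
    return mask


def random_block_missing(K, L, ratio, rng):
    """RBM: one contiguous block per channel, at a random position in each channel."""
    mask = np.ones((K, L))
    n = int(ratio * L)
    for k in range(K):
        start = rng.integers(0, L - n + 1)
        mask[k, start:start + n] = 0
    return mask


def blackout_missing(K, L, ratio, rng):
    """BM: the same contiguous block is missing in all channels simultaneously."""
    mask = np.ones((K, L))
    n = int(ratio * L)
    start = rng.integers(0, L - n + 1)
    mask[:, start:start + n] = 0
    return mask


def forecasting(K, L, horizon):
    """TF: the final `horizon` time steps are missing in all channels."""
    mask = np.ones((K, L))
    mask[:, L - horizon:] = 0
    return mask


def prediction_bands(samples):
    """Pointwise quantile bands over samples of shape (n_samples, K, L)."""
    q05, q25, q75, q95 = np.quantile(samples, [0.05, 0.25, 0.75, 0.95], axis=0)
    return (q05, q95), (q25, q75)  # outer (light green) and inner (dark green) bands


rng = np.random.default_rng(0)
K, L = 12, 250
m_rm = random_missing(K, L, 0.2, rng)
m_rbm = random_block_missing(K, L, 0.2, rng)
m_bm = blackout_missing(K, L, 0.2, rng)
m_tf = forecasting(K, L, horizon=25)

samples = rng.standard_normal((100, K, L))  # stand-in for 100 imputations
(outer_lo, outer_hi), (inner_lo, inner_hi) = prediction_bands(samples)
```

Because the bands are pointwise quantiles, they summarize the uncertainty at each time step but say nothing about whether any single trajectory is plausible, which is why the figures additionally show one individual sample.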