Diffusion-based Time Series Imputation and Forecasting with Structured State Space Models
Juan Miguel Lopez Alcaraz, Nils Strodthoff
TL;DR
Missing data in time series poses a major barrier to reliable analysis and downstream tasks. The authors propose SSSD, a framework that combines diffusion-based generative modeling with structured state space models (notably the S4 layer) to capture long-range temporal dependencies for imputation and forecasting. They introduce several variants (SSSD^S4, SSSD^SA, CSDI^S4) and a training setup (D1) that applies diffusion noise only to the regions to be imputed, with the remaining data supplied as conditional information. Experiments on ECG, aviation, electricity, and traffic datasets show strong performance, especially under blackout missingness. The work demonstrates that the probabilistic imputations produced by SSSD are both qualitatively and quantitatively superior in challenging scenarios. It also offers a scalable way to condition on global or local signals, and open-source code is available for replication and extension.
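To make the D1 idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the standard DDPM forward process x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε and assumes that a blackout means one contiguous window missing in every channel at once. The function names `blackout_mask` and `noise_targets_only`, the mask convention (1 = observed, 0 = to be imputed), and the value of `alpha_bar_t` are hypothetical choices made for illustration.

```python
import numpy as np


def blackout_mask(length, channels, start, width):
    """Hypothetical helper: 1 = observed, 0 = missing.

    Assumption: a blackout removes the same contiguous window
    from every channel simultaneously.
    """
    mask = np.ones((length, channels), dtype=np.float64)
    mask[start:start + width, :] = 0.0
    return mask


def noise_targets_only(x0, mask, alpha_bar_t, rng):
    """Hypothetical helper: diffuse only the entries to be imputed.

    Uses the standard DDPM forward step on the whole array, then
    restores the observed entries so they stay clean and can act
    as the conditioning signal.
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    mixed = mask * x0 + (1.0 - mask) * x_t
    return mixed, eps


rng = np.random.default_rng(0)

# Toy signal: 100 time steps, 3 identical sine channels.
x0 = np.sin(np.linspace(0.0, 6.0, 100))[:, None] * np.ones((1, 3))

# Steps 40..59 are missing in all channels.
mask = blackout_mask(100, 3, start=40, width=20)

# alpha_bar_t = 0.5 is an arbitrary illustrative noise level.
x_in, eps = noise_targets_only(x0, mask, alpha_bar_t=0.5, rng=rng)
```

In this sketch, `x_in` equals `x0` wherever `mask` is 1 and is a noised version of `x0` inside the blackout window. A denoising network would then receive `x_in` together with `mask` and be trained to predict `eps` on the masked entries only. How the condition is actually fed to the network in SSSD is not specified here.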
Abstract
The imputation of missing values represents a significant obstacle for many real-world data analysis pipelines. Here, we focus on time series data and put forward SSSD, an imputation model that relies on two emerging technologies, (conditional) diffusion models as state-of-the-art generative models and structured state space models as internal model architecture, which are particularly suited to capture long-term dependencies in time series data. We demonstrate that SSSD matches or even exceeds state-of-the-art probabilistic imputation and forecasting performance on a broad range of data sets and different missingness scenarios, including the challenging blackout-missing scenarios, where prior approaches failed to provide meaningful results.
