STDiff: A State Transition Diffusion Framework for Time Series Imputation in Industrial Systems

Gary Simethy, Daniel Ortiz-Arroyo, Petar Durdevic

TL;DR

STDiff and STDiff-W are diffusion-based imputers that treat gaps in industrial time series as state-space rollouts conditioned on past states, control actions, and exogenous signals. STDiff models one-step transitions, while STDiff-W adds a context window and jointly denoises blocks of steps, which improves long-range consistency and local detail over long outages. Evaluations on the Agtrup and Avedøre WWTP datasets show state-of-the-art imputation accuracy and top or tied-top downstream one-step forecasting, with ablations confirming the importance of exogenous conditioning and realistic missingness. The work emphasizes task-oriented evaluation, practical deployment considerations, and the value of combining dynamics-aware conditioning with diffusion in industrial settings.

Abstract

Incomplete sensor data is a major obstacle in industrial time-series analytics. In wastewater treatment plants (WWTPs), key sensors show long, irregular gaps caused by fouling, maintenance, and outages. We introduce STDiff and STDiff-W, diffusion-based imputers that cast gap filling as state-space simulation under partial observability, where targets, controls, and exogenous signals may all be intermittently missing. STDiff learns a one-step transition model conditioned on observed values and masks, while STDiff-W extends this with a context encoder that jointly inpaints contiguous blocks, combining long-range consistency with short-term detail. On two WWTP datasets (one with synthetic block gaps from Agtrup and another with natural outages from Avedøre), STDiff-W achieves state-of-the-art accuracy compared with strong neural baselines such as SAITS, BRITS, and CSDI. Beyond point-error metrics, its reconstructions preserve realistic dynamics including oscillations, spikes, and regime shifts, and they achieve top or tied-top downstream one-step forecasting performance compared with strong neural baselines, indicating that preserving dynamics does not come at the expense of predictive utility. Ablation studies that drop, shuffle, or add noise to control or exogenous inputs consistently degrade NH4 and PO4 performance, with the largest deterioration observed when exogenous signals are removed, showing that the model captures meaningful dependencies. We conclude with practical guidance for deployment: evaluate performance beyond MAE using task-oriented and visual checks, include exogenous drivers, and balance computational cost against robustness to structured outages.
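
The three ablation perturbations named in the abstract (drop, shuffle, add noise) can be illustrated with a minimal sketch. It is not the authors' code: the function names, the row-major list layout, the constant fill value, shuffling along the time axis, and the Gaussian noise model are all assumptions made here for illustration.

```python
import random


def drop_channel(rows, col, fill=0.0):
    """Replace one input channel with a constant (assumed fill value),
    so the model can no longer use its information."""
    return [r[:col] + [fill] + r[col + 1:] for r in rows]


def shuffle_channel(rows, col, seed=0):
    """Permute one channel along time (an assumption): the value
    distribution is kept but the temporal alignment is destroyed."""
    values = [r[col] for r in rows]
    random.Random(seed).shuffle(values)
    return [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, values)]


def add_noise_channel(rows, col, std, seed=0):
    """Add zero-mean Gaussian noise (assumed noise model) to one channel."""
    rng = random.Random(seed)
    return [r[:col] + [r[col] + rng.gauss(0.0, std)] + r[col + 1:] for r in rows]


# Hypothetical toy data: column 0 is a target, column 1 an exogenous signal.
series = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
ablated = {
    "drop": drop_channel(series, col=1),
    "shuffle": shuffle_channel(series, col=1, seed=1),
    "noise": add_noise_channel(series, col=1, std=0.5, seed=1),
}
```

Each helper returns a new list and leaves the untouched columns unchanged, so the same imputer can be scored on the original and the perturbed inputs and the error gap attributed to the perturbed channel.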

Paper Structure

This paper contains 65 sections, 5 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Architecture of STDiff during training. A 1D U-Net denoiser receives a noised version of the next state $x_t^{(k)}$ and predicts the added Gaussian noise, conditioned on the diffusion step $k$ and a context embedding of the previous state, control inputs, and exogenous variables. This trains a one-step transition model $p(x_t \mid x_{t-1}, u_t, w_t)$ that can later be used for sequential imputation.
  • Figure 2: Imputation with STDiff at inference time. Starting from a Gaussian noise sample for the next state, the model iteratively denoises over the diffusion steps ($k = T \rightarrow 1$) while clamping any observed entries, yielding an imputed next state that respects available measurements. Repeating this procedure step by step through a gap produces a sequential rollout that fills arbitrarily long outages.
  • Figure 3: Architecture of STDiff-W during training. A causal temporal convolutional network (TCN) encodes the past $K$ steps of states, masks, and (optionally) $\Delta t$ into a context vector that summarizes recent dynamics. This context conditions a diffusion U-Net that jointly denoises a block of $H$ future steps, learning a blockwise transition model that mitigates error accumulation compared with pure one-step rollouts.
  • Figure 4: Blockwise imputation with STDiff-W. Given a gap of $H$ missing steps, the model samples an initial noisy window and iteratively denoises it while conditioning on the recent context and clamping any known values inside the block. For longer gaps, the context window is slid forward and successive blocks are imputed and stitched together, combining long-range consistency with locally coherent inpainting.
  • Figure 5: N$_2$O concentration at Avedøre WWTP during natural sensor outages in winter/spring (left) and summer/fall (right). Each row shows the same missing segments reconstructed by a different model, illustrating how methods differ in preserving oscillations, spikes, and regime changes under long, irregular gaps.
  • ...and 1 more figure
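
The inference procedure in the Figure 2 caption (denoise from $k = T$ down to $1$, clamp observed entries, then repeat step by step through a gap) can be sketched as below. This is a toy under stated assumptions, not the authors' implementation: the linear noise schedule, the standard DDPM ancestral update, the hard clamp after every step, and all function names are choices made here. The trained U-Net is replaced by a hypothetical `persistence_eps` stand-in that ignores the controls and exogenous inputs.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (assumed); list index 0 is diffusion step k=1."""
    betas = [beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
             for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        alpha_bars.append(running)
    return betas, alpha_bars


def persistence_eps(x_k, alpha_bar, x_prev, u_t, w_t):
    """Hypothetical stand-in for the trained denoiser: returns the noise
    implied if the clean next state equalled the previous state.
    u_t and w_t are accepted but ignored by this toy."""
    s, r = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xk - s * xp) / r for xk, xp in zip(x_k, x_prev)]


def impute_next_state(x_prev, u_t, w_t, obs, mask, num_steps=50, rng=None):
    """Sample one next state; entries with mask == 1 are clamped to obs."""
    if rng is None:
        rng = random.Random(0)
    betas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in x_prev]  # start from Gaussian noise
    for k in range(num_steps - 1, -1, -1):     # diffusion steps T -> 1
        eps = persistence_eps(x, alpha_bars[k], x_prev, u_t, w_t)
        coef = betas[k] / math.sqrt(1.0 - alpha_bars[k])
        scale = 1.0 / math.sqrt(1.0 - betas[k])
        sigma = math.sqrt(betas[k]) if k > 0 else 0.0  # no noise on last step
        x = [scale * (xi - coef * ei) + sigma * rng.gauss(0.0, 1.0)
             for xi, ei in zip(x, eps)]
        # Hard clamp (assumption): overwrite observed entries after each step.
        x = [o if m else xi for xi, o, m in zip(x, obs, mask)]
    return x


def rollout(x_start, controls, exogenous, observations, masks, num_steps=50):
    """Fill a gap sequentially: each imputed state seeds the next step."""
    rng = random.Random(0)
    state, filled = list(x_start), []
    for u_t, w_t, obs, mask in zip(controls, exogenous, observations, masks):
        state = impute_next_state(state, u_t, w_t, obs, mask, num_steps, rng)
        filled.append(state)
    return filled
```

With this stand-in, missing entries simply carry the previous state forward while observed entries are reproduced exactly, which makes the clamping and rollout logic easy to check. A trained denoiser conditioned on controls and exogenous signals would take the place of `persistence_eps`. The blockwise variant in the Figure 4 caption would denoise a window of $H$ steps at once instead of a single state.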