STDiff: A State Transition Diffusion Framework for Time Series Imputation in Industrial Systems
Gary Simethy, Daniel Ortiz-Arroyo, Petar Durdevic
TL;DR
STDiff and STDiff-W are diffusion-based imputers that treat gaps in industrial time series as state-space rollouts conditioned on past states, control actions, and exogenous signals. STDiff models one-step transitions, while STDiff-W adds a context window and jointly denoises contiguous blocks, handling long outages with better long-range consistency and local detail. Evaluations on the Agtrup and Avedøre WWTP datasets show state-of-the-art imputation accuracy and strong downstream forecasting, and ablations confirm the importance of exogenous conditioning and realistic missingness. The work emphasizes task-oriented evaluation, practical deployment considerations, and the value of combining dynamics-aware conditioning with diffusion in industrial settings.
Abstract
Incomplete sensor data is a major obstacle in industrial time-series analytics. In wastewater treatment plants (WWTPs), key sensors show long, irregular gaps caused by fouling, maintenance, and outages. We introduce STDiff and STDiff-W, diffusion-based imputers that cast gap filling as state-space simulation under partial observability, where targets, controls, and exogenous signals may all be intermittently missing. STDiff learns a one-step transition model conditioned on observed values and masks, while STDiff-W extends this with a context encoder that jointly inpaints contiguous blocks, combining long-range consistency with short-term detail. On two WWTP datasets (Agtrup, with synthetic block gaps, and Avedøre, with natural outages), STDiff-W achieves state-of-the-art accuracy compared with strong neural baselines such as SAITS, BRITS, and CSDI. Beyond point-error metrics, its reconstructions preserve realistic dynamics, including oscillations, spikes, and regime shifts, and they achieve top or tied-top downstream one-step forecasting performance, indicating that preserving dynamics does not come at the expense of predictive utility. In ablation studies, dropping or shuffling the control or exogenous inputs, or adding noise to them, consistently degrades NH4 and PO4 performance, with the largest deterioration observed when exogenous signals are removed, showing that the model captures meaningful dependencies on these inputs. We conclude with practical guidance for deployment: evaluate performance beyond MAE using task-oriented and visual checks, include exogenous drivers, and balance computational cost against robustness to structured outages.
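
To make the rollout view of gap filling concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes NumPy, assumes the control and exogenous inputs are fully observed (the paper allows them to be missing too), and replaces the learned diffusion sampler with a hypothetical deterministic function `toy_step`. The names `rollout_impute`, `toy_step`, `decay`, and `gain` are illustrative only and do not come from the paper.

```python
import numpy as np


def rollout_impute(x, observed, drivers, step_fn):
    """Fill gaps by rolling a one-step transition forward in time.

    x        : (T, D) target values; entries where `observed` is False are ignored
    observed : (T, D) boolean mask, True where a value was measured
    drivers  : (T, K) control and exogenous inputs (assumed fully observed here)
    step_fn  : callable (prev_state, driver_t) -> predicted state of shape (D,)
    """
    x = np.asarray(x, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    drivers = np.asarray(drivers, dtype=float)
    filled = np.full_like(x, np.nan)

    # The first row has no predecessor: fall back to per-column observed means.
    col_means = np.array([
        x[observed[:, j], j].mean() if observed[:, j].any() else 0.0
        for j in range(x.shape[1])
    ])
    filled[0] = np.where(observed[0], x[0], col_means)

    for t in range(1, len(x)):
        # Predict from the previous (possibly imputed) state and current drivers.
        pred = step_fn(filled[t - 1], drivers[t])
        # Keep measurements where they exist; use the prediction inside gaps.
        filled[t] = np.where(observed[t], x[t], pred)
    return filled


def toy_step(prev_state, driver_t, decay=0.9, gain=0.1):
    """Hypothetical stand-in for a learned transition (not a diffusion model)."""
    return decay * prev_state + gain * driver_t.sum()


# Usage: a single sensor with a two-step gap and one driver signal.
x = np.array([[1.0], [np.nan], [np.nan], [4.0]])
observed = ~np.isnan(x)
drivers = np.zeros((4, 1))
print(rollout_impute(x, observed, drivers, toy_step))
```

In this sketch each imputed value feeds the next prediction, so errors can accumulate across a long gap. That is the kind of failure the block-wise, context-conditioned denoising of STDiff-W is described as addressing.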
