RDPI: A Refine Diffusion Probability Generation Method for Spatiotemporal Data Imputation
Zijin Liu, Xiang Zhao, You Song
TL;DR
RDPI introduces a two-stage approach to spatiotemporal data imputation, combining a fast deterministic initializer with a residual-focused conditional diffusion model. By incorporating observed values into the forward diffusion process and treating the diffusion target as the residual between the initial imputation and true data, RDPI better captures spatiotemporal dependencies and uncertainty. Empirical results on traffic and air-quality datasets show RDPI achieves state-of-the-art imputation accuracy while substantially reducing sampling cost through accelerated diffusion. The method provides a principled probabilistic framework with robust performance across varying missing rates and configurations.
Abstract
Spatiotemporal data imputation plays a crucial role in fields such as traffic flow monitoring, air quality assessment, and climate prediction. However, spatiotemporal data collected by sensors often suffer from temporal incompleteness, and the sparse and uneven distribution of sensors leads to missing data in the spatial dimension. Among existing methods, autoregressive approaches are prone to error accumulation, while simple conditional diffusion models fail to adequately capture the spatiotemporal relationships between observed and missing data. To address these issues, we propose a novel two-stage Refined Diffusion Probability Imputation (RDPI) framework based on an initial network and a conditional diffusion model. In the initial stage, deterministic imputation methods are used to generate preliminary estimates of the missing data. In the refinement stage, the residuals between the preliminary estimates and the true values are treated as the diffusion target, and observed values are incorporated into the forward process. This results in a conditional diffusion model better suited for spatiotemporal data imputation, bridging the gap between the preliminary estimates and the true values. Experiments on multiple datasets demonstrate that RDPI not only achieves state-of-the-art imputation accuracy but also significantly reduces the computational cost of sampling.
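The two-stage idea can be illustrated with a minimal sketch on a single one-dimensional series. This is not the RDPI implementation: linear interpolation stands in for the initial network, the function names (`linear_interpolate`, `residual_target`, `forward_noise`) are hypothetical, and the way observed values enter the forward process here (noise is added only at missing positions, while observed positions keep their known zero residual) is an assumption chosen for illustration. It also assumes that, at training time, the missing entries are artificially masked so that their true values are available.

```python
import math
import random


def linear_interpolate(values, mask):
    """Stage 1 stand-in: fill entries with mask == 0 by linear interpolation
    between the nearest observed neighbours (nearest value at the edges)."""
    observed = [i for i, m in enumerate(mask) if m]
    if not observed:
        raise ValueError("at least one observed value is required")
    filled = list(values)
    for i, m in enumerate(mask):
        if m:
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = (1 - w) * values[left] + w * values[right]
    return filled


def residual_target(truth, initial, mask):
    """Diffusion target: truth minus initial estimate at missing positions.
    Observed positions are copied by the initializer, so their residual is 0."""
    return [0.0 if m else t - x for t, x, m in zip(truth, initial, mask)]


def forward_noise(residual, mask, alpha_bar, rng):
    """Standard DDPM-style forward sample sqrt(a) * r0 + sqrt(1 - a) * eps,
    applied only at missing positions (assumed conditioning scheme)."""
    scale = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [
        0.0 if m else scale * r + sigma * rng.gauss(0.0, 1.0)
        for r, m in zip(residual, mask)
    ]


# Toy series: positions 1 and 2 are treated as missing (mask == 0).
truth = [1.0, 2.0, 4.0, 4.0, 5.0]
mask = [1, 0, 0, 1, 1]
observed_only = [t if m else 0.0 for t, m in zip(truth, mask)]

initial = linear_interpolate(observed_only, mask)
residual = residual_target(truth, initial, mask)
noisy = forward_noise(residual, mask, alpha_bar=0.5, rng=random.Random(0))
```

In this toy case the initializer already recovers position 1 and is off by one unit at position 2, so the diffusion model would only need to learn that small correction rather than the full signal. That is the intuition behind diffusing the residual instead of the raw data.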
