
RDPI: A Refine Diffusion Probability Generation Method for Spatiotemporal Data Imputation

Zijin Liu, Xiang Zhao, You Song

TL;DR

RDPI introduces a two-stage approach to spatiotemporal data imputation, combining a fast deterministic initializer with a residual-focused conditional diffusion model. By incorporating observed values into the forward diffusion process and treating the diffusion target as the residual between the initial imputation and true data, RDPI better captures spatiotemporal dependencies and uncertainty. Empirical results on traffic and air-quality datasets show RDPI achieves state-of-the-art imputation accuracy while substantially reducing sampling cost through accelerated diffusion. The method provides a principled probabilistic framework with robust performance across varying missing rates and configurations.
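The two-stage idea can be illustrated on a toy single-sensor series. This is a minimal sketch under stated assumptions: the deterministic initializer `f_theta` here is a hypothetical stand-in (linear interpolation between the nearest observed neighbours in time), not the network RDPI actually uses. It shows only how the refinement target is the residual between the true values and the initial estimate at missing positions, and how a residual estimate is added back to produce the final imputation.

```python
from typing import List, Optional


def f_theta(series: List[Optional[float]]) -> List[float]:
    """Hypothetical deterministic initializer (stand-in for RDPI's f_theta):
    fill each gap by linear interpolation between the nearest observed
    neighbours; leading/trailing gaps copy the nearest observation."""
    obs = [i for i, v in enumerate(series) if v is not None]
    if not obs:
        raise ValueError("need at least one observed value")
    out: List[float] = []
    for i, v in enumerate(series):
        if v is not None:
            out.append(v)
            continue
        left = max((j for j in obs if j < i), default=None)
        right = min((j for j in obs if j > i), default=None)
        if left is None:
            out.append(series[right])
        elif right is None:
            out.append(series[left])
        else:
            w = (i - left) / (right - left)
            out.append((1 - w) * series[left] + w * series[right])
    return out


def residual_target(truth: List[float], series: List[Optional[float]]) -> List[float]:
    """Refinement-stage training target z0^m = x0^m - f_theta(x0^c),
    defined on missing positions only (0.0 where the value was observed)."""
    init = f_theta(series)
    return [t - e if s is None else 0.0 for t, e, s in zip(truth, init, series)]


def refine(series: List[Optional[float]], residual_estimate: List[float]) -> List[float]:
    """Final imputation: initial estimate plus a residual estimate at missing
    positions; observed values are passed through unchanged."""
    init = f_theta(series)
    return [e + r if s is None else s for e, r, s in zip(init, residual_estimate, series)]


series = [0.0, None, 4.0]            # None marks a missing reading
truth = [0.0, 2.5, 4.0]
print(f_theta(series))               # [0.0, 2.0, 4.0]
z0 = residual_target(truth, series)
print(z0)                            # [0.0, 0.5, 0.0]
print(refine(series, z0))            # [0.0, 2.5, 4.0]
```

In RDPI the residual at inference time would come from the conditional diffusion model rather than from the ground truth; passing the true residual here only checks that the decomposition is exact.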

Abstract

Spatiotemporal data imputation plays a crucial role in fields such as traffic flow monitoring, air quality assessment, and climate prediction. However, spatiotemporal data collected by sensors often suffer from temporal incompleteness, and the sparse and uneven distribution of sensors leads to missing data in the spatial dimension. Among existing methods, autoregressive approaches are prone to error accumulation, while simple conditional diffusion models fail to adequately capture the spatiotemporal relationships between observed and missing data. To address these issues, we propose a novel two-stage Refined Diffusion Probability Imputation (RDPI) framework built on an initialization network and a conditional diffusion model. In the initial stage, deterministic imputation methods generate preliminary estimates of the missing data. In the refinement stage, the residuals are treated as the diffusion target, and observed values are incorporated into the forward process. This yields a conditional diffusion model better suited to spatiotemporal data imputation, bridging the gap between the preliminary estimates and the true values. Experiments on multiple datasets demonstrate that RDPI not only achieves state-of-the-art imputation accuracy but also significantly reduces the computational cost of sampling.
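Treating the residual as the diffusion target means the forward process adds noise to $z_0^m$ rather than to the raw sensor values. The sketch below shows the standard DDPM closed-form forward step, $z_t = \sqrt{\bar\alpha_t}\,z_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, applied to a residual vector. It assumes a linear $\beta$ schedule, and the values of `T`, `beta_start`, and `beta_end` are illustrative choices, not taken from the paper. It deliberately omits the mechanism by which RDPI incorporates observed values into the forward process, which the abstract does not specify.

```python
import math
import random
from typing import List, Tuple


def linear_alpha_bars(T: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a
    linear beta schedule (illustrative defaults, not from the paper)."""
    if T < 2:
        raise ValueError("T must be at least 2")
    alpha_bars: List[float] = []
    prod = 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(z0: List[float], t: int, alpha_bars: List[float],
             rng: random.Random) -> Tuple[List[float], List[float]]:
    """Draw z_t ~ N(sqrt(alpha_bar_t) * z0, (1 - alpha_bar_t) * I) in one step.
    Returns (z_t, eps) so a denoiser could be trained to predict eps."""
    ab = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(ab) * z + math.sqrt(1.0 - ab) * e for z, e in zip(z0, eps)]
    return zt, eps


alpha_bars = linear_alpha_bars(50)
rng = random.Random(0)
residual = [0.5, -0.2, 0.1]          # toy residual z0^m at three missing entries
zt, eps = q_sample(residual, 25, alpha_bars, rng)
print(len(zt), alpha_bars[0])
```

During training, a denoiser would receive `zt`, the step `t`, and conditioning information and be asked to predict `eps` (or `z0`); that network is not sketched here.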

Paper Structure

This paper contains 33 sections, 36 equations, 6 figures, 6 tables, 3 algorithms.

Figures (6)

  • Figure 1: The directed graphical model of the RDPI framework. In the initial stage, a deterministic model $f_{\theta}$ produces a rough imputation $f_{\theta}(x_0^c)$ from the observed values $x_0^c$, and the residual $z_0^m$ is computed from this estimate and the missing data $x_0^m$. In the refinement stage, a novel conditional diffusion model generates the residual $z_0^m$ between the rough imputation result $f_{\theta}(x_0^c)$ and the true values $x_0^m$, ultimately yielding a refined imputation of the missing data. In this figure, $z_t^m$ denotes the representation of the residual at the $t$-th step of the diffusion process, $z_0^c$ is the representation of $x_0^c$ obtained from $f_{\theta}$, $\widetilde{x}_0^m$ is the final imputation result, and $\mathcal{N}(\mathbf{0}, \mathbf{I})$ denotes the standard Gaussian.
  • Figure 2: Architecture of denoising model.
  • Figure 3: Imputation results under different missing rates.
  • Figure 4: Imputation examples of RDPI on the AQI36 dataset. The horizontal axis represents time, and the vertical axis represents value.
  • Figure 5: Imputation for unobserved sensors in AQI36. The blue dotted line represents the ground truth, and the green solid line represents the deterministic imputation result.
  • ...and 1 more figure
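Written as equations, the relationships in Figure 1 can be summarized as follows. This is a hedged reading of the caption, not the paper's own equations: the residual definition and the final composition follow from the caption, the forward marginal is the generic DDPM form (it does not show how RDPI additionally feeds observed values into the forward process), and $\hat{z}_0^m$ is notation introduced here for the residual sampled by the reverse process.

```latex
\begin{aligned}
z_0^m &= x_0^m - f_\theta(x_0^c) && \text{(residual target)}\\
q\!\left(z_t^m \mid z_0^m\right) &= \mathcal{N}\!\left(\sqrt{\bar\alpha_t}\, z_0^m,\ (1-\bar\alpha_t)\,\mathbf{I}\right),
  \quad \bar\alpha_t = \textstyle\prod_{s=1}^{t}(1-\beta_s) && \text{(generic DDPM forward)}\\
z_T^m &\sim \mathcal{N}(\mathbf{0}, \mathbf{I}) && \text{(start of reverse process)}\\
\widetilde{x}_0^m &= f_\theta(x_0^c) + \hat{z}_0^m && \text{(refined imputation)}
\end{aligned}
```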