DiffSF: Diffusion Models for Scene Flow Estimation

Yushan Zhang, Bastian Wandt, Maria Magnusson, Michael Felsberg

TL;DR

DiffSF reframes scene flow estimation as a conditional denoising diffusion process, which yields both high accuracy and per-point uncertainty estimates. It uses a transformer-based backbone (inspired by GMSF) as a DDPM-style denoiser conditioned on the source and target point clouds, and achieves state-of-the-art results on the FlyingThings3D, KITTI, and Waymo Open benchmarks. The paper also shows that drawing multiple samples from the reverse process gives an uncertainty estimate that correlates with prediction error, so that a majority of the inaccurate predictions can be detected. This makes DiffSF a candidate for uncertainty-aware scene flow estimation in robotics and autonomous driving.

Abstract

Scene flow estimation is an essential ingredient for a variety of real-world applications, especially for autonomous agents such as self-driving cars and robots. While recent scene flow estimation approaches achieve reasonable accuracy, their applicability to real-world systems additionally benefits from a reliability measure. Aiming at improving accuracy while additionally providing an estimate of uncertainty, we propose DiffSF, which combines transformer-based scene flow estimation with denoising diffusion models. In the diffusion process, the ground truth scene flow vector field is gradually perturbed by adding Gaussian noise. In the reverse process, starting from randomly sampled Gaussian noise, the scene flow vector field prediction is recovered by conditioning on a source and a target point cloud. We show that the diffusion process greatly increases the robustness of predictions compared to prior approaches, resulting in state-of-the-art performance on standard scene flow estimation benchmarks. Moreover, by sampling multiple times with different initial states, the denoising process predicts multiple hypotheses, which enables measuring the output uncertainty and allows our approach to detect a majority of the inaccurate predictions. The code is available at https://github.com/ZhangYushan3/DiffSF.

Paper Structure

This paper contains 19 sections, 16 equations, 4 figures, 7 tables, and 2 algorithms.

Figures (4)

  • Figure 1: Diffusion process. In the forward process, we start from a ground truth scene flow vector field $\mathbf{V}_0$ and gradually add noise to it until we reach $\mathbf{V}_T$, which is pure Gaussian noise. In the reverse process, we recover the scene flow vector field $\mathbf{V}_0$ from the randomly sampled noisy vector field $\mathbf{V}_T$, conditioned on the source point cloud $\mathbf{P}_\mathrm{source}$ and the target point cloud $\mathbf{P}_\mathrm{target}$.
  • Figure 2: The reverse process with detailed denoising block for scene flow estimation. The denoising block takes the current noisy input $\mathbf{V}_t$, the source point cloud $\mathbf{P}_\mathrm{source}$, and the target point cloud $\mathbf{P}_\mathrm{target}$ as input. The output $\hat{\mathbf{V}}_\mathrm{pred}$ is the denoised scene flow prediction. Shared weights for the feature extraction are indicated in the same color.
  • Figure 3: Analysis of uncertainty estimation on the $\text{F3D}_\text{o}$ dataset. Left: Uncertainty-error correspondence. The horizontal axis shows EPE intervals. The vertical axis shows the estimated uncertainty averaged over all points whose EPE falls in the interval, together with the scaled standard deviation of the uncertainty. Right: Recall (red) and precision (blue) curves for outlier prediction. The horizontal axis is the threshold on the estimated uncertainty used to classify points as outliers.
  • Figure 4: Visualization of outlier prediction on the $\text{F3D}_\text{o}$ dataset. Black: accurate predictions. Red: outliers. Top row: outliers defined as EPE > 0.30. Bottom row: outliers predicted from the uncertainty.
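
The precision and recall curves of Figure 3 and the outlier definition of Figure 4 can be made concrete with a short sketch. It assumes a point is a true outlier when its EPE exceeds 0.30 (as in Figure 4) and a predicted outlier when its uncertainty exceeds a chosen threshold; the function name, the toy values, and the uncertainty threshold of 0.12 are hypothetical.

```python
def outlier_precision_recall(errors, uncertainties, epe_threshold, unc_threshold):
    """Precision and recall of flagging outliers by thresholding uncertainty.

    A point is a true outlier if its end-point error exceeds epe_threshold,
    and a predicted outlier if its uncertainty exceeds unc_threshold.
    Returns NaN for a quantity whose denominator is zero.
    """
    actual = [e > epe_threshold for e in errors]
    predicted = [u > unc_threshold for u in uncertainties]
    true_pos = sum(1 for a, p in zip(actual, predicted) if a and p)
    num_pred = sum(predicted)
    num_actual = sum(actual)
    precision = true_pos / num_pred if num_pred else float("nan")
    recall = true_pos / num_actual if num_actual else float("nan")
    return precision, recall

# Toy per-point values (hypothetical, not results from the paper).
errors = [0.05, 0.40, 0.10, 0.90]
uncertainties = [0.01, 0.20, 0.15, 0.30]

precision, recall = outlier_precision_recall(
    errors, uncertainties, epe_threshold=0.30, unc_threshold=0.12
)
```

Sweeping `unc_threshold` over a range of values and recording both numbers at each step traces out curves of the kind shown in the right panel of Figure 3: a lower threshold flags more points, which tends to raise recall and lower precision.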