DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Diffusion Model

Jiuming Liu, Guangming Wang, Weicai Ye, Chaokang Jiang, Jinru Han, Zhe Liu, Guofeng Zhang, Dalong Du, Hesheng Wang

TL;DR

This work tackles 3D scene flow estimation, targeting two weaknesses of prior methods in dynamic scenes: unreliable correlations and errors that accumulate through coarse-to-fine refinement. It introduces DifFlow3D, which uses a diffusion probabilistic model to iteratively refine a coarse flow into a dense, accurate prediction. The refinement is conditioned on three signals (geometry, cost volume, and coarse embeddings) and jointly estimates a per-point uncertainty. DifFlow3D reports state-of-the-art results, reducing $EPE3D$ by 24.0% on FlyingThings3D and 29.1% on KITTI and reaching millimeter-level accuracy on KITTI (0.0078 m $EPE3D$). Its refinement module can also be plugged into other scene flow networks. The per-point uncertainty indicates how reliable each estimated flow vector is, and ablations support the contribution of the diffusion refinement, the conditioning signals, and the uncertainty modeling, which makes the framework a candidate for real-world dynamic 3D perception tasks.

Abstract

Scene flow estimation, which aims to predict per-point 3D displacements of dynamic scenes, is a fundamental task in computer vision. However, previous works commonly suffer from unreliable correlation caused by locally constrained searching ranges, and struggle with accumulated inaccuracy arising from the coarse-to-fine structure. To alleviate these problems, we propose a novel uncertainty-aware scene flow estimation network (DifFlow3D) with the diffusion probabilistic model. Iterative diffusion-based refinement is designed to enhance the correlation robustness and resilience to challenging cases, e.g., dynamics, noisy inputs, and repetitive patterns. To restrain the generation diversity, three key flow-related features are leveraged as conditions in our diffusion model. Furthermore, we develop an uncertainty estimation module within diffusion to evaluate the reliability of the estimated scene flow. Our DifFlow3D achieves state-of-the-art performance, with 24.0% and 29.1% EPE3D reduction on the FlyingThings3D and KITTI 2015 datasets, respectively. Notably, our method achieves an unprecedented millimeter-level accuracy (0.0078 m in EPE3D) on the KITTI dataset. Additionally, our diffusion-based refinement paradigm can be readily integrated as a plug-and-play module into existing scene flow networks, significantly increasing their estimation accuracy. Code is released at https://github.com/IRMVLab/DifFlow3D.
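
The abstract reports its results in EPE3D and as relative EPE3D reductions. As a rough illustration, the sketch below computes EPE3D under its usual definition (the mean Euclidean distance, in meters, between predicted and ground-truth flow vectors) and a relative reduction between two error values. The function names `epe3d` and `relative_reduction` and all numbers are hypothetical and chosen only for illustration; they are not taken from the paper or its released code.

```python
import math


def epe3d(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and ground-truth 3D flow vectors.

    Both arguments are equal-length sequences of (x, y, z) tuples in meters.
    """
    if len(pred_flow) != len(gt_flow) or len(pred_flow) == 0:
        raise ValueError("flows must be non-empty and have the same length")
    total = 0.0
    for pred, gt in zip(pred_flow, gt_flow):
        total += math.dist(pred, gt)  # per-point end-point error
    return total / len(pred_flow)


def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error improves on baseline_error."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical two-point example: one exact prediction, one off by 0.5 m in z.
pred = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
gt = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.5)]
error = epe3d(pred, gt)

# Hypothetical error values, not results from the paper.
reduction = relative_reduction(0.100, 0.076)
```

Here `error` averages a zero error and a 0.5 m error, and `reduction` expresses the improvement as a fraction of the baseline, which is how percentage reductions such as those in the abstract are normally read.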

Paper Structure (16 sections, 15 equations, 7 figures, 5 tables)

Figures (7)

  • Figure 1: Comparison on challenging cases. DifFlow3D predicts uncertainty-aware scene flow with a diffusion model, which is more robust to: (a) dynamics, (b) noisy inputs, (c) small objects, and (d) repetitive patterns. Blue, green, and red points indicate, respectively, the first frame $PC_1$, accurately estimated $PC_2$ ($PC_1$ warped by the estimated flow), and inaccurately estimated $PC_2$.
  • Figure 2: An illustration of our diffusion process for scene flow estimation. During the forward process, we progressively add Gaussian noise to the ground truth flow residual ($s_{0}$). A neural network $M_{\theta}(\cdot, \cdot, \cdot)$ is trained to denoise the noisy flow residual $s_{t}$ at time $t$ based on condition information $C$.
  • Figure 3: The overall structure of DifFlow3D. We first initialize a coarse sparse scene flow in the bottom layer. Then, iterative diffusion-based refinement layers with flow-related condition signals are applied to recover the denser flow residuals. A per-point uncertainty is also predicted jointly with scene flow to evaluate the reliability of our estimated flow.
  • Figure 4: Visualization of uncertainty. During the training process, our designed uncertainty intervals narrow progressively, which drives the predicted flow toward the ground truth.
  • Figure 5: Visualization results without and with our Diffusion-based Scene Flow Refinement (DSFR). For better comparison, we only visualize the estimated $PC_2$ obtained by warping $PC_1$ with the estimated scene flow. Green and red points indicate, respectively, accurately estimated $PC_2$ and inaccurately estimated $PC_2$ (measured by Acc3DR).
  • ...and 2 more figures
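
The forward process described in the Figure 2 caption (progressively adding Gaussian noise to the ground truth flow residual $s_{0}$) can be sketched with the standard closed-form noising step used in denoising diffusion models, $s_t = \sqrt{\bar{\alpha}_t}\, s_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$. This is a minimal sketch under stated assumptions: the linear noise schedule, the 1000 steps, the helper names, and the example residuals are all hypothetical and may differ from the paper's settings. The denoising network $M_{\theta}$ and the condition information $C$ are not modeled.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule; the paper's actual schedule may differ."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_i) for i = 1..t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(s0, t, alpha_bars, rng):
    """Draw a noisy residual s_t given s_0; t is 1-indexed.

    s0 is a list of per-point (x, y, z) flow residuals.
    """
    alpha_bar = alpha_bars[t - 1]
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [
        tuple(signal_scale * c + noise_scale * rng.gauss(0.0, 1.0) for c in point)
        for point in s0
    ]


rng = random.Random(0)  # seeded so the sketch is repeatable
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))

# Hypothetical ground-truth flow residuals for two points (meters).
s0 = [(0.10, -0.02, 0.00), (0.00, 0.05, -0.01)]
s_early = q_sample(s0, 1, alpha_bars, rng)     # almost identical to s0
s_late = q_sample(s0, 1000, alpha_bars, rng)   # close to pure Gaussian noise
```

At small `t` the sample stays near `s0`, while at the last step almost no signal remains. A denoiser in the role of $M_{\theta}$ would be trained to undo this corruption given $s_t$, $t$, and $C$.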