DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Diffusion Model
Jiuming Liu, Guangming Wang, Weicai Ye, Chaokang Jiang, Jinru Han, Zhe Liu, Guofeng Zhang, Dalong Du, Hesheng Wang
TL;DR
This work tackles two weaknesses of 3D scene flow estimation in dynamic scenes: unreliable correlations and the error accumulation inherent in coarse-to-fine refinement. It introduces DifFlow3D, which uses a diffusion probabilistic model to iteratively refine a coarse flow into a dense, accurate prediction. The refinement is conditioned on geometry, cost volume, and coarse flow embeddings, and the model simultaneously estimates per-point uncertainty. DifFlow3D achieves state-of-the-art results, reducing EPE3D by 24.0% on FlyingThings3D and 29.1% on KITTI, and reaches millimeter-level accuracy on KITTI. The diffusion-based refinement can also be plugged into other scene flow networks. Per-point uncertainty indicates how reliable each estimate is, and ablations confirm the contribution of diffusion, the conditioning signals, and uncertainty modeling.
Abstract
Scene flow estimation, which aims to predict per-point 3D displacements of dynamic scenes, is a fundamental task in the computer vision field. However, previous works commonly suffer from unreliable correlation caused by locally constrained searching ranges, and struggle with accumulated inaccuracy arising from the coarse-to-fine structure. To alleviate these problems, we propose a novel uncertainty-aware scene flow estimation network (DifFlow3D) with the diffusion probabilistic model. Iterative diffusion-based refinement is designed to enhance the correlation robustness and resilience to challenging cases, e.g., dynamics, noisy inputs, and repetitive patterns. To restrain the generation diversity, three key flow-related features are leveraged as conditions in our diffusion model. Furthermore, we develop an uncertainty estimation module within diffusion to evaluate the reliability of the estimated scene flow. Our DifFlow3D achieves state-of-the-art performance, reducing EPE3D by 24.0% and 29.1% on the FlyingThings3D and KITTI 2015 datasets, respectively. Notably, our method achieves an unprecedented millimeter-level accuracy (0.0078 m in EPE3D) on the KITTI dataset. Additionally, our diffusion-based refinement paradigm can be readily integrated as a plug-and-play module into existing scene flow networks, significantly increasing their estimation accuracy. Code is released at https://github.com/IRMVLab/DifFlow3D.
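The headline numbers above are reported in EPE3D, the 3D end-point error: the mean Euclidean distance between predicted and ground-truth flow vectors over all points. As a minimal sketch of how the metric and a relative reduction are computed, the snippet below uses plain Python; the function names `epe3d` and `relative_reduction` and the toy inputs are illustrative assumptions, not taken from the released code.

```python
import math


def epe3d(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and ground-truth 3D flow vectors.

    Both arguments are equal-length sequences of (x, y, z) tuples, one per point.
    The result has the same unit as the inputs (metres for KITTI-style data).
    """
    if len(pred_flow) != len(gt_flow) or len(pred_flow) == 0:
        raise ValueError("flows must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(pred_flow, gt_flow))
    return total / len(pred_flow)


def relative_reduction(baseline_error, new_error):
    """Fractional error reduction relative to a baseline (0.25 means 25%)."""
    return (baseline_error - new_error) / baseline_error


# Toy values for illustration only (not results from the paper).
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.003, 0.004)]
toy_epe = epe3d(pred, gt)
toy_gain = relative_reduction(2.0, 1.5)
```

Under this definition, an EPE3D of 0.0078 m corresponds to an average per-point flow error of 7.8 mm.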
