Prior Does Matter: Visual Navigation via Denoising Diffusion Bridge Models

Hao Ren, Yiming Zeng, Zetong Bi, Zhaoliang Wan, Junlong Huang, Hui Cheng

TL;DR

This work tackles visual navigation with diffusion policies, which traditionally denoise from Gaussian noise. It proposes NaviBridger, a denoising diffusion bridge model that starts from informative priors and guides action generation toward target trajectories using Doob's $h$-transform. A theoretical framework links the quality of the source distribution to improved target-action denoising, and three prior strategies, namely Gaussian, rule-based, and learning-based (CVAE), are analyzed and instantiated. Empirical results across simulated and real-world indoor and outdoor tasks show faster, more accurate action generation and higher success rates with NaviBridger, especially with learning-based priors, along with robustness to environment changes. The codebase is released to enable replication and further exploration of diffusion-bridge imitation learning for navigation.

Abstract

Recent advancements in diffusion-based imitation learning, which show impressive performance in modeling multimodal distributions and training stability, have led to substantial progress in various robot learning tasks. In visual navigation, previous diffusion-based policies typically generate action sequences by denoising from Gaussian noise. However, the target action distribution often diverges significantly from Gaussian noise, leading to redundant denoising steps and increased learning complexity. Additionally, the sparsity of effective action distributions makes it challenging for the policy to generate accurate actions without guidance. To address these issues, we propose NaviBridger, a novel, unified visual navigation framework built on denoising diffusion bridge models. This approach enables action generation starting from any informative prior actions, enhancing guidance and efficiency in the denoising process. We explore how diffusion bridges can enhance imitation learning in visual navigation tasks and further examine three source policies for generating prior actions. Extensive experiments in both simulated and real-world indoor and outdoor scenarios demonstrate that NaviBridger accelerates policy inference and outperforms the baselines in generating target action sequences. Code is available at https://github.com/hren20/NaiviBridger.
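To make the idea of starting from prior actions rather than Gaussian noise concrete, the following is a minimal sketch of sampling an intermediate state on a plain Brownian bridge pinned at a target action sequence (t = 0) and a prior action sequence (t = T). It is an illustration under stated assumptions, not the paper's implementation: the function name `bridge_sample`, the constant `sigma`, the toy waypoints, and the choice of a simple Brownian bridge (rather than whatever noise schedule NaviBridger actually uses) are all hypothetical.

```python
import math
import random


def bridge_sample(x0, xT, t, T=1.0, sigma=1.0, rng=None):
    """Sample x_t from a Brownian bridge pinned at x0 (t=0) and xT (t=T).

    Mean is the linear interpolation between the endpoints; the variance
    sigma^2 * t * (T - t) / T vanishes at both ends of the bridge.
    x0 and xT are lists of waypoints, each waypoint a list of coordinates.
    """
    if rng is None:
        rng = random.Random()
    w = t / T
    std = sigma * math.sqrt(t * (T - t) / T)
    return [
        [(1.0 - w) * a + w * b + std * rng.gauss(0.0, 1.0) for a, b in zip(p0, pT)]
        for p0, pT in zip(x0, xT)
    ]


# Hypothetical 5-waypoint 2D action sequences (illustrative values only).
target_actions = [[0.0, 0.0], [0.5, 0.1], [1.0, 0.3], [1.5, 0.6], [2.0, 1.0]]
prior_actions = [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [1.5, 0.0], [2.0, 0.0]]

rng = random.Random(0)
midpoint_state = bridge_sample(target_actions, prior_actions, t=0.5, sigma=0.1, rng=rng)
```

In this toy setup the prior (a straight-ahead path) already sits close to the target, so the bridge only has to cover a small gap. A Gaussian starting point would carry no such structure.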

Paper Structure

This paper contains 19 sections, 35 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Visual overview of the local path generation process. The top row shows the actions generated by vanilla diffusion models denoising from Gaussian noise. The bottom row illustrates actions generated using the proposed NaviBridger, which leverages prior information to effectively guide the denoising bridge model towards the target action. This comparison demonstrates the importance of incorporating prior knowledge in achieving stable and accurate navigation.
  • Figure 2: NaviBridger Overview. The policy takes RGB observations and a goal image, extracting features via a Transformer encoder. The FiLM condition module applies learned conditions to improve path accuracy, while the prior generation module ensures that the policy outputs better align with target actions, starting from alternative prior actions generated using optional prior policies.
  • Figure 3: DDBM trained on 2D synthetic data. The leftmost side represents the source distribution, and the rightmost side represents the target distribution. The distance between the source distribution and the target distribution, measured by the earth mover's distance (EMD), increases from top to bottom. As the EMD increases, the required approximation steps increase while model performance progressively declines.
  • Figure 4: Comparison of navigation performance in simulation and real-world environments. NaviBridger with rule-based and model-based priors (DDBM) generates smoother, more stable paths than NoMaD (DDPM), demonstrating improved alignment with the target trajectory over denoising steps.
  • Figure 5: Success rate and inference time comparison across denoising steps. Learning-based and Gaussian priors in DDBM show higher success rates than DDPM, with learning-based priors achieving the best performance. DDBM also demonstrates faster inference times as denoising steps increase.
  • ...and 4 more figures
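
The Figure 3 caption relates the earth mover's distance (EMD) between source and target distributions to how hard the bridge is to learn. As a rough illustration of that quantity only, the sketch below computes the EMD (Wasserstein-1 distance) between two equal-size 1D empirical samples, where it reduces to the mean absolute difference of the sorted values. The 1D setting, the helper name `emd_1d`, and the shifted-Gaussian toy data are assumptions for illustration; the paper's experiment uses 2D synthetic data.

```python
import random


def emd_1d(a, b):
    """Wasserstein-1 (earth mover's) distance between equal-size 1D samples.

    For equal-size 1D samples, the optimal transport plan matches sorted
    values in order, so the distance is the mean absolute difference.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


rng = random.Random(0)
# Toy target distribution: samples from a standard Gaussian.
target = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Sources that sit progressively farther from the target.
for shift in (0.5, 2.0, 4.0):
    source = [x + shift for x in target]
    print(f"shift={shift}: EMD={emd_1d(source, target):.3f}")
```

Because each source here is just the target translated by `shift`, the EMD equals the shift itself, which makes the "farther source, larger EMD" relationship easy to see.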