Denoising Heat-inspired Diffusion with Insulators for Collision Free Motion Planning

Junwoo Chang, Hyunwoo Ryu, Jiwoo Kim, Soochul Yoo, Jongeun Choi, Joohwan Seo, Nikhil Prakash, Roberto Horowitz

TL;DR

This work addresses collision-free motion planning from minimal visual input by introducing a collision-avoiding diffusion kernel inspired by heat conduction with insulators. A score-based diffusion model is trained to approximate target scores derived from a heat-equation-based diffusion kernel, enabling end-to-end generation of reachable goals and collision-free state sequences from a single top-down image using annealed Langevin dynamics. The key contribution is the heat-inspired kernel that encodes obstacle avoidance directly into diffusion, allowing inference-time planning without explicit obstacle sensing or additional hardware, and yielding robust multi-modal goal generation. Empirical results against baselines demonstrate strong performance in uni- and multi-modal scenarios while avoiding unreachable goals, highlighting practical impact for real-world robotic planning under limited sensing.
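
The physical intuition behind the kernel can be illustrated with a minimal sketch. This is not the paper's kernel or training target; it is an explicit finite-difference heat step on a grid, under the assumption that obstacle cells act as perfect insulators (zero flux across their faces). The function name `diffuse_with_insulators` and the parameters `alpha` and `steps` are hypothetical and chosen only for this illustration.

```python
def diffuse_with_insulators(u, obstacle, alpha=0.2, steps=50):
    """Run explicit heat-equation steps where obstacle cells are insulators.

    u        : 2-D list of floats, initial heat (probability mass) per cell
    obstacle : 2-D list of bools, True where a cell is an obstacle
    alpha    : diffusion rate per step; alpha <= 0.25 keeps the scheme stable
    """
    rows, cols = len(u), len(u[0])
    for _ in range(steps):
        new = [row[:] for row in u]
        for i in range(rows):
            for j in range(cols):
                if obstacle[i][j]:
                    continue  # insulators hold no heat
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    inside = 0 <= ni < rows and 0 <= nj < cols
                    # Exchange heat only with free neighbours: zero flux
                    # across obstacle faces and across the grid border.
                    if inside and not obstacle[ni][nj]:
                        new[i][j] += alpha * (u[ni][nj] - u[i][j])
        u = new
    return u


# A wall in column 3 splits a 5x7 grid into two disconnected halves.
rows, cols = 5, 7
obstacle = [[j == 3 for j in range(cols)] for _ in range(rows)]
u0 = [[0.0] * cols for _ in range(rows)]
u0[2][1] = 1.0  # all heat starts on the left side
u = diffuse_with_insulators(u0, obstacle)
```

In this toy setting the heat spreads over the left half only and the total is conserved, so nothing leaks through the wall. A plain Gaussian kernel would instead place mass on both sides of the wall.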

Abstract

Diffusion models have risen as a powerful tool in robotics due to their flexibility and multi-modality. While some of these methods effectively address complex problems, they often depend heavily on inference-time obstacle detection and require additional equipment. Addressing these challenges, we present a method that, during inference time, simultaneously generates only reachable goals and plans motions that avoid obstacles, all from a single visual input. Central to our approach is the novel use of a collision-avoiding diffusion kernel for training. Through evaluations against behavior-cloning and classical diffusion models, our framework has proven its robustness. It is particularly effective in multi-modal environments, navigating toward goals and avoiding unreachable ones blocked by obstacles, while ensuring collision avoidance. Project Website: https://sites.google.com/view/denoising-heat-inspired

Paper Structure

This paper contains 17 sections, 7 equations, 6 figures, 1 table, 3 algorithms.

Figures (6)

  • Figure 1: Collision avoiding diffusion kernel. The comparison clearly shows the differences between two diffusion kernels in an environment with obstacles. (a) The collision-avoiding diffusion kernel moves without invading any obstacles. (b) In contrast, the Gaussian diffusion kernel often runs into obstacles, indicating a higher risk of collisions.
  • Figure 2: Architecture and overview of our method. The model processes the visual input $y$ and time $t$ to produce an output score field, which is then bilinearly interpolated with the input state $x_t$ to obtain the score value at $x_t$. To determine the next state leading to the goal, we employ annealed Langevin dynamics sampling.
  • Figure 3: Experiment results of our method. In the $64\times 64$ input image, black areas indicate obstacles, red dots illustrate states originating from the initial distribution, and green apples mark the goals. (a) The first row shows an experiment with two goals generated. The states move toward each goal, demonstrating the multi-modality of our model. (b) The second row presents a similar setup, but with one goal being unreachable. The model generates only the reachable goal, and the states reach it without colliding with the obstacle.
  • Figure 4: Graphical analysis of experiment results. (a) Success rate of reaching goal distributions. (b) KL divergence between the goal distributions and the empirical distribution. Both indicate that our method performs robustly across all scenarios compared to the baselines.
  • Figure 5: Performance evaluation of the BC model in terms of multi-modality. Black areas denote obstacles, red dots represent states initiated from the initial distribution, and green apples mark the goals. (a) The first row presents a scenario with one reachable goal generated. (b) The second row displays a test similar to the first but with multi-modal goals generated. The results indicate that the behavior-cloning method performs well in uni-modal scenarios but lacks multi-modality.
  • ...and 1 more figures
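
The sampling step described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the network output is replaced by a toy analytic score field (`make_score_field`, hypothetical) that points toward a single goal, and the update rule is the standard annealed Langevin step $x \leftarrow x + \tfrac{\epsilon}{2} s(x) + \sqrt{\epsilon}\, z$ with step size scaled by the noise level. The names `bilinear_score`, `annealed_langevin`, `base_step`, and `steps_per_level` are likewise assumptions made for this example.

```python
import math
import random


def make_score_field(goal, sigma, size):
    """Toy stand-in for the network output: score of N(goal, sigma^2 I) per pixel."""
    gx, gy = goal
    return [[((gx - c) / sigma**2, (gy - r) / sigma**2) for c in range(size)]
            for r in range(size)]


def bilinear_score(field, x, y):
    """Interpolate a per-pixel 2-D score field at a continuous state (x, y).

    field[r][c] is an (sx, sy) pair; x indexes columns and y indexes rows.
    The grid needs at least 2x2 cells; queries outside it are clamped.
    """
    rows, cols = len(field), len(field[0])
    x = min(max(x, 0.0), cols - 1.0)
    y = min(max(y, 0.0), rows - 1.0)
    c0, r0 = min(int(x), cols - 2), min(int(y), rows - 2)
    fx, fy = x - c0, y - r0
    out = []
    for k in range(2):
        top = (1 - fx) * field[r0][c0][k] + fx * field[r0][c0 + 1][k]
        bot = (1 - fx) * field[r0 + 1][c0][k] + fx * field[r0 + 1][c0 + 1][k]
        out.append((1 - fy) * top + fy * bot)
    return tuple(out)


def annealed_langevin(field_for_level, sigmas, x, y,
                      base_step=0.1, steps_per_level=20, seed=0):
    """Move a state with Langevin updates while annealing the noise level."""
    rng = random.Random(seed)
    for sigma in sigmas:  # ordered from large noise to small noise
        field = field_for_level(sigma)
        eps = base_step * sigma**2  # step size shrinks with the noise level
        for _ in range(steps_per_level):
            sx, sy = bilinear_score(field, x, y)
            x += 0.5 * eps * sx + math.sqrt(eps) * rng.gauss(0.0, 1.0)
            y += 0.5 * eps * sy + math.sqrt(eps) * rng.gauss(0.0, 1.0)
    return x, y


goal = (10.0, 9.0)
x, y = annealed_langevin(lambda s: make_score_field(goal, s, 16),
                         [2.0, 1.0, 0.5, 0.25], x=2.0, y=3.0)
print(round(x, 2), round(y, 2))  # ends near the goal; exact value depends on the seed
```

In the method itself the score field would come from the trained model evaluated on the visual input $y$ and time $t$. The toy field here only makes the interpolation and update steps concrete.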