CAPE: Context-Aware Diffusion Policy Via Proximal Mode Expansion for Collision Avoidance

Rui Heng Yang, Xuan Zhao, Leo Maxime Brunswic, Montgomery Alban, Mateo Clemente, Tongtong Cao, Jun Jin, Amir Rasouli

TL;DR

CAPE (Context-Aware diffusion policy via Proximal mode Expansion) addresses the limited trajectory multimodality of diffusion-based robot policies. It combines a context-aware prior, derived from the remaining segment of the previous trajectory plan, with training-free, context-guided denoising, and iteratively expands trajectory modes to generate collision-free, goal-consistent plans in unseen environments. The method preserves task intent while enlarging the distributional support, and reports up to 26% (simulation) and 80% (real-world) higher success rates than state-of-the-art methods on cluttered manipulation tasks. These results suggest that CAPE generalizes to collision avoidance without obstacle-rich training datasets or heavy online optimization.

Abstract

In robotics, diffusion models can capture multi-modal trajectories from demonstrations, making them a transformative approach in imitation learning. However, achieving optimal performance following this regimen requires a large-scale dataset, which is costly to obtain, especially for challenging tasks such as collision avoidance. In those tasks, generalization at test time demands coverage of many obstacle types and their spatial configurations, which is impractical to acquire purely via data. To remedy this problem, we propose Context-Aware diffusion policy via Proximal mode Expansion (CAPE), a framework that expands trajectory distribution modes with a context-aware prior and guidance at inference via a novel prior-seeded iterative guided refinement procedure. The framework generates an initial trajectory plan and executes a short prefix trajectory; the remaining trajectory segment is then perturbed to an intermediate noise level, forming a trajectory prior. Such a prior is context-aware and preserves task intent. Repeating the process with context-aware guided denoising iteratively expands mode support, allowing smoother, less collision-prone trajectories to be found. For collision avoidance, CAPE expands trajectory distribution modes with collision-aware context, enabling the sampling of collision-free trajectories in previously unseen environments while maintaining goal consistency. We evaluate CAPE on diverse manipulation tasks in cluttered, unseen simulated and real-world settings and show up to 26% and 80% higher success rates, respectively, compared to state-of-the-art (SOTA) methods, demonstrating better generalization to unseen environments.
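
The loop described in the abstract (plan, execute a short prefix, perturb the remainder to an intermediate noise level, refine with guidance, repeat) can be sketched in a few lines. The following is a toy illustration only, not the authors' implementation. Everything in it is an assumption made for the example: the cosine noise schedule `alpha_bar`, the closed-form `toy_denoiser` standing in for the trained diffusion policy, the sphere-based `repulsion` term standing in for point-cloud collision guidance, the simplified "predict clean, then re-noise" sampler, and all parameter names and defaults (`horizon`, `prefix`, `K`, `delta`, `radius`, `scale`).

```python
import math
import random

def alpha_bar(k, K):
    """Fraction of signal variance kept at noise level k (assumed cosine schedule)."""
    return math.cos(0.5 * math.pi * k / K) ** 2

def perturb(traj, k, K, rng):
    """Forward-diffuse clean waypoints to noise level k."""
    ab = alpha_bar(k, K)
    return [[math.sqrt(ab) * c + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for c in p]
            for p in traj]

def toy_denoiser(x, k, K, start, goal, sigma=0.1):
    """Stand-in for the trained policy: posterior mean of the clean trajectory when
    demonstrations are Gaussian (std sigma) around a straight line to the goal."""
    ab = alpha_bar(k, K)
    gain = math.sqrt(ab) * sigma**2 / (ab * sigma**2 + 1.0 - ab)
    n = len(x)
    out = []
    for i, p in enumerate(x):
        t = (i + 1) / n
        line = [(1 - t) * s + t * g for s, g in zip(start, goal)]
        out.append([l + gain * (c - math.sqrt(ab) * l) for l, c in zip(line, p)])
    return out

def repulsion(p, obstacles, radius):
    """Hypothetical guidance term: push a waypoint out of any sphere it penetrates."""
    push = [0.0] * len(p)
    for o in obstacles:
        d = math.dist(p, o)
        if 0.0 < d < radius:
            for j in range(len(p)):
                push[j] += (radius - d) * (p[j] - o[j]) / d
    return push

def guided_denoise(x, k_start, K, start, goal, obstacles, radius, scale, rng):
    """Denoise from level k_start down to 0, nudging each clean estimate off obstacles."""
    for k in range(k_start, 0, -1):
        x0_hat = toy_denoiser(x, k, K, start, goal)
        x0_hat = [[c + scale * g for c, g in zip(p, repulsion(p, obstacles, radius))]
                  for p in x0_hat]
        x = perturb(x0_hat, k - 1, K, rng)  # re-noise to the next lower level
    return x

def rollout(start, goal, obstacles, horizon=16, prefix=4, K=50, delta=20,
            radius=0.3, scale=1.0, seed=0):
    rng = random.Random(seed)
    executed = [list(start)]
    # Initial planning: start from pure Gaussian noise at the highest level K.
    x = [[rng.gauss(0.0, 1.0) for _ in start] for _ in range(horizon)]
    k_start = K
    while True:
        plan = guided_denoise(x, k_start, K, executed[-1], goal,
                              obstacles, radius, scale, rng)
        executed.extend(plan[:prefix])  # execute a short prefix of the plan
        remaining = plan[prefix:]
        if not remaining:
            return executed
        # Prior: perturb the unexecuted remainder to the intermediate level delta,
        # so the next refinement starts from the previous plan rather than from scratch.
        x, k_start = perturb(remaining, delta, K, rng), delta

path = rollout((0.0, 0.0), (2.0, 0.0), obstacles=[(1.0, 0.0)])
print(len(path))  # start point plus 16 executed waypoints
```

In this sketch, `delta` controls how much of the previous plan survives into the next refinement: a small value keeps the refined plan close to the earlier one, while `delta = K` discards it and reduces each iteration to planning from pure noise.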

Paper Structure

This paper contains 14 sections, 4 equations, 7 figures, 5 tables, 2 algorithms.

Figures (7)

  • Figure 1: Overview of the proposed method. Priors derived from previous iterations are incorporated to expand the support of trajectory modes, thereby facilitating the generation of trajectories that are more context-aware.
  • Figure 2: An overview of the proposed framework: A diffusion model is trained to learn pick-and-place using data from an empty scene. The model uses this skill at inference time in cluttered environments. During inference, the task description and robot observations are sent to the model, and 3D point clouds are used to generate collision-aware guidance signals. Initial planning: The noisy trajectory is sampled from a Gaussian distribution. Prior-Seeded Guided Iterative Refinement: After executing a short prefix trajectory, the remaining trajectory is perturbed with an intermediate noise level $\delta$, forming a prior. The prior preserves task intent and previously expanded mode support, which is further iteratively expanded with collision-aware guidance until task completion.
  • Figure 3: Trajectory samples under different guidance levels in a real planning task without any prior.
  • Figure 4: Simulated environments with increasing levels of difficulty used in the experiments. From left to right: 1-conceptual, 2-environment with 25 small obstacles, 3-environment with 15 small and 2 medium-size obstacles, and 4-environment with 25 small and 2 large obstacles.
  • Figure 5: 3D visualization of trajectory updates during execution in Env4 under full observation. Without a prior, the trajectory is trapped in clutter regions. With a trajectory prior, repeated guided refinement augments context-aware distributional mode support and increases diversity, so the trajectory progressively shifts out of clutter toward the goal.
  • ...and 2 more figures