CCDP: Composition of Conditional Diffusion Policies with Guided Sampling

Amirreza Razmjoo, Sylvain Calinon, Michael Gienger, Fan Zhang

TL;DR

CCDP tackles failure recovery in imitation learning for robotics by conditioning diffusion-based policies on failure features and composing multiple diffusion experts through a product-of-experts framework, which steers sampling away from actions that have already failed. The approach decomposes long-horizon failure management into modular subproblems and synthesizes a recovery dataset offline from successful demonstrations, enabling recovery without explicit environment models or labeled failure data. The resulting low-level controller dynamically adjusts its sampling space through weighted contributions from state, history, and failure features, with weights $w_s$, $w_h$, and $w_z^i$ respectively. Across door opening, button searching, object manipulation, packing, and bartending tasks, CCDP achieves higher success rates than the standard DP and DP* baselines while preserving the implicit objectives encoded in the demonstrations, which suggests the approach is practical for robust, data-efficient robotic control.
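To make the product-of-experts idea concrete, here is a minimal sketch under strong simplifying assumptions: each expert is a 1-D Gaussian rather than a learned diffusion model, and the composed distribution is taken to be $\prod_i p_i(x)^{w_i}$, whose score is the weighted sum of the expert scores. The expert means, standard deviations, weight values, and function names are hypothetical and chosen only for illustration; the paper's actual composition rule may differ in detail.

```python
import numpy as np

def gaussian_score(x, mu, sigma):
    """Score (gradient of the log-density) of N(mu, sigma^2) evaluated at x."""
    return -(x - mu) / sigma**2

def compose_scores(scores, weights):
    """Weighted sum of expert scores, i.e. the score of prod_i p_i(x)^{w_i}."""
    return sum(w * s for w, s in zip(weights, scores))

def product_of_gaussians(mus, sigmas, weights):
    """Closed-form mean and std of the weighted product of 1-D Gaussians."""
    precisions = np.asarray(weights) / np.asarray(sigmas) ** 2
    total = precisions.sum()
    mean = (precisions * np.asarray(mus)).sum() / total
    return mean, 1.0 / np.sqrt(total)

# Hypothetical experts: state-, history-, and failure-conditioned (assumed values).
mus = [0.0, 1.0, 3.0]
sigmas = [1.0, 1.0, 0.5]
weights = [1.0, 0.5, 1.0]  # stand-ins for w_s, w_h, w_z^1

mean, std = product_of_gaussians(mus, sigmas, weights)

# At the mode of the composed distribution the composed score must vanish.
composed_at_mean = compose_scores(
    [gaussian_score(mean, m, s) for m, s in zip(mus, sigmas)], weights
)
print(f"composed mean={mean:.4f}, std={std:.4f}, score at mean={composed_at_mean:.2e}")
```

In this toy setting, raising a weight pulls the composed mode toward that expert, which is the sense in which the weights reshape the sampling space.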

Abstract

Imitation Learning offers a promising approach to learn directly from data without requiring explicit models, simulations, or detailed task definitions. During inference, actions are sampled from the learned distribution and executed on the robot. However, sampled actions may fail for various reasons, and simply repeating the sampling step until a successful action is obtained can be inefficient. In this work, we propose an enhanced sampling strategy that refines the sampling distribution to avoid previously unsuccessful actions. We demonstrate that by solely utilizing data from successful demonstrations, our method can infer recovery actions without the need for additional exploratory behavior or a high-level controller. Furthermore, we leverage the concept of diffusion model decomposition to break down the primary problem, which may require long-horizon history to manage failures, into multiple smaller, more manageable sub-problems in learning, data collection, and inference, thereby enabling the system to adapt to variable failure counts. Our approach yields a low-level controller that dynamically adjusts its sampling space to improve efficiency when prior samples fall short. We validate our method across several tasks, including door opening with unknown directions, object manipulation, and button-searching scenarios, demonstrating that our approach outperforms traditional baselines.
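The inefficiency of naive resampling can be illustrated with a toy calculation. Assume, purely for illustration, a task with `k` equally likely action modes of which exactly one succeeds (for example, a door with `k` candidate opening directions). Resampling with replacement needs `k` attempts on average, whereas a sampler that never repeats a failed mode needs `(k + 1) / 2`. The simulation below is a sketch of that comparison, not of the paper's method; all function names are hypothetical.

```python
import random

def attempts_with_replacement(k, target, rng):
    """Resample uniformly from all k modes until the successful one is drawn."""
    attempts = 0
    while True:
        attempts += 1
        if rng.randrange(k) == target:
            return attempts

def attempts_without_repeats(k, target, rng):
    """Try the modes in random order, never repeating one that already failed."""
    order = list(range(k))
    rng.shuffle(order)
    return order.index(target) + 1

def mean_attempts(strategy, k, trials=20000, seed=0):
    """Monte Carlo estimate of the expected number of attempts for a strategy."""
    rng = random.Random(seed)
    total = sum(strategy(k, rng.randrange(k), rng) for _ in range(trials))
    return total / trials

k = 4
naive = mean_attempts(attempts_with_replacement, k)
informed = mean_attempts(attempts_without_repeats, k)
print(f"k={k}: naive resampling ~{naive:.2f} attempts, avoiding failures ~{informed:.2f}")
```

The gap grows linearly with `k`, which is why steering the sampling distribution away from previously failed actions matters more as the demonstration set becomes more diverse.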

Paper Structure

This paper contains 20 sections, 9 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: A diverse demonstration set, featuring multiple task variations, is provided to the robot. In the event of a failure, the robot switches to alternative variations rather than repeatedly sampling the same actions.
  • Figure 2: An illustrative example showing samples generated from multiple distributions learned from the same demonstration set. The blue car, with its trajectory history shown as a shaded trail, aims to reach one of the white goal positions, while the yellow arrows indicate sampled actions.
  • Figure 3: Schematic overview of the proposed method illustrating the offline and inference phases.
  • Figure 4: Different experimental setups used in this paper.
  • Figure 5: Results from 100 random test scenarios comparing (a) success rates across different tasks and (b) implicit objective fulfillment across various experiments.