Enabling Stateful Behaviors for Diffusion-based Policy Learning

Xiao Liu, Fabian Weigend, Yifan Zhou, Heni Ben Amor

TL;DR

This work addresses action inconsistency in diffusion-based policy learning for robotics by introducing Diff-Control, a stateful diffusion policy that incorporates temporal transitions via a ControlNet-based transition model within a recursive Bayesian framework. The approach defines the belief bel(a_t) through a Bayesian update that combines an observation model with a transition model, enabling the policy to condition on past actions and observations. The Diff-Control architecture comprises a frozen base diffusion policy and a trainable Transition Module that uses past action sequences to condition subsequent action generation; training uses the standard DDPM objective together with a ControlNet fine-tuning loss. Real-robot experiments on Duck Scooping and Drum Beats show that Diff-Control achieves higher success rates (84% and 72%, respectively) and greater robustness to perturbations than non-stateful diffusion policies and other baselines, supporting the benefit of action statefulness for dynamic, multi-step tasks.

Abstract

While imitation learning provides a simple and effective framework for policy learning, acquiring consistent actions during robot execution remains a challenging task. Existing approaches primarily focus on either modifying the action representation at the data curation stage or altering the model itself, neither of which fully addresses the scalability of consistent action generation. To overcome this limitation, we introduce the Diff-Control policy, which utilizes a diffusion-based model to learn the action representation from a state-space modeling viewpoint. We demonstrate that we can reduce the uncertainty of diffusion-based policies by making them stateful through a Bayesian formulation facilitated by ControlNet, leading to improved robustness and success rates. Our experimental results demonstrate the significance of incorporating action statefulness in policy learning, where Diff-Control shows improved performance across various tasks. Specifically, Diff-Control achieves an average success rate of 72% and 84% on stateful and dynamic tasks, respectively. Project page: https://github.com/ir-lab/Diff-Control
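
The recursive Bayesian update behind bel(a_t) can be illustrated with a toy discrete example. This is a minimal sketch of a generic Bayes filter step, not the authors' implementation: the function name `bayes_update`, the two-mode action set, and all probabilities are hypothetical. In Diff-Control the transition and observation models are learned diffusion components rather than lookup tables.

```python
def bayes_update(prior_belief, transition, likelihood):
    """One recursive Bayesian step over a discrete action set (toy sketch).

    prior_belief[j]  : bel(a_{t-1} = j)
    transition[j][i] : p(a_t = i | a_{t-1} = j)   (transition model)
    likelihood[i]    : p(o_t | a_t = i)           (observation model)
    """
    n = len(prior_belief)
    # Prediction: propagate the previous belief through the transition model.
    predicted = [
        sum(transition[j][i] * prior_belief[j] for j in range(n))
        for i in range(n)
    ]
    # Correction: weight by the observation model, then normalize.
    unnormalized = [likelihood[i] * predicted[i] for i in range(n)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under every action")
    return [u / total for u in unnormalized]


# Hypothetical numbers: two action modes and an observation that cannot
# tell them apart (equal likelihood for both).
transition = [[0.8, 0.2], [0.2, 0.8]]
ambiguous_likelihood = [0.5, 0.5]

# Stateful: the previous belief strongly favors mode 0.
stateful_belief = bayes_update([0.9, 0.1], transition, ambiguous_likelihood)

# Stateless stand-in: no information about the previous action.
stateless_belief = bayes_update([0.5, 0.5], transition, ambiguous_likelihood)
```

With an ambiguous observation, the stateless belief stays split evenly between the two modes, while the stateful belief keeps favoring the mode that was already being executed. This is the intuition behind conditioning on past actions for consistency.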

Paper Structure (10 sections, 4 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Diff-Control Policy incorporates ControlNet, functioning as a transition model that captures temporal transitions within the action space to ensure action consistency.
  • Figure 2: Stateful behavior: at a given state, the Diff-Control policy can use prior trajectories to approximate the desired function. Diffusion policy [chi2023diffusion] learns both modes but fails to generate the correct trajectory consistently; Image-BC/BC-Z [jang2022bc] fails to generate the correct trajectory.
  • Figure 3: The Diff-Control policy is built on a locked U-Net diffusion policy architecture. It replicates the encoder and middle blocks and incorporates zero-convolution layers.
  • Figure 4: Real-world tasks in this study: a) "Duck Scooping" task in a water tank, b) "Drum Beats" task by hitting the drum 3 times.
  • Figure 5: Diff-Control for real-world tasks: The first row shows a successful duck scooping experiment. The second row displays one drum task result. The results are best appreciated with videos on the website: https://diff-control.github.io/.
  • ...and 3 more figures
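
The zero-convolution layers mentioned in the Figure 3 caption can be illustrated with a small sketch. It assumes the usual ControlNet convention that a zero convolution is a 1x1 convolution whose weights and bias start at zero. The helper names (`conv1x1`, `zero_conv_params`, `fuse`) and the feature values are hypothetical, and the real model applies this inside U-Net blocks rather than to plain lists.

```python
def conv1x1(features, weight, bias):
    """1x1 convolution over a sequence.

    features: [T][C_in], weight: [C_out][C_in], bias: [C_out].
    """
    return [
        [sum(w * x for w, x in zip(row, step)) + b for row, b in zip(weight, bias)]
        for step in features
    ]


def zero_conv_params(c_in, c_out):
    """Zero-initialized weight and bias, as in a ControlNet zero convolution."""
    return [[0.0] * c_in for _ in range(c_out)], [0.0] * c_out


def fuse(base_features, control_features, weight, bias):
    """Add the zero-conv output of the trainable copy to the locked base features."""
    injected = conv1x1(control_features, weight, bias)
    return [
        [b + c for b, c in zip(base_step, inj_step)]
        for base_step, inj_step in zip(base_features, injected)
    ]


# Hypothetical features: 2 time steps, 2 channels.
base_features = [[1.0, 2.0], [3.0, 4.0]]       # from the locked base policy
control_features = [[0.5, -1.0], [2.0, 0.25]]  # from the trainable copy

# At initialization the zero convolution contributes nothing,
# so the fused output equals the locked base policy's features.
weight, bias = zero_conv_params(c_in=2, c_out=2)
fused_at_init = fuse(base_features, control_features, weight, bias)

# Once fine-tuning moves the weights away from zero (identity used here
# purely for illustration), the control branch starts to shape the output.
tuned_weight = [[1.0, 0.0], [0.0, 1.0]]
fused_after_tuning = fuse(base_features, control_features, tuned_weight, bias)
```

Because the injected term starts at exactly zero, fine-tuning begins from the behavior of the locked base policy and only gradually introduces the influence of past actions.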