Enabling Stateful Behaviors for Diffusion-based Policy Learning
Xiao Liu, Fabian Weigend, Yifan Zhou, Heni Ben Amor
TL;DR
This work addresses action inconsistency in diffusion-based policy learning for robotics by introducing Diff-Control, a stateful diffusion policy that incorporates temporal transitions through a ControlNet-based transition model within a recursive Bayesian framework. The approach defines the belief over actions, bel(a_t), through a simple Bayesian update that combines an observation model with a transition model, so the policy can condition on past actions as well as observations. The Diff-Control architecture comprises a frozen base diffusion policy and a trainable Transition Module that uses past action sequences to condition the generation of subsequent actions; training combines the standard DDPM objective with a ControlNet fine-tuning loss. Real-robot experiments on Duck Scooping and Drum Beats show that Diff-Control achieves higher success rates (84% and 72%) and greater robustness to perturbations than non-stateful diffusion policies and other baselines, supporting the benefit of action statefulness for dynamic, multi-step tasks.
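For reference, the generic textbook form of the recursive Bayes filter that this kind of update builds on is shown below. We read the observation term as the policy's conditioning on the current observation o_t and the transition term as the Transition Module's conditioning on the previous action. That mapping is our interpretation of the description above, and the paper's exact factorization may differ.

```latex
\mathrm{bel}(a_t) = \eta \, p(o_t \mid a_t) \int p(a_t \mid a_{t-1}) \, \mathrm{bel}(a_{t-1}) \, \mathrm{d}a_{t-1}
```

Here \eta is a normalizing constant, and bel(a_{t-1}) carries information from earlier steps forward. This carried-forward belief is what makes the policy stateful.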
Abstract
While imitation learning provides a simple and effective framework for policy learning, producing consistent actions during robot execution remains challenging. Existing approaches primarily modify either the action representation at the data curation stage or the model itself, and neither fully addresses the scalability of consistent action generation. To overcome this limitation, we introduce the Diff-Control policy, which uses a diffusion-based model to learn the action representation from a state-space modeling viewpoint. We show that the uncertainty of diffusion-based policies can be reduced by making them stateful through a Bayesian formulation facilitated by ControlNet, which improves robustness and success rates. Our experimental results demonstrate the importance of incorporating action statefulness in policy learning: Diff-Control improves performance across various tasks, achieving average success rates of 72% and 84% on stateful and dynamic tasks, respectively. Project page: https://github.com/ir-lab/Diff-Control
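The following is a minimal NumPy sketch of the "frozen base policy plus ControlNet-style transition branch" idea. It is not the authors' implementation (see the project repository for that). Several assumptions apply:

- A two-layer MLP stands in for the real denoising network.
- The function names (`init_base`, `init_transition`, `predict_noise`, `ddpm_loss`, `sample_chunk`), the sizes, and the linear noise schedule are all hypothetical.
- No gradient step is shown.

The sketch illustrates three things:

1. A trainable copy of the base encoder receives the previous action chunk.
2. Zero-initialized projections make the combined model behave exactly like the base policy at initialization.
3. The standard DDPM epsilon-prediction loss and ancestral sampler are used, with each sampled chunk fed back as the next condition.

```python
import numpy as np

# Hypothetical sizes: an 8-step chunk of 2-D actions (flattened) and a 10-D observation.
ACT_DIM, OBS_DIM, HIDDEN, K = 8 * 2, 10, 64, 100

# Assumed linear DDPM noise schedule (beta from 1e-4 to 0.02).
betas = np.linspace(1e-4, 0.02, K)
alpha_bars = np.cumprod(1.0 - betas)


def init_base(rng):
    """Weights of the base noise-prediction network (kept frozen during fine-tuning)."""
    in_dim = ACT_DIM + OBS_DIM + 1  # noisy action, observation, normalized diffusion step
    return {"W1": rng.normal(0, 0.1, (in_dim, HIDDEN)), "b1": np.zeros(HIDDEN),
            "W2": rng.normal(0, 0.1, (HIDDEN, ACT_DIM)), "b2": np.zeros(ACT_DIM)}


def init_transition(base):
    """ControlNet-style branch: trainable copy of the base encoder plus
    zero-initialized input and output projections (the 'zero convolution' analogue)."""
    in_dim = base["W1"].shape[0]
    return {"W1": base["W1"].copy(), "b1": base["b1"].copy(),
            "Z_in": np.zeros((ACT_DIM, in_dim)),   # maps the previous action chunk into input space
            "Z_out": np.zeros((HIDDEN, HIDDEN))}   # injects branch features into the base


def predict_noise(base, ctrl, noisy_action, obs, k, prev_action=None):
    """Predict the noise in `noisy_action`; the branch is used only if ctrl and prev_action are given."""
    x = np.concatenate([noisy_action, obs, [k / K]])
    h = np.tanh(x @ base["W1"] + base["b1"])          # frozen base features
    if ctrl is not None and prev_action is not None:
        xc = x + prev_action @ ctrl["Z_in"]           # condition on the past action chunk
        hc = np.tanh(xc @ ctrl["W1"] + ctrl["b1"])
        h = h + hc @ ctrl["Z_out"]                    # zero at init, so the base output is unchanged
    return h @ base["W2"] + base["b2"]


def ddpm_loss(base, ctrl, a0, obs, prev_action, rng):
    """Standard DDPM epsilon-prediction loss for a single clean action chunk a0."""
    k = rng.integers(K)
    eps = rng.normal(size=a0.shape)
    noisy = np.sqrt(alpha_bars[k]) * a0 + np.sqrt(1.0 - alpha_bars[k]) * eps
    eps_hat = predict_noise(base, ctrl, noisy, obs, k, prev_action)
    return float(np.mean((eps_hat - eps) ** 2))


def sample_chunk(base, ctrl, obs, prev_action, rng):
    """Ancestral DDPM sampling of one action chunk, conditioned on the previous chunk."""
    a = rng.normal(size=ACT_DIM)
    for k in reversed(range(K)):
        eps_hat = predict_noise(base, ctrl, a, obs, k, prev_action)
        mean = (a - betas[k] / np.sqrt(1.0 - alpha_bars[k]) * eps_hat) / np.sqrt(1.0 - betas[k])
        a = mean + (np.sqrt(betas[k]) * rng.normal(size=ACT_DIM) if k > 0 else 0.0)
    return a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = init_base(rng)
    ctrl = init_transition(base)  # only ctrl's entries would be updated during fine-tuning
    obs, prev, noisy = (rng.normal(size=OBS_DIM), rng.normal(size=ACT_DIM),
                        rng.normal(size=ACT_DIM))
    # Zero-initialized projections: the branch leaves the base prediction unchanged at first.
    assert np.allclose(predict_noise(base, ctrl, noisy, obs, 5, prev),
                       predict_noise(base, None, noisy, obs, 5))
    print("loss:", ddpm_loss(base, ctrl, rng.normal(size=ACT_DIM), obs, prev, rng))
    # Stateful rollout: each sampled chunk becomes the condition for the next one.
    prev = np.zeros(ACT_DIM)  # no history at the first step
    for _ in range(3):
        obs = rng.normal(size=OBS_DIM)  # placeholder observations
        prev = sample_chunk(base, ctrl, obs, prev, rng)
    print("last chunk shape:", prev.shape)
```

Zero-initializing the projections follows the usual ControlNet design choice. Fine-tuning starts exactly from the base policy's behavior rather than from a randomly perturbed version of it.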
