Don't Start from Scratch: Behavioral Refinement via Interpolant-based Policy Diffusion
Kaiqi Chen, Eugene Lim, Kelvin Lin, Yiyang Chen, Harold Soh
TL;DR
This work addresses the inefficiency of diffusion-based imitation learning that starts from Gaussian noise by introducing BRIDGeR, which uses stochastic interpolants to bridge informative source policies to expert-like behavior. The authors provide theoretical guarantees showing that better source policies improve the learned policy up to a controllable additive factor, and they propose a practical implementation of BRIDGeR that learns velocity and score functions within a forward SDE framework. Empirically, BRIDGeR outperforms state-of-the-art diffusion methods across diverse robotic tasks and real-world experiments, especially when diffusion steps are limited and data are scarce, with Power3 interpolants excelling in highly multimodal scenarios. The results demonstrate the practical value of leveraging prior policies for faster, more reliable imitation learning and lay the groundwork for lifelong, transfer-like improvements in robotics.
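The sampling side of this framework can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the learned velocity b(t, x) and score s(t, x) are available as plain callables, uses a constant noise level eps (the stochastic interpolants framework allows a time-dependent eps(t)), and integrates the forward SDE dX_t = [b(t, X_t) + eps s(t, X_t)] dt + sqrt(2 eps) dW_t from t = 0 to t = 1 with Euler-Maruyama. The function name `sample_forward_sde` and its parameters are hypothetical. The starting point `x0` is where the source policy enters: it can be Gaussian noise or actions proposed by an informative prior policy.

```python
import numpy as np


def sample_forward_sde(x0, velocity, score, n_steps=10, eps=0.1, rng=None):
    """Euler-Maruyama integration of the forward SDE
        dX_t = [b(t, X_t) + eps * s(t, X_t)] dt + sqrt(2 * eps) dW_t,   t: 0 -> 1.

    x0:       actions sampled from the source policy (Gaussian or informative).
    velocity: callable (t, x) -> array, stand-in for a learned velocity b.
    score:    callable (t, x) -> array, stand-in for a learned score s.
    eps:      constant noise level (a simplifying assumption; eps may depend on t).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)  # copy so the caller's array is untouched
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # left endpoint of the current step
        drift = velocity(t, x) + eps * score(t, x)
        noise = rng.standard_normal(x.shape) if eps > 0 else 0.0
        x = x + drift * dt + np.sqrt(2.0 * eps * dt) * noise
    return x
```

As a sanity check, with eps = 0, a zero score, and the constant velocity x1 - x0 (the exact velocity of a noise-free straight-line bridge between one fixed pair), the integrator carries x0 onto x1. Here `n_steps` plays the role of the number of diffusion steps, the regime in which the abstract reports that a poorly matched (Gaussian) source hurts performance most.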
Abstract
Imitation learning empowers artificial agents to mimic behavior by learning from demonstrations. Recently, diffusion models, which can model high-dimensional and multimodal distributions, have shown impressive performance on imitation learning tasks. These models learn to shape a policy by diffusing actions (or states) from standard Gaussian noise. However, the target policy to be learned is often significantly different from a Gaussian, and this mismatch can result in poor performance when using a small number of diffusion steps (to improve inference speed) and under limited data. The key idea in this work is that initiating from a more informative source than Gaussian noise enables diffusion methods to mitigate these limitations. We contribute theoretical results, a new method, and empirical findings that show the benefits of using an informative source policy. Our method, which we call BRIDGeR, leverages the stochastic interpolants framework to bridge arbitrary policies, thus enabling a flexible approach to imitation learning. It generalizes prior work in that standard Gaussians can still be applied, but other source policies can be used if available. In experiments on challenging simulation benchmarks and on real robots, BRIDGeR outperforms state-of-the-art diffusion policies. We provide further analysis on design considerations when applying BRIDGeR. Code for BRIDGeR is available at https://github.com/clear-nus/bridger.
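To make "bridging" a source policy to the expert policy concrete, the sketch below builds one training example for a stochastic interpolant x_t = alpha(t) x0 + beta(t) x1 + gamma(t) z, where x0 is an action from the source policy, x1 is an expert action, and z ~ N(0, I). It assumes the simple linear choice alpha(t) = 1 - t, beta(t) = t with gamma(t) = sqrt(a t (1 - t)); this is an illustrative assumption and does not reproduce the specific interpolants (such as Power3) studied in the paper. The function `interpolant_targets` and the parameter `a` are hypothetical. A velocity network would regress onto `v_target`, and a denoiser regressing onto `z` yields the score via s(t, x) = -E[z | x_t = x] / gamma(t).

```python
import numpy as np


def interpolant_targets(x0, x1, t, a=1.0, rng=None):
    """One training example for a linear stochastic interpolant (illustrative choice):

        x_t = (1 - t) * x0 + t * x1 + gamma(t) * z,  gamma(t) = sqrt(a * t * (1 - t)),  z ~ N(0, I)

    x0: action from the source policy (Gaussian noise or a prior policy's proposal).
    x1: expert action from the demonstrations.
    Returns (x_t, velocity target, z). A denoiser trained to predict z gives the
    score estimate s(t, x) = -E[z | x_t = x] / gamma(t).
    """
    if not 0.0 < t < 1.0:
        # gamma'(t) diverges at the endpoints, so t is drawn from the open interval.
        raise ValueError("t must lie strictly between 0 and 1")
    rng = np.random.default_rng() if rng is None else rng
    x0 = np.asarray(x0, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    z = rng.standard_normal(x0.shape)

    gamma = np.sqrt(a * t * (1.0 - t))
    dgamma = a * (1.0 - 2.0 * t) / (2.0 * gamma)  # d gamma / dt

    xt = (1.0 - t) * x0 + t * x1 + gamma * z
    # d x_t / dt = alpha'(t) x0 + beta'(t) x1 + gamma'(t) z, with alpha' = -1 and beta' = 1.
    v_target = x1 - x0 + dgamma * z
    return xt, v_target, z
```

Setting x0 to standard Gaussian noise recovers the usual diffusion-from-noise setup, while passing actions from an existing policy as x0 gives the informative-source variant the abstract describes; the form of the training targets is the same in both cases.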
