Don't Start from Scratch: Behavioral Refinement via Interpolant-based Policy Diffusion

Kaiqi Chen; Eugene Lim; Kelvin Lin; Yiyang Chen; Harold Soh

TL;DR

This work addresses the inefficiency of diffusion-based imitation learning that starts from Gaussian noise. It introduces BRIDGeR, which uses stochastic interpolants to bridge informative source policies to expert-like behavior. The authors provide theoretical guarantees showing that better source policies improve the learned policy up to a controllable additive factor, and they describe a practical method that learns velocity and score functions and generates actions with a forward SDE. Empirically, BRIDGeR outperforms state-of-the-art diffusion methods across diverse robotic tasks and real-world experiments, especially when diffusion steps are limited and data are scarce, with Power3 interpolants excelling in highly multi-modal scenarios. The results demonstrate the practical impact of leveraging prior policies for faster, more reliable imitation learning and lay groundwork for lifelong, transfer-like improvements in robotics.
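As a rough illustration of generating actions with a forward SDE from velocity and score functions, the sketch below integrates an SDE with Euler-Maruyama steps. It is not the authors' implementation: the drift form `b + eps * s` with diffusion `sqrt(2 * eps)` follows the usual stochastic-interpolant convention and is an assumption here, the callables `b` and `s` stand in for trained networks, conditioning on the observation is omitted, and the names `sample_forward_sde`, `eps`, and `n_steps` are hypothetical.

```python
import numpy as np

def sample_forward_sde(a0, b, s, eps=0.1, n_steps=20, rng=None):
    """Euler-Maruyama integration from t=0 to t=1 of
    da = (b(t, a) + eps * s(t, a)) dt + sqrt(2 * eps) dW   (assumed form).

    a0 : array of source actions, e.g. samples from a source policy
    b  : callable (t, a) -> velocity, same shape as a (stand-in for a network)
    s  : callable (t, a) -> score, same shape as a (stand-in for a network)
    """
    rng = np.random.default_rng() if rng is None else rng
    a = np.array(a0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        drift = b(t, a) + eps * s(t, a)
        noise = rng.standard_normal(a.shape)
        # Deterministic drift step plus a Brownian increment of variance 2*eps*dt.
        a = a + drift * dt + np.sqrt(2.0 * eps * dt) * noise
    return a

# Toy usage: a constant velocity and a zero score simply shift the source samples.
source_actions = np.zeros((4, 2))
shifted = sample_forward_sde(
    source_actions,
    b=lambda t, a: 2.0 * np.ones_like(a),
    s=lambda t, a: np.zeros_like(a),
    eps=0.0,
)
```

With `eps=0.0` the noise term vanishes and the update reduces to a plain Euler ODE step, which makes the toy case easy to check by hand; fewer steps trade accuracy for inference speed.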

Abstract

Imitation learning empowers artificial agents to mimic behavior by learning from demonstrations. Recently, diffusion models, which have the ability to model high-dimensional and multimodal distributions, have shown impressive performance on imitation learning tasks. These models learn to shape a policy by diffusing actions (or states) from standard Gaussian noise. However, the target policy to be learned is often significantly different from Gaussian, and this mismatch can result in poor performance when using a small number of diffusion steps (to improve inference speed) and under limited data. The key idea in this work is that initiating from a more informative source than Gaussian enables diffusion methods to mitigate the above limitations. We contribute theoretical results, a new method, and empirical findings that show the benefits of using an informative source policy. Our method, which we call BRIDGeR, leverages the stochastic interpolants framework to bridge arbitrary policies, thus enabling a flexible approach towards imitation learning. It generalizes prior work in that standard Gaussians can still be applied, but other source policies can be used if available. In experiments on challenging simulation benchmarks and on real robots, BRIDGeR outperforms state-of-the-art diffusion policies. We provide further analysis on design considerations when applying BRIDGeR. Code for BRIDGeR is available at https://github.com/clear-nus/bridger.

Paper Structure (23 sections, 6 theorems, 35 equations, 18 figures, 16 tables, 2 algorithms)

Introduction
Preliminaries: Background & Related Work
Problem Formulation
Diffusion-based Policy Learning
Related Work
Bridging Policies: Theoretical Considerations
Method: BRIDGeR for Imitation Learning
Stochastic Interpolants for Imitation Learning
Design Decisions
Experiments
Domains
Compared Methods
Test Methodology
Main Results and Discussion
Real World Robot Experiments
...and 8 more sections

Key Result

Theorem 1

Let $\hat{\pi}_0$ and $\hat{\rho}_0$ be two source distributions, and suppose that Assumption \ref{ass:contbounded} holds. Then the improvement in the generated target distribution is bounded by the improvement in the source distribution.

Figures (18)

Figure 1: (A) Overview of action generation with BRIDGeR. With trained velocity $b$ and score $s$ functions, BRIDGeR transports the actions from source distribution $\pi_0(a|x)$ to the target distribution $\pi_1(a|x)$ via the forward SDE (Eq. \ref{eq:fsde}). (B) We tested BRIDGeR on challenging robot benchmark tasks and show that using informative source policies enhances performance. For example, in 6-DoF grasp generation, using heuristic or data-driven source policies results in more successful grasps compared to the conventional Gaussian.
Figure 2: Intermediate distributions obtained from BRIDGeR and DDIM trained on 2D synthetic data. With a source distribution that is closer to the target distribution (smaller Earth Mover's Distance (EMD) values), BRIDGeR can better recover the true target distribution.
Figure 3: Earth Mover's Distance (EMD) of the generated target distributions under different source distributions and interpolant functions on our 2D synthetic dataset. Each point represents the EMD between a source/target distribution and the true target distribution.
Figure 4: Intermediate distributions under different interpolant functions (trained on 2D synthetic data).
Figure 5: Intermediate distributions with varying $\gamma(t)=d\sqrt{2t(1-t)}$. When the support of the source distribution is narrower than the target distribution, selecting a small $\gamma$ value ($d=0.03$) results in samples clustering within the high-density areas of the source. Conversely, an excessively large $\gamma$ ($d=3$) results in overdispersion. However, a well-chosen $\gamma$ ($d=0.3$) facilitates coverage to ensure reasonable recovery of the target.
...and 13 more figures
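The noise schedule $\gamma(t)=d\sqrt{2t(1-t)}$ from the Figure 5 caption can be made concrete with a small sketch of sampling an intermediate point between a source action and a target action. The schedule is taken from the caption; everything else is an assumption for illustration: a linear interpolant $(1-t)a_0 + t a_1$ is used for simplicity (the paper also studies other interpolant functions such as Power3), and the function names `gamma` and `interpolate` are hypothetical.

```python
import numpy as np

def gamma(t, d=0.3):
    """Noise schedule gamma(t) = d * sqrt(2 t (1 - t)); zero at t=0 and t=1."""
    return d * np.sqrt(2.0 * t * (1.0 - t))

def interpolate(a0, a1, t, d=0.3, rng=None):
    """Sample a_t = (1 - t) * a0 + t * a1 + gamma(t) * z with z ~ N(0, I).

    The linear interpolant is an assumption made for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    a0 = np.asarray(a0, dtype=float)
    a1 = np.asarray(a1, dtype=float)
    z = rng.standard_normal(a0.shape)
    return (1.0 - t) * a0 + t * a1 + gamma(t, d) * z

# Compare the mid-path noise scale for the three d values named in the caption.
for d in (0.03, 0.3, 3.0):
    print(d, gamma(0.5, d))
```

Because `gamma` vanishes at both endpoints, the sample equals the source action at `t=0` and the target action at `t=1`; `d` only controls how widely intermediate samples spread around the path between them.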

Theorems & Definitions (10)

Theorem 1
Theorem 2
Theorem 3
Definition 1: Stochastic Interpolant
Theorem 3
proof
Theorem 3
proof
Theorem 3
proof
