AdaFlow: Imitation Learning with Variance-Adaptive Flow-Based Policies

Xixi Hu, Bo Liu, Xingchao Liu, Qiang Liu

TL;DR

AdaFlow pairs a flow-based policy with a variance-adaptive ODE solver that adjusts its step size at inference time, making it an adaptive decision-maker that offers rapid inference without sacrificing action diversity.

Abstract

Diffusion-based imitation learning improves on Behavioral Cloning (BC) for multi-modal decision-making, but at the cost of significantly slower inference due to the recursion in the diffusion process. This motivates the design of efficient policy generators that retain the ability to generate diverse actions. To address this challenge, we propose AdaFlow, an imitation learning framework based on flow-based generative modeling. AdaFlow represents the policy with state-conditioned ordinary differential equations (ODEs), which are known as probability flows. We reveal an intriguing connection between the conditional variance of their training loss and the discretization error of the ODEs. With this insight, we propose a variance-adaptive ODE solver that adjusts its step size at inference time, making AdaFlow an adaptive decision-maker that offers rapid inference without sacrificing diversity. Interestingly, it automatically reduces to a one-step generator when the action distribution is uni-modal. Our comprehensive empirical evaluation shows that AdaFlow achieves high performance with fast inference speed.
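The idea of a solver whose step size depends on an estimated variance can be sketched as follows. This is a minimal illustration and not the authors' implementation: the callables `velocity` and `variance`, the parameters `eta` and `eps_min`, and the specific rule "step proportional to `eta / sigma`, clipped from below by `eps_min`" are all assumptions made for this example, since the abstract does not state the exact rule.

```python
import numpy as np

def adaptive_euler(velocity, variance, z0, eta=0.25, eps_min=0.25):
    """Integrate dz/dt = velocity(z, t) from t=0 to t=1 with variance-dependent steps.

    velocity(z, t) and variance(z, t) are hypothetical stand-ins for a learned
    state-conditioned velocity field and its variance estimate (state s fixed).
    Returns the final point and the number of Euler steps taken.
    """
    z = np.asarray(z0, dtype=float)
    t, n_steps = 0.0, 0
    while t < 1.0:
        remaining = 1.0 - t
        sigma = np.sqrt(max(float(variance(z, t)), 0.0))
        # Assumed rule: low variance -> large step, high variance -> small step.
        step = remaining if sigma == 0.0 else max(eta / sigma, eps_min)
        last = step >= remaining - 1e-12  # never integrate past t = 1
        if last:
            step = remaining
        z = z + step * velocity(z, t)
        t = 1.0 if last else t + step
        n_steps += 1
    return z, n_steps

if __name__ == "__main__":
    target = 2.0  # a deterministic action for the toy state
    straight = lambda z, t: (target - z) / (1.0 - t)  # straight-line velocity
    print(adaptive_euler(straight, lambda z, t: 0.0, z0=-1.0))  # zero variance
    print(adaptive_euler(straight, lambda z, t: 1.0, z0=-1.0))  # high variance
```

With zero estimated variance the loop takes a single step covering the whole interval, which mirrors the one-step behaviour described for uni-modal action distributions; a larger variance estimate shrinks the step and increases the step count.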

Paper Structure (31 sections, 6 theorems, 27 equations, 9 figures, 8 tables, 1 algorithm)

Key Result

Proposition 3.1

Let $v^*$ be the optimum of Eq. (eq:policy_loss). If $\mathrm{var}_{\pi_E}({\boldsymbol{a}} \mid {\boldsymbol{s}}) = 0$ where ${\boldsymbol{a}} \sim \pi_E(\cdot \mid {\boldsymbol{s}})$, then the trajectories of the learned ODE conditioned on ${\boldsymbol{s}}$ are straight lines pointing to ${\boldsymbol{z}}_1$, and hence the ODE can be solved exactly with a single Euler step.
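A short worked sketch of why one Euler step suffices in the zero-variance case. It assumes the linear interpolation ${\boldsymbol{z}}_t = t{\boldsymbol{a}} + (1-t){\boldsymbol{z}}_0$ commonly used in flow-based training, and that $v^*$ is the conditional mean of ${\boldsymbol{a}} - {\boldsymbol{z}}_0$; both are assumptions here, since the training loss is not reproduced in this summary, and the argument notation $v^*({\boldsymbol{z}}_t, t \mid {\boldsymbol{s}})$ is likewise illustrative.

```latex
% Assumptions (not taken from the summary):
%   z_t = t a + (1 - t) z_0, with z_0 drawn from the noise prior;
%   v^*(z_t, t | s) = E[ a - z_0 | z_t, s ].
\begin{align*}
\mathrm{var}_{\pi_E}(\boldsymbol{a} \mid \boldsymbol{s}) = 0
  &\;\Longrightarrow\; \boldsymbol{a} \text{ is deterministic given } \boldsymbol{s}, \\
v^*(\boldsymbol{z}_t, t \mid \boldsymbol{s})
  &= \mathbb{E}\left[\boldsymbol{a} - \boldsymbol{z}_0 \mid \boldsymbol{z}_t, \boldsymbol{s}\right]
   = \frac{\boldsymbol{a} - \boldsymbol{z}_t}{1 - t}
   = \boldsymbol{a} - \boldsymbol{z}_0
   \quad \text{along the trajectory, for } t \in [0, 1), \\
\boldsymbol{z}_0 + 1 \cdot v^*(\boldsymbol{z}_0, 0 \mid \boldsymbol{s})
  &= \boldsymbol{z}_0 + (\boldsymbol{a} - \boldsymbol{z}_0)
   = \boldsymbol{a}
   = \boldsymbol{z}_1 .
\end{align*}
```

The velocity is constant along each trajectory, so an Euler step of size 1 from $t = 0$ incurs no discretization error under these assumptions.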

Figures (9)

  • Figure 1: AdaFlow is a fast imitation learning policy. It adaptively adjusts the number of simulation steps when generating actions. For low-variance states, it functions as a one-step action generator. For high-variance states, it employs more steps to ensure accurate action generation. This adaptive approach enables AdaFlow to achieve an average generation speed close to one step per task completion.
  • Figure 2: Illustration of AdaFlow's computational adaptivity (orange) on a simple regression task. In the upper portion of the image, we use Diffusion Policy (DDIM) and AdaFlow to predict $y$ given $x$, with deterministic $y=0$ when $x\leq 0$, and bimodal $y =\pm x$ when $x>0$. Both DDIM and AdaFlow fit the demonstration data well. However, the simulated ODE trajectory learned by Diffusion Policy with DDIM (red) is not straight for any value of $x$. By contrast, the simulated ODE trajectory learned by AdaFlow with a fixed step (blue) is a straight line when the prediction is deterministic ($x \leq 0$), which means the generation can be done exactly with a single Euler step. At the bottom, we show that AdaFlow adapts the number of simulation steps to the estimated variance at each $x$.
  • Figure 3: Generated trajectories. We visualize the trajectories generated by different policies, with the agent's starting point fixed.
  • Figure 4: LIBERO tasks. We visualize the demonstrated trajectories of the robot's end effector.
  • Figure 5: Predicted variance. We visualize the variance predicted by AdaFlow. The variance is computed on states from the expert's demonstrations and averaged over all simulation steps (i.e., $t$ from $0$ to $1$). We then normalize the variance to $[0, 1]$ by dividing by the largest variance found across all states.
  • ...and 4 more figures
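The toy regression task described for Figure 2 and the normalization described for Figure 5 can be sketched in a few lines. This is an illustrative reconstruction only: the function names are hypothetical, and the equal probability of the two modes $y = +x$ and $y = -x$ is an assumption that the caption does not state.

```python
import numpy as np

def sample_toy(x, rng):
    """Sample y given x: y = 0 for x <= 0, and y = +x or -x for x > 0."""
    x = np.asarray(x, dtype=float)
    # Assumption: both modes are equally likely.
    sign = rng.choice([-1.0, 1.0], size=x.shape)
    return np.where(x > 0, sign * x, 0.0)

def normalized_variance(var_per_state_and_step):
    """Average variance over simulation steps (axis 1), then scale to [0, 1]
    by the largest averaged variance across states."""
    v = np.asarray(var_per_state_and_step, dtype=float).mean(axis=1)
    vmax = v.max()
    return v / vmax if vmax > 0 else v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = np.array([-1.0, -0.5, 0.5, 1.0])
    print(sample_toy(xs, rng))
    # Rows are states, columns are simulation steps (made-up numbers).
    print(normalized_variance([[0.0, 0.0], [1.0, 3.0], [2.0, 2.0]]))
```

Under the equal-probability assumption the conditional variance of $y$ given $x$ is $0$ for $x \leq 0$ and $x^2$ for $x > 0$, which is the kind of signal a variance-adaptive solver could use to choose between one step and several.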

Theorems & Definitions (11)

  • Proposition 3.1
  • Proposition 3.2
  • Proposition 3.3
  • Proposition 3.5
  • PROOF 1
  • PROOF 2
  • PROOF 3
  • Lemma A.1
  • PROOF 4
  • Lemma A.2
  • ...and 1 more