ActionFlow: Equivariant, Accurate, and Efficient Policies with Spatially Symmetric Flow Matching

Niklas Funk, Julen Urain, Joao Carvalho, Vignesh Prasad, Georgia Chalvatzaki, Jan Peters

TL;DR

ActionFlow is a novel policy class that integrates spatial symmetry inductive biases while generating expressive action sequences. It introduces an SE(3) Invariant Transformer architecture, which enables informed spatial reasoning based on the relative SE(3) poses between observations and actions.
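The SE(3) Invariant Transformer reasons over poses, and Figure 1(b) describes attention augmented with points generated in the local frames of the query and key poses. The NumPy sketch below shows one way such a term can be made invariant to a global SE(3) transformation. It assumes an invariant-point-attention style logit (as in AlphaFold 2) that subtracts the squared distances between world-frame points from the usual dot-product score. The function names, the weight `w`, and the exact logit form are hypothetical and may differ from ActionFlow's actual layer.

```python
import numpy as np

def random_pose(rng):
    """Random 4x4 rigid transform: rotation from a sign-fixed QR, Gaussian translation."""
    Q, R = np.linalg.qr(rng.normal(size=(3, 3)))
    Q = Q @ np.diag(np.sign(np.diag(R)))
    if np.linalg.det(Q) < 0:          # enforce a proper rotation (det = +1)
        Q[:, 0] = -Q[:, 0]
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = Q, rng.normal(size=3)
    return T

def invariant_point_logits(q, k, T_q, T_k, p_q, p_k, w=1.0):
    """Hypothetical attention logits: feature term minus a point-distance term.

    q: (Nq, d), k: (Nk, d)           invariant token features
    T_q: (Nq, 4, 4), T_k: (Nk, 4, 4) query / key poses
    p_q: (Nq, P, 3), p_k: (Nk, P, 3) points expressed in each token's local frame
    """
    feat = q @ k.T / np.sqrt(q.shape[-1])                       # (Nq, Nk)
    # Map local points into the world frame: R @ p + t for every token.
    g_q = np.einsum('nij,npj->npi', T_q[:, :3, :3], p_q) + T_q[:, None, :3, 3]
    g_k = np.einsum('nij,npj->npi', T_k[:, :3, :3], p_k) + T_k[:, None, :3, 3]
    diff = g_q[:, None] - g_k[None]                             # (Nq, Nk, P, 3)
    dist2 = (diff ** 2).sum(axis=(-1, -2))                      # (Nq, Nk)
    return feat - 0.5 * w * dist2

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)                     # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
Nq, Nk, d, P = 4, 5, 8, 3
q, k = rng.normal(size=(Nq, d)), rng.normal(size=(Nk, d))
T_q = np.stack([random_pose(rng) for _ in range(Nq)])
T_k = np.stack([random_pose(rng) for _ in range(Nk)])
p_q, p_k = rng.normal(size=(Nq, P, 3)), rng.normal(size=(Nk, P, 3))
Delta = random_pose(rng)                                        # global SE(3) transform

attn = softmax(invariant_point_logits(q, k, T_q, T_k, p_q, p_k))
attn_tf = softmax(invariant_point_logits(q, k, Delta @ T_q, Delta @ T_k, p_q, p_k))
print(np.abs(attn - attn_tf).max())
```

A global transform moves all world-frame points rigidly, so their pairwise distances, and therefore the logits, stay the same. The feature term is unaffected as long as the features themselves are invariant.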

Abstract

Spatial understanding is a critical aspect of most robotic tasks, particularly when generalization is important. Despite the impressive results of deep generative models in complex manipulation tasks, the absence of a representation that encodes intricate spatial relationships between observations and actions often limits spatial generalization, necessitating a large number of demonstrations. To tackle this problem, we introduce a novel policy class, ActionFlow. ActionFlow integrates spatial symmetry inductive biases while generating expressive action sequences. On the representation level, ActionFlow introduces an SE(3) Invariant Transformer architecture, which enables informed spatial reasoning based on the relative SE(3) poses between observations and actions. For action generation, ActionFlow leverages Flow Matching, a state-of-the-art deep generative model known for generating high-quality samples with fast inference, an essential property for feedback control. In combination, ActionFlow policies exhibit strong spatial and locality biases and SE(3)-equivariant action generation. Our experiments demonstrate the effectiveness of ActionFlow and its two main components on several simulated and real-world robotic manipulation tasks and confirm that we can obtain equivariant, accurate, and efficient policies with spatially symmetric flow matching. Project website: https://flowbasedpolicies.github.io/
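Flow Matching trains a velocity field and generates samples by integrating an ODE from noise (t = 0) to data (t = 1), so the number of integration steps sets the inference cost. The toy sketch below makes this concrete under stated assumptions. It uses a hypothetical 1-D Gaussian "data" distribution whose marginal velocity for the linear interpolation path is known in closed form, so that velocity can stand in for a learned network. It shows the conditional flow matching regression target `x1 - x0` and a K-step Euler sampler. `MU`, `SIGMA`, and all function names are illustrative.

```python
import numpy as np

MU, SIGMA = 2.0, 0.5  # hypothetical 1-D data distribution N(MU, SIGMA^2)

def exact_velocity(x, t, mu=MU, sigma=SIGMA):
    """Marginal velocity E[x1 - x0 | x_t = x] for x_t = (1 - t) x0 + t x1,
    with x0 ~ N(0, 1) and x1 ~ N(mu, sigma^2) drawn independently."""
    var_t = (1 - t) ** 2 + t ** 2 * sigma ** 2      # Var(x_t)
    cov_t = t * sigma ** 2 - (1 - t)                # Cov(x1 - x0, x_t)
    return mu + cov_t / var_t * (x - t * mu)

def cfm_loss(velocity, rng, n=100_000, mu=MU, sigma=SIGMA):
    """Monte Carlo estimate of the conditional flow matching loss."""
    x0 = rng.normal(size=n)                         # noise samples
    x1 = mu + sigma * rng.normal(size=n)            # data samples
    t = rng.uniform(size=n)
    xt = (1 - t) * x0 + t * x1                      # point on the linear path
    target = x1 - x0                                # conditional target velocity
    return np.mean((velocity(xt, t) - target) ** 2)

def euler_sample(velocity, x0, K):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with K Euler steps."""
    x, dt = np.asarray(x0, dtype=float), 1.0 / K
    for step in range(K):
        x = x + dt * velocity(x, step * dt)
    return x

rng = np.random.default_rng(0)
zero_velocity = lambda x, t: np.zeros_like(x)
print(cfm_loss(exact_velocity, rng), cfm_loss(zero_velocity, rng))
print(euler_sample(exact_velocity, np.array([0.0, 1.0]), K=1))
print(euler_sample(exact_velocity, np.array([0.0, 1.0]), K=1000))
```

In this toy, the exact flow maps a starting point `x0` to `MU + SIGMA * x0`. With K = 1000 the Euler sampler recovers that map closely. A single step (K = 1), by contrast, sends every starting point to `MU` and collapses the distribution. This is a property of the toy only, not a statement about the step counts ActionFlow uses.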

Paper Structure (25 sections, 12 equations, 12 figures, 2 tables, 3 algorithms)

Figures (12)

  • Figure 1: Spatial Symmetries in ActionFlow. (a) Visual representation of the SE(3) Invariant Transformer. Given a set of observations $\bm{F}_o$ with associated poses $\bm{T}_o$ and candidate actions $\bm{T}_a$, the Transformer predicts vectors $\bm{v}$ that update the actions (see the SE(3) sampling equation). The action refinement process is repeated $K$ times. (b) The invariant attention layer augments classical attention with points $\bm{p}_Q$ and $\bm{p}_K$ generated in the local frames of the query and key poses, and is designed to produce the same output under global SE(3) transformations $\Delta_T \in SE(3)$. (c) ActionFlow is SE(3) equivariant: if a transformation is applied to the observation poses, the generated actions are transformed in the same way (cf. the appendix on equivariant generation).
  • Figure 2: Success rate evaluation on Robomimic tasks with state-based observations, averaged over 3 seeds and 50 environment initializations. We evaluate flow matching and Diffusion Policy (Chi et al., 2023) with different numbers of inference steps. For Diffusion Policy, we use DDPM for 100 inference steps and DDIM otherwise.
  • Figure 3: Success rate of models trained on different numbers of demonstrations (20, 50, 200, 1000) on a subset of MimicGen tasks (Mandlekar et al., 2023). We report the best success rate obtained by the model configurations on 50 test environments. The top row shows two randomly sampled initial configurations for each task.
  • Figure 4: Teleoperation interface used to collect the data. This image series depicts the data collection process for the light bulb mounting experiment.
  • Figure 5: Real robot light bulb experiments. Top: We report the performance of our model and two baselines, i.e., replaying 10 randomly selected demonstrations and replaying the demonstrations with a position offset of 1.5 in one randomly selected direction. Bottom: Successful rollouts illustrating that ActionFlow policies can generate highly accurate trajectories.
  • ...and 7 more figures
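Figure 1(a) and (c) describe refining candidate action poses $K$ times with predicted vectors, and an equivariance property of the result. The sketch below illustrates both under several assumptions. It assumes the update right-multiplies each action pose by the SE(3) exponential of a predicted twist. It replaces the Transformer with a hypothetical hand-written velocity that depends only on the relative pose between an observation and the action. `toy_invariant_velocity`, `refine`, and the (angular, linear) twist ordering are illustrative choices, not ActionFlow's implementation. The final check applies the same global transform to the observation and to the initial action sample. This mirrors Figure 1(c) on the assumption that the initial samples are transformed consistently.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix such that hat(w) @ x == np.cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def se3_exp(xi):
    """Closed-form exponential of a twist xi = (omega, v) as a 4x4 rigid transform."""
    omega, v = xi[:3], xi[3:]
    theta = np.linalg.norm(omega)
    W = hat(omega)
    if theta < 1e-8:                      # first-order approximation near identity
        R, V = np.eye(3) + W, np.eye(3) + 0.5 * W
    else:
        A = np.sin(theta) / theta
        B = (1 - np.cos(theta)) / theta ** 2
        C = (theta - np.sin(theta)) / theta ** 3
        R = np.eye(3) + A * W + B * W @ W  # Rodrigues' formula
        V = np.eye(3) + B * W + C * W @ W  # left Jacobian applied to translation
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ v
    return T

def toy_invariant_velocity(T_o, T_a):
    """Hypothetical stand-in for the network: a twist computed only from the
    relative pose T_a^{-1} T_o, which is unchanged by a global transform."""
    T_rel = np.linalg.inv(T_a) @ T_o
    R_rel, p_rel = T_rel[:3, :3], T_rel[:3, 3]
    # vee(0.5 * (R - R^T)) = sin(angle) * axis, an invariant rotational signal
    omega = 0.5 * np.array([R_rel[2, 1] - R_rel[1, 2],
                            R_rel[0, 2] - R_rel[2, 0],
                            R_rel[1, 0] - R_rel[0, 1]])
    return np.concatenate([omega, p_rel])

def refine(T_o, T_a, K=10):
    """Apply K refinement steps T_a <- T_a @ exp(dt * xi) with dt = 1 / K."""
    dt = 1.0 / K
    for _ in range(K):
        T_a = T_a @ se3_exp(dt * toy_invariant_velocity(T_o, T_a))
    return T_a

def random_pose(rng):
    return se3_exp(rng.normal(size=6))

rng = np.random.default_rng(1)
T_o, T_a0, Delta = (random_pose(rng) for _ in range(3))
out = refine(T_o, T_a0, K=10)
out_tf = refine(Delta @ T_o, Delta @ T_a0, K=10)  # transform observation and initial sample
print(np.abs(Delta @ out - out_tf).max())
```

Because the velocity sees only relative poses, it is identical in both runs. Each step therefore carries the global transform through unchanged, giving `refine(Delta @ T_o, Delta @ T_a0) == Delta @ refine(T_o, T_a0)` up to floating-point error.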