Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching

Eugenio Chisari; Nick Heppert; Max Argus; Tim Welschehold; Thomas Brox; Abhinav Valada

TL;DR

This paper investigates the application of Conditional Flow Matching (CFM) to robotic policy learning and studies its interplay with the other design choices required to build an imitation learning algorithm, showing that CFM performs best when combined with point cloud input observations.

Abstract

Learning from expert demonstrations is a promising approach for training robotic manipulation policies from limited data. However, imitation learning algorithms require a number of design choices, including the input modality, the training objective, and the 6-DoF end-effector pose representation. Diffusion-based methods have gained popularity as they enable predicting long-horizon trajectories and handle multimodal action distributions. Recently, Conditional Flow Matching (CFM), or Rectified Flow, has been proposed as a more flexible generalization of diffusion models. In this paper, we investigate the application of CFM in the context of robotic policy learning and specifically study its interplay with the other design choices required to build an imitation learning algorithm. We show that CFM gives the best performance when combined with point cloud input observations. Additionally, we study the feasibility of a CFM formulation on the SO(3) manifold and evaluate its suitability with a simplified example. We perform extensive experiments on RLBench, which demonstrate that our proposed PointFlowMatch approach achieves a state-of-the-art average success rate of 67.8% over eight tasks, double the performance of the next best method.
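
To make the CFM idea in the abstract concrete, the following is a minimal sketch of the common straight-line (rectified-flow style) variant, not necessarily the exact formulation used in the paper. It assumes Euclidean action trajectories of shape (batch, horizon, dim), omits the observation conditioning, and leaves the velocity network as a placeholder callable. All function names (`cfm_training_pair`, `cfm_loss`, `euler_sample`) are hypothetical.

```python
import numpy as np


def cfm_training_pair(x1, rng):
    """Build one regression example from a batch of clean trajectories x1.

    Assumption: straight-line path x_t = (1 - t) * x0 + t * x1 with Gaussian x0.
    """
    x0 = rng.standard_normal(x1.shape)  # noise sample, same shape as x1
    # One time value per batch element, broadcastable over the remaining axes.
    t = rng.uniform(size=(x1.shape[0],) + (1,) * (x1.ndim - 1))
    xt = (1.0 - t) * x0 + t * x1        # point on the straight path
    target_velocity = x1 - x0           # constant velocity along that path
    return xt, t, target_velocity


def cfm_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((predicted_velocity - target_velocity) ** 2))


def euler_sample(velocity_fn, x0, k):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 in k Euler steps."""
    x = x0.copy()
    dt = 1.0 / k
    for i in range(k):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x
```

In this sketch, `k` plays the role of the number of inference steps: fewer steps mean fewer network evaluations per predicted trajectory, at the cost of a coarser integration of the learned velocity field.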

Paper Structure (14 sections, 3 equations, 5 figures, 3 tables, 2 algorithms)


Introduction
Related Work
Technical Approach
Observation and Action Spaces
Conditional Flow Matching for Policy Learning
Conditional Flow Matching for Data in $SO(3)$
Model Architecture and Training Setup
Experimental Evaluation
Benchmarking Results
Ablation Study
Real Robot Experiments
Conclusion
Additional Experiments
Simplified $SO(2)$-Experiment: No Free Lunch

Figures (5)

Figure 1: Diffusion and CFM are repeatedly applied to a noisy trajectory, thereby iteratively yielding a clean trajectory that can be executed on the robot. The generative models also take as input encoded observations.
Figure 2: Example images of the eight RLBench tasks.
Figure 3: Comparison of CFM and DDIM for varying values of the number of inference steps $k$. We compare the inference time ($\downarrow$) measured in [ms] as well as the inference FPS ($\uparrow$) in [Hz] against overall success rate ($\uparrow$) for both formulations.
Figure 4: We demonstrate PointFlowMatch on a real robotic setup. We evaluate on two tasks: open box and sponge on plate.
Figure 5: Simplified Example. The left figure shows the edge case in which random samples are close to the opposite pole of the target sample. Here the $SO(3)$ formulation presents a discontinuity, which makes learning more difficult. In the three right figures, we visualize the mean error during inference across different sampling locations for our different formulations. We mark the target with a red cross. One observes that for the Euclidean formulation the error is lower for initial sample points along the axis orthogonal to the target. This is expected, as values sampled along the line are naturally mapped to the target when normalized. In the last figure, by contrast, one observes higher errors close to the pole. Additionally, a training data bias is visible, as the error is higher on one side of the discontinuity.
