FlowPolicy: Enabling Fast and Robust 3D Flow-based Policy via Consistency Flow Matching for Robot Manipulation

Qinglun Zhang, Zhen Liu, Haoqiang Fan, Guanghui Liu, Bing Zeng, Shuaicheng Liu

TL;DR

FlowPolicy tackles the efficiency bottleneck of diffusion-based imitation learning for 3D robot manipulation by casting policy generation as a conditional consistency flow matching problem. It conditions on 3D point-cloud observations and learns velocity-consistent straight-line flows, which enables one-step action decoding in real time. Across 37 tasks on Adroit and Metaworld, FlowPolicy achieves a 7$\times$ increase in inference speed while maintaining competitive success rates, highlighting the practical potential of conditional flow-based policies for real-time robotics. This work broadens the applicability of 3D-vision-based imitation learning to real-world, real-time manipulation scenarios.

Abstract

Robots can acquire complex manipulation skills by learning policies from expert demonstrations, an approach often known as vision-based imitation learning. Generating policies with diffusion and flow matching models has been shown to be effective, particularly in robotic manipulation tasks. However, recursion-based approaches are inefficient at inference when working from noise distributions to policy distributions, which poses a challenging trade-off between efficiency and quality. This motivates us to propose FlowPolicy, a novel framework for fast policy generation based on consistency flow matching and 3D vision. Our approach refines the flow dynamics by normalizing the self-consistency of the velocity field, enabling the model to derive task execution policies in a single inference step. Specifically, FlowPolicy conditions on the observed 3D point cloud, where consistency flow matching directly defines straight-line flows from different time states to the same action space while simultaneously constraining their velocity values. That is, we approximate the trajectories from noise to robot actions by normalizing the self-consistency of the velocity field within the action space, thus improving inference efficiency. We validate the effectiveness of FlowPolicy on Adroit and Metaworld, demonstrating a 7$\times$ increase in inference speed while maintaining competitive average success rates compared to state-of-the-art methods. Code is available at https://github.com/zql-kk/FlowPolicy.
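To make the abstract's idea of straight-line flows with self-consistent velocities more concrete, the following is a minimal NumPy sketch. It is an illustration, not the authors' implementation. The loss form (an endpoint-agreement term plus a velocity-agreement term weighted by a hypothetical `alpha`) is an assumption based on the general consistency flow matching recipe. The function names, the toy 4-dimensional action, and the closed-form `oracle_velocity` are all made up for this example. Point-cloud conditioning and the separate target network used in practice are omitted.

```python
import numpy as np


def interpolate(noise, action, t):
    """Point at time t on the straight line from noise (t=0) to action (t=1)."""
    return (1.0 - t) * noise + t * action


def endpoint(x_t, t, velocity):
    """Action reached by following a constant velocity from time t to time 1."""
    return x_t + (1.0 - t) * velocity


def consistency_loss(velocity_fn, noise, action, t, dt, alpha=1.0):
    """Assumed loss form: two time states on the same line should agree on
    (a) the endpoint they predict and (b) the velocity they predict.
    A real training loop would evaluate the second state with a frozen
    (e.g. EMA) copy of the network; here one function plays both roles."""
    x_t = interpolate(noise, action, t)
    x_s = interpolate(noise, action, t + dt)
    v_t = velocity_fn(x_t, t)
    v_s = velocity_fn(x_s, t + dt)
    end_term = np.mean((endpoint(x_t, t, v_t) - endpoint(x_s, t + dt, v_s)) ** 2)
    vel_term = np.mean((v_t - v_s) ** 2)
    return end_term + alpha * vel_term


def one_step_decode(velocity_fn, noise):
    """Single inference step: start at the noise sample and jump to t=1."""
    return endpoint(noise, 0.0, velocity_fn(noise, 0.0))


# Toy setup: one fixed target action and one Gaussian noise sample.
rng = np.random.default_rng(0)
target = np.array([0.2, -0.5, 0.1, 0.7])
noise = rng.standard_normal(4)


def oracle_velocity(x, t):
    """Hypothetical ideal field pointing straight at the target (valid for t < 1)."""
    return (target - x) / (1.0 - t)


loss = consistency_loss(oracle_velocity, noise, target, t=0.3, dt=0.1)
decoded = one_step_decode(oracle_velocity, noise)
```

With the ideal field the velocity is constant along the line, so both loss terms vanish up to floating-point error and a single step from the noise sample lands on the target action. A learned network only approximates this, which is why the self-consistency constraint is imposed during training.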

Paper Structure

This paper contains 23 sections, 8 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Comparison of FlowPolicy with the state-of-the-art 2D-based method DP [chi2023diffusion] and the 3D-based methods DP3 [ze20243ddiffusionpolicygeneralizable] and its lightweight version Simple DP3 in terms of inference time and average success rate on Adroit and Metaworld.
  • Figure 2: Overall pipeline. The top section visualizes FlowPolicy, where a straight-line flow enables the fastest data transition from the noise distribution to the action distribution (Adroit: Open the door). The bottom section shows the details of FlowPolicy: expert demonstrations are converted to 3D point clouds, which, along with the robot state, are encoded into compact 3D visual representations and state embeddings. A straight-line flow is then learned via conditional consistency flow matching, generating high-quality actions for tasks (Metaworld: Assembly) at real-time inference speed.
  • Figure 3: Qualitative comparison of FlowPolicy and DP3 [ze20243ddiffusionpolicygeneralizable] on two challenging manipulation tasks from Adroit and Metaworld. Our method successfully generates high-quality actions at real-time speeds, completing these tasks effectively, whereas DP3 either produces lower-quality actions (left) or fails to complete the task (right).
  • Figure 4: Illustrations of the learning curves. Compared to Simple DP3 and DP3, FlowPolicy demonstrates higher stability, learning efficiency, and success rates.
  • Figure 5: Ablation on the number of expert demonstrations. We choose four typical tasks to explore the impact of different numbers of demonstrations on FlowPolicy and DP3. Both generally improve in accuracy with more demonstrations, but FlowPolicy typically achieves a higher success rate and avoids the performance bottleneck that DP3 exhibits.
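The conditioning step described in the Figure 2 caption (a point cloud and the robot state encoded into a compact 3D representation and a state embedding) can be sketched roughly as follows. This is an assumed, PointNet-style stand-in rather than the encoder used in the paper: the per-point MLP with max pooling, the layer sizes, the 512-point cloud, the 24-dimensional state, and all function names are hypothetical choices for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_linear(in_dim, out_dim):
    """Random weights and zero bias for one linear layer (untrained, illustrative)."""
    return rng.normal(0.0, 0.1, size=(in_dim, out_dim)), np.zeros(out_dim)


def encode_point_cloud(points, layer1, layer2):
    """Per-point two-layer MLP followed by max pooling over the points.
    points: (N, 3) array of xyz coordinates. Returns a (feat_dim,) vector."""
    w1, b1 = layer1
    w2, b2 = layer2
    hidden = np.maximum(points @ w1 + b1, 0.0)  # ReLU, shape (N, hidden_dim)
    features = hidden @ w2 + b2                 # shape (N, feat_dim)
    return features.max(axis=0)                 # order-independent pooling


def encode_state(state, layer):
    """Single-layer embedding of the robot state vector."""
    w, b = layer
    return np.maximum(state @ w + b, 0.0)


def build_condition(points, state, pc_layers, state_layer):
    """Concatenate the visual feature and the state embedding into one
    conditioning vector for the policy network."""
    visual = encode_point_cloud(points, *pc_layers)
    return np.concatenate([visual, encode_state(state, state_layer)])


# Hypothetical sizes: 512 points, 24-dim robot state, 64 + 32 output features.
pc_layers = (init_linear(3, 64), init_linear(64, 64))
state_layer = init_linear(24, 32)
points = rng.uniform(-1.0, 1.0, size=(512, 3))
state = rng.standard_normal(24)

condition = build_condition(points, state, pc_layers, state_layer)
shuffled = build_condition(points[rng.permutation(512)], state, pc_layers, state_layer)
```

Max pooling makes the visual feature independent of point ordering, which suits unordered point clouds. The resulting `condition` vector is what a velocity network would take as input alongside the noisy action and the time step.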