Efficient Hybrid SE(3)-Equivariant Visuomotor Flow Policy via Spherical Harmonics for Robot Manipulation

Qinglun Zhang; Shen Cheng; Tian Dan; Haoqiang Fan; Guanghui Liu; Shuaicheng Liu

Abstract

While existing equivariant methods improve data efficiency, they suffer from high computational cost, reliance on single-modality inputs, and instability when combined with fast-sampling methods. In this work, we propose E3Flow, a novel framework that addresses these limitations of equivariant diffusion policies and, for the first time, unifies efficient rectified flow with stable, multi-modal equivariant learning. The framework is built on spherical harmonic representations to ensure rigorous SO(3) equivariance. We introduce an invariant Feature Enhancement Module (FEM) that dynamically fuses hybrid visual modalities (point clouds and images), injecting rich visual cues into the spherical harmonic features. We evaluate E3Flow on 8 manipulation tasks from the MimicGen benchmark and further conduct 4 real-world experiments to validate its effectiveness in physical environments. In simulation, E3Flow improves the average success rate by 3.12% over the state-of-the-art Spherical Diffusion Policy (SDP) while delivering a 7× inference speedup. E3Flow thus offers a favorable trade-off among task performance, inference efficiency, and data efficiency for robotic policy learning. Code: https://github.com/zql-kk/E3Flow.
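The abstract attributes part of the speedup to rectified flow, which learns a velocity field along straight paths from noise to actions so that sampling needs only a few integration steps. The NumPy sketch below illustrates the generic technique (straight-line interpolation target plus an Euler sampler); it is not E3Flow's implementation, and the action-chunk shape (16 × 7), the function names `rf_training_pair` and `rf_sample`, and the oracle velocity field are hypothetical choices made for illustration.

```python
import numpy as np


def rf_training_pair(x_data, x_noise, t):
    """Point on the straight noise-to-data path and its regression target.

    x_t = t * x_data + (1 - t) * x_noise; the target velocity is x_data - x_noise.
    """
    x_t = t * x_data + (1.0 - t) * x_noise
    v_target = x_data - x_noise
    return x_t, v_target


def rf_sample(velocity_fn, x_noise, num_steps):
    """Euler-integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (action)."""
    x = np.array(x_noise, dtype=float)  # copy so the input is not modified
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x


rng = np.random.default_rng(0)
action = rng.normal(size=(16, 7))  # hypothetical action chunk: 16 steps x 7 DoF
noise = rng.normal(size=(16, 7))

# Training pair at t = 0.3: a network would regress v from (x_t, t, observation).
x_t, v = rf_training_pair(action, noise, t=0.3)


def oracle_velocity(x, t):
    """Ideal velocity for this single noise/action pair: constant along the straight path."""
    return action - noise


one_step = rf_sample(oracle_velocity, noise, num_steps=1)
ten_steps = rf_sample(oracle_velocity, noise, num_steps=10)
print(np.allclose(one_step, action), np.allclose(ten_steps, action))
```

Because the ideal velocity along a perfectly straight path is constant, one Euler step is as accurate as ten here; a learned velocity field is only approximately straight, which is why few-step (rather than one-step) sampling is the usual compromise.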

Paper Structure (38 sections, 15 equations, 7 figures, 8 tables)

Introduction
Related Work
Imitation Learning with Diffusion Models
Equivariant Learning in Robotics
Equivariant Diffusion Policy Learning
Method
Preliminaries
Equivariance
Spherical Harmonics
Overview
Spherical Harmonic Visual Representation
Equivariant Action via Rectified Flow
Experiments
Dataset and Implementation Details
Simulation Benchmarks
...and 23 more sections

Figures (7)

Figure 1: (a) Diagram of the E3Flow equivariant policy. E3Flow learns equivariant trajectories under unseen scene transformations, whereas Diffusion Policy (DP) fails because it lacks symmetry priors. (b) Comparison of average success rates and inference efficiency between state-of-the-art equivariant and non-equivariant policies on MimicGen tasks.
Figure 2: Overall pipeline. E3Flow encodes multimodal inputs through equivariant and non-equivariant visual encoders, aligns invariant visual features across modalities, and constructs a spherical harmonic–equivariant representation to efficiently guide flow matching for generating high-quality equivariant actions.
Figure 3: Illustration of the Feature Enhancement Module (FEM). FEM injects semantic information from images into the equivariant representation of point clouds, achieving efficient fusion of semantic and geometric features.
Figure 4: Visualization of E3Flow executing eight MimicGen tasks. Each column shows the progression of one task, from initialization (top) to completion (bottom).
Figure 5: Ablation on the number of expert demonstrations. As the number of demonstrations increases, all methods show improved performance, while E3Flow consistently achieves higher success rates, highlighting its superior data efficiency.
...and 2 more figures
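Figures 2 and 3 describe an equivariant spherical harmonic representation of the point cloud into which invariant image cues are injected. The sketch below illustrates why such a fusion can preserve equivariance, under assumptions that are ours rather than the paper's: degree-1 and degree-2 features are written in Cartesian form (the unit direction u and the traceless tensor u u^T - I/3, which span the same spaces as real spherical harmonics of degree 1 and 2 up to a change of basis), translation is removed by centring on the centroid, each point's features are treated as one channel, and `gated_fusion` is a hypothetical stand-in for FEM that treats the image embedding as a rotation-invariant input and uses it only to compute scalar gates.

```python
import numpy as np


def random_rotation(rng):
    """Sample a random proper rotation matrix (det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs for a unique factorization
    if np.linalg.det(q) < 0:     # flip one column to turn a reflection into a rotation
        q[:, 0] = -q[:, 0]
    return q


def sh_features(points):
    """Degree-0/1/2 features in Cartesian form (assumed stand-in for real SH).

    Centring removes translation; directions u then give
    l=0: 1, l=1: u, l=2: u u^T - I/3 (traceless symmetric).
    """
    centred = points - points.mean(axis=0)
    u = centred / np.linalg.norm(centred, axis=-1, keepdims=True)
    l0 = np.ones(len(u))
    l1 = u
    l2 = np.einsum("ni,nj->nij", u, u) - np.eye(3) / 3.0
    return l0, l1, l2


def gated_fusion(l0, l1, l2, image_feat, w_img):
    """Hypothetical FEM-style fusion (not the authors' module).

    image_feat: (D,) embedding treated as rotation-invariant; w_img: (N, D) weights.
    Gates depend only on invariants, so they are unchanged by a rotation.
    """
    logits = w_img @ image_feat + l0        # invariant channel absorbs image cues
    gate = 1.0 / (1.0 + np.exp(-logits))    # sigmoid gate per channel
    return logits, gate[:, None] * l1, gate[:, None, None] * l2


rng = np.random.default_rng(0)
cloud = rng.normal(size=(64, 3))
image_feat = rng.normal(size=32)
w_img = 0.1 * rng.normal(size=(64, 32))
R = random_rotation(rng)
shift = np.array([0.5, -1.0, 2.0])

f = gated_fusion(*sh_features(cloud), image_feat, w_img)
g = gated_fusion(*sh_features(cloud @ R.T + shift), image_feat, w_img)

print(np.allclose(g[0], f[0]),             # invariant channel unchanged
      np.allclose(g[1], f[1] @ R.T),       # degree-1 block rotates by R
      np.allclose(g[2], R @ f[2] @ R.T))   # degree-2 block rotates as R T R^T
```

Scaling a degree-l block by a scalar commutes with the rotation acting on that block, so gates computed from invariant quantities cannot break equivariance; adding image-derived vectors directly to the degree-1 or degree-2 blocks generally would.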