
EqvAfford: SE(3) Equivariance for Point-Level Affordance Learning

Yue Chen, Chenrui Tie, Ruihai Wu, Hao Dong

TL;DR

EqvAfford addresses robust robotic manipulation across diverse 6D object poses through SE(3)-equivariant point-level affordance learning. It separates invariant per-point affordances from equivariant manipulation proposals using an SE(3)-aware VN-DGCNN encoder and four modules, so that interaction points stay consistent and actions stay aligned with the object as its pose changes. Empirically, it outperforms a strong baseline on four tasks in the SAPIEN/PartNet-Mobility setup and generalizes to novel shapes and categories, as supported by both invariant-feature analysis and downstream manipulation success. The work offers theoretical guarantees of equivariance and practical value for pose-robust manipulation in real-world robotics.

Abstract

Humans perceive and interact with the world with an awareness of equivariance, which helps us manipulate different objects in diverse poses. Such equivariance also exists in many robotic manipulation scenarios. For example, whatever the pose of a drawer (translation, rotation, and tilt), the manipulation strategy is the same: grasp the handle and pull in a straight line. Traditional models usually lack this awareness of equivariance, which can require more training data and lead to poor performance on novel object poses. We propose the EqvAfford framework, with novel designs that guarantee equivariance in point-level affordance learning for downstream robotic manipulation. It achieves strong performance and generalization on representative tasks with objects in diverse poses.
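
The property described above can be stated compactly. As a sketch, write $P$ for the input point cloud, $a_p$ for the affordance score at point $p$, and $u_p$ for the interaction direction proposed at $p$ (these symbols are introduced here for illustration and are not taken from the paper). For any rigid transform with rotation $R \in SO(3)$ and translation $t \in \mathbb{R}^3$, invariance of the affordance and equivariance of the action would read:

$$a_p(RP + t) = a_p(P), \qquad u_p(RP + t) = R\,u_p(P).$$

In the drawer example, the first identity says the handle remains the best place to grasp however the drawer is posed, and the second says the pull direction rotates together with the drawer.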

Paper Structure

This paper contains 11 sections, 5 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: For each point on a 3D object, we predict an actionability (affordance) score and an interaction orientation to manipulate the object. By leveraging SE(3) equivariance, our method shows consistency and generalizes to different poses of the object.
  • Figure 2: Overview of our proposed framework. Taking a point cloud of the object as input, our framework first outputs a per-point SE(3)-invariant feature $f_p^i$ and an SE(3)-equivariant feature $f_p^e$. The invariant feature $f_p^i$ yields an affordance map that is invariant to object rotations, while the equivariant feature $f_p^e$ yields manipulation actions that are equivariant to object rotations.
  • Figure 3: We visualize the action scoring and action proposal predictions over the movable parts. We show qualitative results of our method and the baseline in two experimental settings across four objects. Our method is robust to rotation of the input point cloud.
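
The split between invariant and equivariant per-point features in the Figure 2 caption can be illustrated with a toy numerical check. The sketch below is not the paper's VN-DGCNN encoder. The function `toy_features` is a hypothetical stand-in that uses distance to the centroid as an invariant feature and the centered coordinates as an equivariant feature, and all other names are likewise assumptions made for illustration.

```python
import numpy as np


def random_rotation(rng):
    """Sample a 3x3 rotation matrix (orthogonal, determinant +1)."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one axis to turn a reflection into a rotation
    return q


def transform(points, rotation, translation):
    """Apply the rigid transform x -> R x + t to an (N, 3) point cloud."""
    return points @ rotation.T + translation


def toy_features(points):
    """Hypothetical stand-in for an encoder (NOT the paper's network).

    Returns a per-point invariant feature of shape (N,) and a per-point
    equivariant feature of shape (N, 3).
    """
    centered = points - points.mean(axis=0, keepdims=True)  # cancels translation
    f_inv = np.linalg.norm(centered, axis=1)  # unchanged by rotation
    f_eqv = centered  # rotates together with the object
    return f_inv, f_eqv


def check_equivariance(points, rotation, translation):
    """Return (invariance_holds, equivariance_holds) for the toy features."""
    f_inv, f_eqv = toy_features(points)
    g_inv, g_eqv = toy_features(transform(points, rotation, translation))
    invariance_holds = np.allclose(f_inv, g_inv)
    equivariance_holds = np.allclose(f_eqv @ rotation.T, g_eqv)
    return invariance_holds, equivariance_holds


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(128, 3))
    print(check_equivariance(cloud, random_rotation(rng), rng.normal(size=3)))
```

A score computed only from `f_inv` would give the same affordance map for every pose of the object, while a direction computed linearly from `f_eqv` would rotate with it. This mirrors the roles the caption assigns to $f_p^i$ and $f_p^e$.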