EquiContact: A Hierarchical SE(3) Vision-to-Force Equivariant Policy for Spatially Generalizable Contact-rich Tasks

Joohwan Seo; Arvind Kruthiventy; Soomi Lee; Megan Teng; Xiang Zhang; Seoyeon Choi; Jongeun Choi; Roberto Horowitz

TL;DR

EquiContact is a hierarchical SE(3) vision-to-force policy for spatially generalizable contact-rich manipulation. It combines a high-level Diffusion Equivariant Descriptor Field (Diff-EDF) with a low-level Geometric Compliant ACT (G-CompACT) and a geometric admittance controller (GAC). The framework relies on three design principles (compliance, localized policies, and induced equivariance) to achieve SE(3) equivariance from perception to force control. It is validated on peg-in-hole, screwing, and surface wiping tasks, where it generalizes to unseen spatial configurations. Key contributions include the provable SE(3) equivariance of the EquiContact pipeline, the left-invariant design of G-CompACT, and experimental evidence that the approach outperforms baselines in both in-distribution and out-of-distribution scenarios while maintaining low interaction forces. The work offers a structured, interpretable blueprint for building spatially generalizable manipulation policies, complementing data-driven end-to-end methods and providing a practical path toward real-world contact-rich robot assistance.

Abstract

This paper presents a framework for learning vision-based robotic policies for contact-rich manipulation tasks that generalize spatially across task configurations. We focus on achieving robust spatial generalization of the policy for the peg-in-hole (PiH) task trained from a small number of demonstrations. We propose EquiContact, a hierarchical policy composed of a high-level vision planner (Diffusion Equivariant Descriptor Field, Diff-EDF) and a novel low-level compliant visuomotor policy (Geometric Compliant ACT, G-CompACT). G-CompACT operates using only localized observations (geometrically consistent error vectors (GCEV), force-torque readings, and wrist-mounted RGB images) and produces actions defined in the end-effector frame. Through these design choices, we show that the entire EquiContact pipeline is SE(3)-equivariant, from perception to force control. We also outline three key components for spatially generalizable contact-rich policies: compliance, localized policies, and induced equivariance. Real-world experiments on PiH, screwing, and surface wiping tasks demonstrate a near-perfect success rate and robust generalization to unseen spatial configurations, validating the proposed framework and principles. The experimental videos can be found on the project website: https://sites.google.com/berkeley.edu/equicontact
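The link between localized observations, end-effector-frame actions, and SE(3) equivariance can be checked numerically with a small sketch. The code below is an illustration under stated assumptions, not the paper's implementation: the relative transform `local_error` is a generic stand-in for the GCEV (whose exact definition is not given in this section), and `toy_local_policy` is a hypothetical placeholder for the learned G-CompACT policy. It shows that when the current and desired poses are both moved by the same rigid transform `g_l`, the local error is unchanged and the resulting world-frame target moves by `g_l` as well.

```python
import numpy as np


def random_rotation(rng):
    """Sample a 3x3 rotation matrix via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))  # fix the sign ambiguity of QR
    if np.linalg.det(q) < 0:              # enforce det = +1 (proper rotation)
        q[:, 0] = -q[:, 0]
    return q


def random_pose(rng):
    """Sample a 4x4 homogeneous transform in SE(3)."""
    g = np.eye(4)
    g[:3, :3] = random_rotation(rng)
    g[:3, 3] = rng.normal(size=3)
    return g


def local_error(g, g_d):
    """Desired pose expressed in the current end-effector frame (assumed stand-in for GCEV)."""
    return np.linalg.inv(g) @ g_d


def toy_local_policy(err, gain=0.5):
    """Hypothetical policy: step a fraction of the translation error, in the end-effector frame."""
    action = np.eye(4)
    action[:3, 3] = gain * err[:3, 3]
    return action


def world_target(g, g_d):
    """Map the end-effector-frame action back to a world-frame target pose."""
    return g @ toy_local_policy(local_error(g, g_d))


def check_equivariance(seed=0):
    """Return (observation is left-invariant, world-frame target is equivariant)."""
    rng = np.random.default_rng(seed)
    g, g_d, g_l = random_pose(rng), random_pose(rng), random_pose(rng)
    invariant = np.allclose(local_error(g_l @ g, g_l @ g_d), local_error(g, g_d))
    equivariant = np.allclose(world_target(g_l @ g, g_l @ g_d), g_l @ world_target(g, g_d))
    return bool(invariant), bool(equivariant)


if __name__ == "__main__":
    print(check_equivariance())
```

Because the policy only ever sees quantities expressed in the end-effector frame, the equivariance here follows from the frame choice alone and does not depend on what the policy has learned. This is the sense in which equivariance is induced by the design rather than built into the network.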
