WeightedPose: Generalizable Cross-Pose Estimation via Weighted SVD

Xuxin Cheng, Heng Yu, Harry Zhang, Wenxing Deng

TL;DR

The paper tackles cross-pose estimation in human environments by moving beyond end-to-end pixel-to-action policies and instead learning 3D geometric relationships between object pairs. It introduces the Unified Weighted Pose architecture, which fuses Goal Flow for articulated parts and TAX-Pose for free-floating objects through a learned weight $w$, and computes the alignment via a weighted SVD on $A\Gamma B^\top$, yielding a rotation $R$ and translation $t$. Training uses multiple loss terms (e.g., $L_{disp}$, $L_{corr}$, $L_{cons}$, $L_{tf}$) and demonstrations with $\mathbf{T}_{\mathcal{A}\mathcal{B}}=\mathbf{I}$; evaluation on PartNet-Mobility and Ravens compares the model against baselines. Three findings stand out: the original TAX-Pose loss often performs best across categories, post-SVD losses can improve translation accuracy, and SE(3) supervision may marginally degrade performance, so the benefits of the weighted, geometry-centric fusion are nuanced. The approach offers a path toward robust, object-centric manipulation in varied configurations, and the paper suggests future work on generalizing the math beyond the articulated vs. rigid distinction and on integrating the model with motion planning as a geometric guidance tool.
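
The weighted SVD step mentioned above can be illustrated with a minimal NumPy sketch of the standard weighted Kabsch/Procrustes solution. This is not the paper's implementation: the function name `weighted_svd_align`, the `(N, 3)` row-per-point layout, and the weight normalization are assumptions made for illustration, and the weights here are plain per-correspondence weights rather than the learned weight $w$ of the architecture. With points stored as rows, the product $A\Gamma B^\top$ appears in the code as `A_c.T @ (w[:, None] * B_c)`, where $\Gamma$ is the diagonal matrix of weights.

```python
import numpy as np


def weighted_svd_align(A, B, w):
    """Rigid transform (R, t) minimizing sum_i w_i * ||R @ A[i] + t - B[i]||^2.

    A, B: (N, 3) arrays of corresponding points (hypothetical layout).
    w:    (N,) non-negative weights, not all zero.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                      # normalize so the weights sum to 1

    a_bar = w @ A                        # weighted centroid of A
    b_bar = w @ B                        # weighted centroid of B
    A_c = A - a_bar                      # centered point sets
    B_c = B - b_bar

    # 3x3 weighted cross-covariance: A_c^T diag(w) B_c
    H = A_c.T @ (w[:, None] * B_c)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = b_bar - R @ a_bar
    return R, t


# Usage: recover a known transform from noiseless correspondences.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
c, s = np.cos(0.7), np.sin(0.7)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -1.0, 2.0])
B = A @ R_true.T + t_true
w = rng.uniform(0.1, 1.0, size=50)
R_est, t_est = weighted_svd_align(A, B, w)
```

Because the rotation comes from an SVD of a weighted cross-covariance, correspondences with larger weights pull the estimate more strongly, which is what lets a learned weight shift the solution between different sources of correspondences.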

Abstract

We introduce a new approach for robotic manipulation tasks in human settings that necessitate understanding the 3D geometric connections between a pair of objects. Conventional end-to-end training approaches, which convert pixel observations directly into robot actions, often fail to effectively understand complex pose relationships and do not easily adapt to new object configurations. To overcome these issues, our method focuses on learning the 3D geometric relationships, particularly how critical parts of one object relate to those of another. We employ Weighted SVD in our standalone model to analyze pose relationships both in articulated parts and in free-floating objects. For instance, our model can comprehend the spatial relationship between an oven door and the oven body, as well as between a lasagna plate and the oven. By concentrating on the 3D geometric connections, our strategy empowers robots to carry out intricate manipulation tasks based on object-centric perspectives.

Paper Structure (5 sections, 13 equations, 2 figures, 1 table)

Figures (2)

  • Figure 1: Unified Weighted Pose architecture. The model first takes as input a point cloud, and then learns to predict a weight for the point cloud. This weight is used in the downstream SVD module to combine the GoalFlow and TAX-Pose outputs.
  • Figure 2: Task illustration. The model must first output a goal pose for opening the oven door, and then output a goal pose for putting the block inside the oven.