WeightedPose: Generalizable Cross-Pose Estimation via Weighted SVD
Xuxin Cheng, Heng Yu, Harry Zhang, Wenxing Deng
TL;DR
The paper tackles cross-pose estimation in human environments by moving beyond end-to-end pixel-to-action policies and instead learning 3D geometric relationships between object pairs. It introduces the Unified Weighted Pose architecture, which fuses Goal Flow for articulated parts and TAX-Pose for free-floating objects through a learned weight $w$, and computes the alignment via a weighted SVD on $A\Gamma B^\top$, yielding a rotation $R$ and a translation $t$. Training combines multiple loss terms (e.g., $L_{disp}$, $L_{corr}$, $L_{cons}$, $L_{tf}$) and uses demonstrations with $\mathbf{T}_{\mathcal{A}\mathcal{B}}=\mathbf{I}$; evaluation compares the method against baselines on PartNet-Mobility and Ravens. Key findings are that the original TAX-Pose loss often performs best across categories, that post-SVD losses can improve translation accuracy, and that SE(3) supervision may marginally degrade performance, so the benefits of the weighted, geometry-centric fusion are nuanced. The approach offers a path toward robust, object-centric manipulation in varied configurations and suggests future work on generalizing the math beyond the articulated vs. rigid distinction and on integrating the method with motion planning as a geometric guidance tool.
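To make the weighted SVD step concrete, the following is a minimal NumPy sketch of the standard weighted Procrustes (Kabsch) solution, not the authors' implementation. It assumes that the columns of $A$ and $B$ are weighted-centered corresponding points and that $\Gamma$ is a diagonal matrix of normalized per-point weights; the function name `weighted_svd_align` and its arguments are hypothetical, chosen for illustration only.

```python
import numpy as np

def weighted_svd_align(points_a, points_b, weights):
    """Return (R, t) minimizing sum_i w_i * ||R a_i + t - b_i||^2.

    points_a, points_b: (N, 3) arrays of corresponding points (assumed layout).
    weights: (N,) non-negative per-correspondence weights (assumed layout).
    """
    w = weights / weights.sum()            # normalize so the weights sum to 1
    a_bar = w @ points_a                   # weighted centroid of A, shape (3,)
    b_bar = w @ points_b                   # weighted centroid of B, shape (3,)
    A = (points_a - a_bar).T               # centered points as columns, (3, N)
    B = (points_b - b_bar).T               # centered points as columns, (3, N)
    H = (A * w) @ B.T                      # A @ diag(w) @ B.T, a 3x3 matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = b_bar - R @ a_bar
    return R, t
```

Given noise-free correspondences related by a rigid transform, this recovers that transform for any strictly positive weights. In a learned pipeline the weights would come from the network, and the SVD would be computed with a differentiable routine so that losses applied after the SVD can propagate gradients.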
Abstract
We introduce a new approach for robotic manipulation tasks in human settings that require understanding the 3D geometric relationships between a pair of objects. Conventional end-to-end training approaches, which map pixel observations directly to robot actions, often fail to capture complex pose relationships and do not adapt easily to new object configurations. To overcome these issues, our method focuses on learning the 3D geometric relationships, particularly how critical parts of one object relate to those of another. We employ Weighted SVD in our standalone model to reason about pose relationships both between articulated parts and between free-floating objects. For instance, our model can capture the spatial relationship between an oven door and the oven body, as well as between a lasagna plate and the oven. By concentrating on these 3D geometric relationships, our strategy enables robots to carry out intricate manipulation tasks from an object-centric perspective.
