DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo
Junzhe Zhu, Yuanchen Ju, Junyi Zhang, Muhan Wang, Zhecheng Yuan, Kaizhe Hu, Huazhe Xu
TL;DR
DenseMatcher tackles dense 3D semantic correspondence across everyday textured objects. It fuses multiview 2D foundation-model features with a trainable 3D refinement network and solves for correspondences with an enhanced functional map. The work also introduces DenseCorr3D, a textured 3D matching dataset with semantic groups, and reports a 43.5% improvement over prior baselines on dense matching. Key components include a semantic-distance-driven loss $L_{semantic}$, a feature-preservation loss $L_{preservation}$, and a functional map strengthened with entropy and normalization constraints so that the resulting mappings are sparse and consistent; the point-wise map is recovered from the functional map $C$ as $\Pi = \Phi_N C \Phi_M^+$. The method generalizes zero-shot to unseen categories, which supports long-horizon robotic manipulation from a single demonstration as well as 3D color transfer between objects with relatable geometry, with practical impact for robotics and 3D content creation.
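As a rough illustration of the relation $\Pi = \Phi_N C \Phi_M^+$, the NumPy sketch below fits a plain least-squares functional map from per-vertex descriptors and reads a vertex-to-vertex match off each row of $\Pi$. It is not the authors' solver: it omits the entropy and normalization constraints described above, and it substitutes a cycle-graph Laplacian (the hypothetical helper `cycle_laplacian_basis`) for a mesh Laplace-Beltrami basis; `functional_map_matches` is likewise a hypothetical name. In the toy check, shape N is a vertex permutation of shape M, so the recovered matches should equal that permutation.

```python
import numpy as np


def cycle_laplacian_basis(n: int, k: int) -> np.ndarray:
    """First k eigenvectors of the graph Laplacian of an n-vertex cycle.

    Hypothetical stand-in for a mesh Laplace-Beltrami eigenbasis."""
    lap = 2.0 * np.eye(n)
    idx = np.arange(n)
    lap[idx, (idx + 1) % n] = -1.0
    lap[idx, (idx - 1) % n] = -1.0
    _, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    return vecs[:, :k]             # (n, k) with orthonormal columns


def functional_map_matches(phi_m, phi_n, feat_m, feat_n):
    """Plain least-squares functional map (no entropy/normalization terms).

    Returns C (k x k), the soft map Pi = Phi_N C Phi_M^+ (n_N x n_M), and,
    for every vertex of N, the index of its matched vertex on M."""
    a = np.linalg.pinv(phi_m) @ feat_m      # spectral coefficients on M, (k, d)
    b = np.linalg.pinv(phi_n) @ feat_n      # spectral coefficients on N, (k, d)
    c = b @ np.linalg.pinv(a)               # least-squares solution of C a = b
    pi = phi_n @ c @ np.linalg.pinv(phi_m)  # transfers functions on M to N
    return c, pi, pi.argmax(axis=1)         # strongest entry per row of Pi


# Toy check: N is M with its vertices shuffled.
rng = np.random.default_rng(0)
n, k, d = 60, 11, 16                 # odd k keeps degenerate eigenpairs whole
phi_m = cycle_laplacian_basis(n, k)
feat_m = rng.normal(size=(n, d))     # random per-vertex descriptors
perm = rng.permutation(n)            # vertex i of N corresponds to perm[i] on M
phi_n, feat_n = phi_m[perm], feat_m[perm]
c, pi, matches = functional_map_matches(phi_m, phi_n, feat_m, feat_n)
print(np.array_equal(matches, perm))
```

Taking the argmax of each row of $\Pi$ is only one simple way to turn a soft map into point matches; nearest-neighbor search in the aligned spectral embeddings is a common alternative.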
Abstract
Dense 3D correspondence can enhance robotic manipulation by enabling the generalization of spatial, functional, and dynamic information from one object to an unseen counterpart. Compared to shape correspondence, semantic correspondence is more effective in generalizing across different object categories. To this end, we present DenseMatcher, a method capable of computing 3D correspondences between in-the-wild objects that share similar structures. DenseMatcher first computes vertex features by projecting multiview 2D features onto meshes and refining them with a 3D network, and then finds dense correspondences from the obtained features using functional maps. In addition, we craft the first 3D matching dataset that contains colored object meshes across diverse categories. In our experiments, we show that DenseMatcher significantly outperforms prior 3D matching baselines by 43.5%. We demonstrate the downstream effectiveness of DenseMatcher in (i) robotic manipulation, where it achieves cross-instance and cross-category generalization on complex, long-horizon manipulation tasks from observing only one demo; and (ii) zero-shot color mapping between digital assets, where appearance can be transferred between different objects with relatable geometry.
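The step of projecting multiview 2D features onto a mesh can be sketched as follows, assuming pinhole cameras with known world-to-camera poses and per-view 2D feature maps that have already been computed by some foundation model. The function `project_features_to_vertices` is a hypothetical helper: it averages nearest-pixel features over the views in which each vertex lands inside the image and in front of the camera, and it skips the occlusion (depth) test a real pipeline would need. It is not the authors' implementation, and the trainable 3D refinement network is not shown.

```python
import numpy as np


def project_features_to_vertices(verts, feat_maps, intrinsics, rotations, translations):
    """Average 2D features over all views in which a vertex is visible in-frame.

    verts:        (V, 3) world-space vertex positions
    feat_maps:    list of (H, W, D) feature maps, one per view
    intrinsics:   list of (3, 3) pinhole matrices K
    rotations:    list of (3, 3) world-to-camera rotations R
    translations: list of (3,) world-to-camera translations t
    Assumption: no occlusion test; every in-frame vertex counts as visible.
    """
    n_verts, dim = verts.shape[0], feat_maps[0].shape[-1]
    feat_sum = np.zeros((n_verts, dim))
    counts = np.zeros(n_verts)
    for fmap, K, R, t in zip(feat_maps, intrinsics, rotations, translations):
        h, w, _ = fmap.shape
        cam = verts @ R.T + t                    # world -> camera coordinates
        z = cam[:, 2]
        in_front = z > 1e-6
        uvw = cam @ K.T                          # homogeneous pixel coordinates
        safe_z = np.where(in_front, z, 1.0)      # avoid dividing by zero
        u = np.round(uvw[:, 0] / safe_z).astype(int)  # column index
        v = np.round(uvw[:, 1] / safe_z).astype(int)  # row index
        valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        feat_sum[valid] += fmap[v[valid], u[valid]]   # nearest-pixel sampling
        counts[valid] += 1
    seen = counts > 0
    feat_sum[seen] /= counts[seen, None]         # mean over contributing views
    return feat_sum, counts


# Toy example: two identical cameras whose feature maps encode pixel coordinates.
K = np.array([[10.0, 0.0, 4.0], [0.0, 10.0, 4.0], [0.0, 0.0, 1.0]])
cols, rows = np.meshgrid(np.arange(8), np.arange(8))
grid = np.stack([cols, rows], axis=-1).astype(float)  # grid[v, u] = (u, v)
verts = np.array([
    [0.0, 0.0, 2.0],    # projects to pixel (4, 4)
    [0.2, -0.4, 2.0],   # projects to pixel (5, 2)
    [0.0, 0.0, -1.0],   # behind the camera
    [5.0, 0.0, 2.0],    # outside the image
])
feats, counts = project_features_to_vertices(
    verts, [grid, grid + 2.0], [K, K], [np.eye(3)] * 2, [np.zeros(3)] * 2)
print(feats)
print(counts)
```

Vertices that no camera sees keep a zero feature and a count of 0; a downstream 3D network could use that count as a visibility mask.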
