
DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo

Junzhe Zhu, Yuanchen Ju, Junyi Zhang, Muhan Wang, Zhecheng Yuan, Kaizhe Hu, Huazhe Xu

TL;DR

DenseMatcher tackles dense 3D semantic correspondence between everyday textured objects by fusing multiview 2D foundation features with a trainable 3D refinement network and solving for correspondences with an enhanced functional map. The approach introduces DenseCorr3D, a textured 3D matching dataset annotated with semantic groups, and reports a 43.5% improvement over prior baselines on dense matching tasks. Key innovations include a semantic-distance-driven loss $L_{semantic}$, a feature-preservation loss $L_{preservation}$, and a functional map strengthened with entropy and normalization constraints that yield sparse, consistent mappings, with the vertex-level map expressed as $\Pi = \Phi_N C \Phi_M^+$. The method generalizes zero-shot to unseen categories, enabling long-horizon robotic manipulation from a single demonstration and 3D color transfer between objects with relatable geometry, with practical impact for robotics and 3D content creation.

Abstract

Dense 3D correspondence can enhance robotic manipulation by enabling the generalization of spatial, functional, and dynamic information from one object to an unseen counterpart. Compared to shape correspondence, semantic correspondence is more effective in generalizing across different object categories. To this end, we present DenseMatcher, a method capable of computing 3D correspondences between in-the-wild objects that share similar structures. DenseMatcher first computes vertex features by projecting multiview 2D features onto meshes and refining them with a 3D network, and subsequently finds dense correspondences from the obtained features using a functional map. In addition, we craft the first 3D matching dataset that contains colored object meshes across diverse categories. In our experiments, we show that DenseMatcher outperforms prior 3D matching baselines by 43.5%. We demonstrate the downstream effectiveness of DenseMatcher in (i) robotic manipulation, where it achieves cross-instance and cross-category generalization on long-horizon complex manipulation tasks from observing only one demo; and (ii) zero-shot color mapping between digital assets, where appearance can be transferred between different objects with relatable geometry.
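
As a rough illustration of the functional-map step described above, the following Python sketch fits a plain least-squares functional map $C$ from per-vertex features and converts it into a vertex-level map through $\Pi = \Phi_N C \Phi_M^+$. It is not the authors' implementation: the entropy and normalization constraints are omitted, the "shapes" are toy weighted graphs rather than textured meshes, the features are random stand-ins for the learned vertex features, and the names `laplacian_basis`, `fit_functional_map`, and `pointwise_map` are hypothetical.

```python
import numpy as np

def laplacian_basis(L, k):
    """Return the first k eigenvectors (as columns) of a symmetric Laplacian L."""
    _, Phi = np.linalg.eigh(L)
    return Phi[:, :k]

def fit_functional_map(Phi_M, Phi_N, F_M, F_N):
    """Least-squares C such that C (Phi_M^+ F_M) ~= Phi_N^+ F_N (no regularizers)."""
    A = np.linalg.pinv(Phi_M) @ F_M            # spectral coefficients on M, shape (k, d)
    B = np.linalg.pinv(Phi_N) @ F_N            # spectral coefficients on N, shape (k, d)
    C_T, *_ = np.linalg.lstsq(A.T, B.T, rcond=None)  # solves A^T C^T = B^T
    return C_T.T

def pointwise_map(Phi_M, Phi_N, C):
    """Pi = Phi_N C Phi_M^+; row i holds the weights of N-vertex i over M-vertices."""
    Pi = Phi_N @ C @ np.linalg.pinv(Phi_M)     # shape (n, m)
    return Pi, Pi.argmax(axis=1)

# Toy data (assumption): shape M is a random weighted graph, N is a relabeled copy.
rng = np.random.default_rng(0)
m, d = 12, 20
W = rng.random((m, m))
W = (W + W.T) / 2.0
np.fill_diagonal(W, 0.0)
L_M = np.diag(W.sum(axis=1)) - W               # graph Laplacian of M
perm = rng.permutation(m)                      # N-vertex i corresponds to M-vertex perm[i]
L_N = L_M[np.ix_(perm, perm)]                  # Laplacian of the relabeled copy
F_M = rng.standard_normal((m, d))              # stand-in per-vertex features on M
F_N = F_M[perm]                                # the same features, relabeled on N

# Full basis (k = m) so the toy correspondence is recovered exactly.
Phi_M = laplacian_basis(L_M, m)
Phi_N = laplacian_basis(L_N, m)
C = fit_functional_map(Phi_M, Phi_N, F_M, F_N)
Pi, matches = pointwise_map(Phi_M, Phi_N, C)
print("recovered correspondence:", bool(np.array_equal(matches, perm)))
```

With a truncated basis (k much smaller than the vertex count), as is usual for real meshes, $\Pi$ is only a smooth approximation of a permutation; that is where sparsity-inducing constraints such as the entropy term mentioned in the TL;DR would come into play.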

Paper Structure

This paper contains 52 sections, 1 theorem, 24 equations, 12 figures, 6 tables.

Key Result

Lemma A.1

For a function $x \in \mathbb{R}^n$, its Laplacian $\Delta(x)$ can be computed as $\Phi \Lambda X$ from its full-rank spectral coefficients $X = \Phi^{+} x \in \mathbb{R}^k$.
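
The lemma can be checked numerically. The sketch below assumes that $\Phi$ holds the eigenvectors of a symmetric Laplacian as orthonormal columns and that $\Lambda$ is the diagonal matrix of the matching eigenvalues; neither definition is stated in this excerpt. It uses a cycle-graph Laplacian as a stand-in for a mesh Laplacian, and the helper names are hypothetical.

```python
import numpy as np

def cycle_graph_laplacian(n):
    """Combinatorial Laplacian L = D - A of an n-vertex cycle graph (n >= 3)."""
    L = 2.0 * np.eye(n)
    idx = np.arange(n)
    L[idx, (idx + 1) % n] = -1.0
    L[idx, (idx - 1) % n] = -1.0
    return L

def spectral_laplacian(x, Phi, lam):
    """Compute Phi @ diag(lam) @ X with spectral coefficients X = Phi^+ x."""
    X = np.linalg.pinv(Phi) @ x
    return Phi @ (lam * X)

n = 16
L = cycle_graph_laplacian(n)
lam, Phi = np.linalg.eigh(L)                   # full eigenbasis, k = n
rng = np.random.default_rng(0)

# Full basis: the spectral route matches the direct product L @ x.
x = rng.standard_normal(n)
max_err = float(np.max(np.abs(L @ x - spectral_laplacian(x, Phi, lam))))

# Truncated basis: still exact when x lies in the span of the kept eigenvectors.
k = 6
Phi_k, lam_k = Phi[:, :k], lam[:k]
x_band = Phi_k @ rng.standard_normal(k)
band_err = float(np.max(np.abs(L @ x_band - spectral_laplacian(x_band, Phi_k, lam_k))))

print(f"full-basis error: {max_err:.2e}, band-limited error: {band_err:.2e}")
```

For a function outside the span of a truncated basis, the same formula returns the Laplacian of its projection onto that basis, so the two routes no longer agree exactly.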

Figures (12)

  • Figure 1: (a) Zero-shot color transfer between 3D assets. (b) In real-world robotic experiments, we use DenseMatcher to transfer a manipulation sequence to the robot from a single human demonstration. Circles represent the contact points in the human demo / grasping points for robot manipulation.
  • Figure 2: The 4 types of correspondence. The reference image is on the left, while the right side demonstrates 1) 3D dense, 2) 3D sparse, 3) 2D dense, and 4) 2D sparse correspondences.
  • Figure 3: Predicted correspondences on few-shot categories. DenseMatcher can generalize across diverse topological variations, given only 5 training examples per category. To ensure that the model is not reliant on canonical spatial poses, we randomly rotate the mesh before the test procedure.
  • Figure 4: Semantic group annotations examples of apple, banana, animals (deer, tiger, elephant), and chairs. Different colors represent different semantic groups across the same category. DenseCorr3D contains objects of varying topologies and structures, both across and within categories.
  • Figure 5: Two possible partitioning schemes for a hand are shown.
  • ...and 7 more figures

Theorems & Definitions (1)

  • Lemma A.1