Neural Semantic Surface Maps

Luca Morreale, Noam Aigerman, Vladimir G. Kim, Niloy J. Mitra

TL;DR

An automated technique for computing a map between two genus-zero shapes that matches semantically corresponding regions to one another. It proves effective both in scenarios with high semantic complexity, where objects are non-isometrically related, and in situations where they are nearly isometric.

Abstract

We present an automated technique for computing a map between two genus-zero shapes that matches semantically corresponding regions to one another. The lack of annotated data prohibits direct inference of 3D semantic priors; instead, current state-of-the-art methods predominantly optimize geometric properties or require varying amounts of manual annotation. To overcome the lack of annotated training data, we distill semantic matches from pre-trained vision models: our method renders the pair of 3D shapes from multiple viewpoints, and the resulting renders are fed into an off-the-shelf image-matching method that leverages a pre-trained visual model to produce feature points. This yields semantic correspondences that can be projected back onto the 3D shapes, producing a raw matching that is inaccurate and inconsistent across viewpoints. A dedicated optimization scheme then refines these correspondences and distills them into an inter-surface map, promoting bijectivity and continuity of the output map. We show that our approach generates semantic surface-to-surface maps without requiring manual annotations or any 3D training data. Furthermore, it proves effective both in scenarios with high semantic complexity, where objects are non-isometrically related, and in situations where they are nearly isometric.
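The step of projecting 2D correspondences back onto the 3D shapes and pooling them across viewpoints can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a renderer that outputs, for every view, a per-pixel buffer holding the index of the visible vertex (-1 for background), and it assumes the image matcher returns matched pixel coordinates. The function names `lift_matches` and `aggregate_views` are hypothetical.

```python
from collections import Counter

import numpy as np


def lift_matches(id_buf_a: np.ndarray, id_buf_b: np.ndarray,
                 pix_a: np.ndarray, pix_b: np.ndarray) -> list:
    """Turn 2D pixel matches from one view pair into 3D vertex-index matches.

    id_buf_a, id_buf_b: (H, W) int arrays giving the vertex visible at each
        pixel, or -1 for background (hypothetical renderer output).
    pix_a, pix_b: (N, 2) int arrays of matched (row, col) pixel coordinates.
    """
    va = id_buf_a[pix_a[:, 0], pix_a[:, 1]]
    vb = id_buf_b[pix_b[:, 0], pix_b[:, 1]]
    keep = (va >= 0) & (vb >= 0)  # drop matches landing on the background
    return list(zip(va[keep].tolist(), vb[keep].tolist()))


def aggregate_views(per_view_matches) -> Counter:
    """Pool lifted matches from all views, counting how often each pair recurs."""
    votes = Counter()
    for matches in per_view_matches:
        votes.update(matches)
    return votes
```

Counting recurrences is only an illustrative way to expose how fuzzy and inconsistent the raw matching is; per the abstract, the method itself refines these matches with a dedicated optimization that promotes bijectivity and continuity.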

Paper Structure (29 sections, 11 equations, 14 figures, 3 tables, 1 algorithm)


Figures (14)

  • Figure 1: Fuzzy semantic correspondences. (Left) We lift 2D image-based correspondences, obtained by applying a pre-trained vision transformer (DINOv2, Oquab et al., 2023) to rendered source/target image pairs from sampled views, into fuzzy and spurious 3D (semantic) correspondences. We collect correspondences from each sampled view (shown with coloring, with a random subset highlighted by lines) and aggregate them across views into fuzzy matches (middle), which contain erroneous matches, e.g., the thigh mapped to the arm. (Right) Our optimization distills these fuzzy matches into an inter-surface map; shown here is the subset of matches closer than a given threshold ($d<0.1$) with respect to the optimized map.
  • Figure 2: Overview. Starting from a pair of upright genus-zero surfaces, we automatically distill an inter-surface map from a set of fuzzy matches. First, we align the input shapes, then extract a set of fuzzy matches using DINOv2 (Oquab et al., 2023) semantic visual features. We use these features to cut the two meshes independently and then optimize a seamless map between them.
  • Figure 3: Co-aligning input surfaces. Starting from a pair of upright meshes (a bison and a bull in this example), we render $12$ views around each ($s^\mathbf{A}$ and $s^\mathbf{B}$). We then extract DINOv2 features from each rendering independently and cast matching these features as a string-matching problem. Specifically, we optimize over a cyclic shift of the rendered views (i.e., a single degree of freedom) to maximize the agreement of image-based semantic correspondences.
  • Figure 4: Cutting through cone points. We collect a set of spurious and noisy matches (a). We then select the $K=3$ most reliable correspondences (b). Finally, using these correspondences as cut endpoints, or cones, we cut the two meshes independently (c). Note that the cuts differ between the two shapes: the man is cut through the back, while the woman is cut through the front. See the paragraph on cone points for details.
  • Figure 5: Seamless cuts. To parametrize a genus-zero mesh (a), we cut it and map it to a disc topology; the cut is visualized in (b). The two corresponding sides of the cut match perfectly, i.e., when the two parts are connected, the map remains continuous across the cut (c).
  • ...and 9 more figures
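The cyclic-shift co-alignment described in the Figure 3 caption reduces to a one-dimensional search over view offsets. The sketch below is a hypothetical illustration, not the authors' code: it assumes one pooled feature vector per rendered view and scores view pairs by cosine similarity, which stands in for the paper's agreement of image-based semantic correspondences. `view_agreement` and `best_cyclic_shift` are names introduced here.

```python
import numpy as np


def view_agreement(feats_a: np.ndarray, feats_b: np.ndarray) -> np.ndarray:
    """Cosine similarity between per-view descriptors.

    feats_a, feats_b: (n, d) arrays, one pooled feature vector per rendered
        view (an assumed stand-in for correspondence agreement).
    Returns an (n, n) matrix; entry [i, j] compares view i of A with view j of B.
    """
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    return a @ b.T


def best_cyclic_shift(score: np.ndarray):
    """Try every shift k, pairing view i of A with view (i + k) % n of B."""
    n = score.shape[0]
    idx = np.arange(n)
    totals = [float(score[idx, (idx + k) % n].sum()) for k in range(n)]
    return int(np.argmax(totals)), totals


# Usage: B's 12 views are A's views rotated by 3 positions.
rng = np.random.default_rng(0)
fa = rng.normal(size=(12, 8))
fb = np.roll(fa, 3, axis=0)
k, _ = best_cyclic_shift(view_agreement(fa, fb))  # k == 3
```

With $12$ views there are only 12 candidate shifts, so an exhaustive search is cheap. Rotating shape B's ring of views relative to shape A's is the single degree of freedom mentioned in the caption.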