SimNP: Learning Self-Similarity Priors Between Neural Points

Christopher Wewer, Eddy Ilg, Bernt Schiele, Jan Eric Lenssen

TL;DR

SimNP introduces a category-level, coherent neural point radiance field that learns self-similarity priors across neural points via a shared attention mechanism $\textbf{A}$ linking to embeddings $\textbf{E}$. By combining a local, point-based representation with a category-wide prior, it achieves detailed reconstructions of unseen regions while facilitating semantic correspondences, using an autodecoder-based training regime and test-time optimization of instance embeddings. The approach yields state-of-the-art or competitive results for single- and two-view reconstruction on ShapeNet objects, with efficient rendering (approximately $59$ ms per view) and interpretable learned symmetries. Limitations include reliance on canonical point clouds during training; future work could extend the self-similarity priors to scenes and relax the need for ground-truth point clouds.

Abstract

Existing neural field representations for 3D object reconstruction either (1) utilize object-level representations, but suffer from low-quality details due to conditioning on a global latent code, or (2) are able to perfectly reconstruct the observations, but fail to utilize object-level prior knowledge to infer unobserved regions. We present SimNP, a method to learn category-level self-similarities, which combines the advantages of both worlds by connecting neural point radiance fields with a category-level self-similarity representation. Our contribution is two-fold. (1) We design the first neural point representation on a category level by utilizing the concept of coherent point clouds. The resulting neural point radiance fields store a high level of detail for locally supported object regions. (2) We learn how information is shared between neural points in an unconstrained and unsupervised fashion, which allows unobserved regions of an object to be derived from the given observations during reconstruction. We show that SimNP outperforms previous methods in reconstructing symmetric unseen object regions, surpassing methods that build upon category-level or pixel-aligned radiance fields, while providing semantic correspondences between instances.
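
The sharing mechanism described above can be illustrated with a minimal sketch. It assumes that each neural point's feature is a softmax-weighted sum of the instance embeddings, with the weights derived from that point's row of category-level attention scores. The softmax normalization, the function names, and the toy values are assumptions made for illustration and are not taken from the authors' implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def point_features(scores, embeddings):
    """Mix K instance embeddings into N per-point features.

    scores:     N x K category-level attention scores (shared across instances)
    embeddings: K x D instance-specific embedding vectors
    returns:    N x D features, one per neural point
    """
    dim = len(embeddings[0])
    features = []
    for row in scores:
        weights = softmax(row)
        features.append([
            sum(w * emb[d] for w, emb in zip(weights, embeddings))
            for d in range(dim)
        ])
    return features


# Hypothetical toy category: 3 points, 2 embeddings of dimension 3.
scores = [
    [2.0, -1.0],  # point 0, e.g. a point on the left side of an object
    [2.0, -1.0],  # point 1, its mirrored counterpart with the same score row
    [-1.0, 2.0],  # point 2, an unrelated point
]
embeddings = [[0.5, 1.0, -0.5], [1.5, 0.0, 2.0]]
features = point_features(scores, embeddings)
```

Because points 0 and 1 share a score row, their features coincide for any choice of embeddings, so a view that observes only one of them still constrains both.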

Paper Structure (35 sections, 11 equations, 15 figures, 6 tables)

Figures (15)

  • Figure 1: a) We present SimNP, a renderable neural point radiance field that learns category-level self-similarities from data by connecting neural points to embeddings via optimized bipartite attention scores. b) The learned self-similarities can be used to transfer details from single- or few-view observations to unobserved, similar and symmetric parts of objects.
  • Figure 2: Overview of SimNP. Our method is a category-level, coherent neural point radiance field, where points are connected to embedding vectors $\mathbf{E}$ via learnable attention scores $\mathbf{A}$. The representation can be rendered using ray marching and a neural renderer. (a) During training, all parameters ($\blacksquare$, $\blacksquare$) are optimized using multi-view supervision. Networks, features $\mathbf{S}$, and scores are shared over the category ($\blacksquare$), while embeddings are instance-specific ($\blacksquare$). During inference, only embeddings $\mathbf{E}$ ($\blacksquare$) are optimized from observations. In case of similar points $i,j$ (e.g., those shown in red), the network learned $a_{i,k} \approx a_{j,k}\,\forall\,k$ during training. Thus, supervision from one side means only one of points $i$ and $j$ needs to be visible to infer the value of embedding $k$. (b) Given optimized embeddings, we can render the object from novel views.
  • Figure 3: Coherent point cloud prediction. An MLP with a bottleneck is used to enforce that the point cloud is constructed as a low-rank deformation of a template (second to last layer output). During training, all trainable parameters ($\blacksquare$, $\blacksquare$) are optimized using ground-truth point cloud supervision. During inference, the embedding $\mathbf{z}$ ($\blacksquare$) can be optimized using different supervision signals: a ResNet predicting a point cloud from a single image, a mask, or depth.
  • Figure 4: a) Our method enables more detailed reconstructions and can better transfer appearance information to symmetric regions compared to SRN, PixelNeRF, and VisionNeRF. b) We show metric comparisons for each target view in the 251-view SRN test spiral for single-view reconstruction of cars. As shown, our overall better performance can be attributed to views showing regions symmetric to the input view (green areas and example views 10, 84, 175). In addition, the related object-level method SRN shows rather flat curves, indicating a weak adaptation to observations. This is not the case for SimNP.
  • Figure 5: Two-view reconstruction. SimNP learns a high-quality 3D object representation given only two input views.
  • ...and 10 more figures
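
The "low-rank deformation of a template" in the Figure 3 caption can be read as follows, sketched here under stated assumptions: the final linear layer of the MLP maps a small bottleneck activation to all point coordinates, so its bias acts as a shared template and its weights act as a low-rank deformation basis. The names `decode_points`, `basis`, and `template`, as well as the toy values, are hypothetical and only illustrate this reading.

```python
def decode_points(bottleneck, basis, template):
    """Apply a final linear layer to a low-dimensional bottleneck activation.

    bottleneck: list of R floats (output of the second to last layer)
    basis:      N x 3 x R weights of the last layer (deformation basis)
    template:   N x 3 bias of the last layer (shared template point cloud)
    returns:    N x 3 point cloud; point n keeps the same index, and
                therefore the same role, for every instance
    """
    return [
        [
            template[n][c] + sum(b * h for b, h in zip(basis[n][c], bottleneck))
            for c in range(3)
        ]
        for n in range(len(template))
    ]


# Hypothetical toy setup: 2 points, rank-1 deformation along the x axis.
template = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
basis = [
    [[1.0], [0.0], [0.0]],   # point 0 moves in +x as the bottleneck grows
    [[-1.0], [0.0], [0.0]],  # point 1 moves in -x as the bottleneck grows
]

undeformed = decode_points([0.0], basis, template)
deformed = decode_points([0.5], basis, template)
```

With a zero bottleneck activation the output equals the template, and every instance is restricted to the span of the R basis directions, which is what keeps point indices consistent across the category in this sketch.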