Local Deep Implicit Functions for 3D Shape

Kyle Genova, Forrester Cole, Avneesh Sud, Aaron Sarna, Thomas Funkhouser

TL;DR

LDIF presents a structured local implicit representation that decomposes 3D shapes into overlapping Gaussian-based regions, each with a small latent code, enabling accurate surface reconstruction with far fewer parameters than global DIFs. By coupling a SIF-inspired space decomposition with local deep implicit functions decoded per element, LDIF achieves superior reconstruction and generalization, including unseen classes, while remaining computationally efficient. The approach supports end-to-end training from depth or mesh inputs, delivers effective depth completion, and extends to partial human-body scans without domain-specific templates. Overall, LDIF offers a scalable, high-detail 3D representation that balances structure and local detail to improve both accuracy and generalization across diverse shape collections.
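The following sketch illustrates how such a decomposition could be evaluated at a query point. It follows the Figure 2 description: each element's Gaussian $g(\mathbf{x}, \mathbf{\theta_i})$ is combined with a local function $f(\mathbf{x}, \mathbf{z_i})$, and the per-element results are summed.

Several details are illustrative assumptions rather than the authors' implementation:

  • the combination $g \cdot (1 + f)$;
  • an axis-aligned Gaussian with no rotation;
  • evaluating $f$ in element-centered coordinates;
  • the iso-level used for the inside/outside test;
  • `toy_decoder`, a hypothetical linear stand-in for the trained local decoder.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class ShapeElement:
    """One shape element: Gaussian parameters theta_i plus latent code z_i.

    Assumption: the Gaussian is axis-aligned, so theta_i is just a scale c,
    a center and per-axis radii (no rotation).
    """
    c: float
    center: Vec3
    radii: Vec3
    z: List[float]


def gaussian(x: Vec3, e: ShapeElement) -> float:
    """g(x, theta_i): scaled, axis-aligned anisotropic Gaussian."""
    exponent = sum(-((xd - pd) ** 2) / (2.0 * rd ** 2)
                   for xd, pd, rd in zip(x, e.center, e.radii))
    return e.c * math.exp(exponent)


def toy_decoder(x_local: Sequence[float], z: Sequence[float]) -> float:
    """Hypothetical stand-in for the learned local decoder f(x, z_i).

    A trained network would go here; this linear map only makes the sketch run.
    """
    return sum(zi * xi for zi, xi in zip(z, x_local))


def ldif(x: Vec3, elements: Sequence[ShapeElement],
         decoder: Callable[[Sequence[float], Sequence[float]], float] = toy_decoder) -> float:
    """Assumed form: sum over elements of g(x, theta_i) * (1 + f(x_local, z_i))."""
    total = 0.0
    for e in elements:
        # Assumption: the local function is queried in element-centered coordinates.
        x_local = [xd - pd for xd, pd in zip(x, e.center)]
        total += gaussian(x, e) * (1.0 + decoder(x_local, e.z))
    return total


def is_inside(x: Vec3, elements: Sequence[ShapeElement], iso: float = 0.5) -> bool:
    """Hypothetical inside/outside test: inside where the field exceeds iso."""
    return ldif(x, elements) > iso


# Usage: two overlapping elements with zero latent codes.
elements = [
    ShapeElement(c=1.0, center=(0.0, 0.0, 0.0), radii=(0.2, 0.2, 0.2), z=[0.0, 0.0, 0.0]),
    ShapeElement(c=1.0, center=(0.5, 0.0, 0.0), radii=(0.2, 0.2, 0.2), z=[0.0, 0.0, 0.0]),
]
print(is_inside((0.0, 0.0, 0.0), elements))  # True
print(is_inside((2.0, 2.0, 2.0), elements))  # False
```

With all latent codes set to zero, the field in this sketch reduces to a plain sum of Gaussians, as in a SIF-style decomposition. Non-zero codes let each element add local detail on top of its Gaussian support.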

Abstract

The goal of this project is to learn a 3D shape representation that enables accurate surface reconstruction, compact storage, efficient computation, consistency for similar shapes, generalization across diverse shape categories, and inference from depth camera observations. Towards this end, we introduce Local Deep Implicit Functions (LDIF), a 3D shape representation that decomposes space into a structured set of learned implicit functions. We provide networks that infer the space decomposition and local deep implicit functions from a 3D mesh or posed depth image. In experiments, LDIF achieves 10.3 points higher surface reconstruction accuracy (F-Score) than the state-of-the-art (OccNet), while requiring fewer than 1 percent of OccNet's network parameters. Experiments on posed depth image completion and generalization to unseen classes show improvements of 15.8 and 17.8 points, respectively, over the state-of-the-art, while producing a structured 3D representation for each input with consistency across diverse shape collections.
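The F-score used as the accuracy metric above is commonly defined as the harmonic mean of two quantities:

  • precision: the fraction of points sampled on the reconstructed surface that lie within a distance threshold τ of the ground-truth surface;
  • recall: the fraction of ground-truth points that lie within τ of the reconstruction.

The sketch below computes it for two small point sets by brute-force nearest-neighbor search and reports it on a 0 to 100 scale. The threshold value, the point sampling, and the function names are illustrative assumptions, not the paper's evaluation protocol.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _nearest_dist(p: Point, cloud: Sequence[Point]) -> float:
    """Distance from p to its nearest neighbour in cloud (brute force)."""
    return min(math.dist(p, q) for q in cloud)


def f_score(pred: Sequence[Point], gt: Sequence[Point], tau: float) -> float:
    """F-score (0-100) between predicted and ground-truth surface samples.

    precision: share of predicted points within tau of the ground truth
    recall:    share of ground-truth points within tau of the prediction
    """
    precision = sum(_nearest_dist(p, gt) <= tau for p in pred) / len(pred)
    recall = sum(_nearest_dist(q, pred) <= tau for q in gt) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


# Usage: one prediction is an outlier, and two ground-truth points are missed.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
pred = [(0.0, 0.0, 0.005), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
print(round(f_score(pred, gt, tau=0.01), 2))  # 57.14 (precision 2/3, recall 1/2)
```

Because the harmonic mean is low whenever either term is low, the metric penalizes both spurious geometry (low precision) and missing geometry (low recall). A practical evaluation would replace the brute-force search with a spatial index such as a k-d tree.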

Paper Structure

This paper contains 14 sections, 6 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: This paper introduces Local Deep Implicit Functions, a 3D shape representation that decomposes an input shape (mesh on left in every triplet) into a structured set of shape elements (colored ellipses on right) whose contributions to an implicit surface reconstruction (middle) are represented by latent vectors decoded by a deep network. Project video and website at https://ldif.cs.princeton.edu.
  • Figure 2: Network architecture. Our system takes in one or more posed depth images and outputs an LDIF function that can be used to classify inside/outside for any query point $\mathbf{x}$. It starts with a SIF encoder to extract a set of overlapping shape elements, each defined by a local Gaussian region of support parameterized by $\mathbf{\theta_i}$. It then extracts sample points/normals from the depth images and passes them through a PointNet encoder for each shape element to produce a latent vector $\mathbf{z_i}$. A local decoder network is used to decode each $\mathbf{z_i}$ to produce an implicit function $f_i(\mathbf{x}, \mathbf{z_i})$, which is combined with the local Gaussian function $g(\mathbf{x}, \mathbf{\theta_i})$ and summed with other shape elements to produce the output function LDIF$(\mathbf{x})$.
  • Figure 3: Autoencoder examples. F-scores for the test set (8746 shapes) are shown ordered by the LDIF F-score, with examples marked with their position on the curve. Our reconstructions (blue curve) are most accurate for 93% of shapes (exact scores shown faded). The scores of OccNet and SIF follow roughly the same curve as LDIF (rolling means shown bold), indicating shapes are similarly difficult for all methods. Solid shapes such as the rifle are relatively easy to represent, while shapes with irregular, thin structures such as the lamp are more difficult.
  • Figure 4: Representation efficiency. F-score vs. model complexity. Curves show varying $M$ for constant $N$. Other methods marked as points. Top: F-score vs. count of decoder parameters. The $N=32, M=32$ configuration (large dot) reaches >90% F-score with <1% of the parameters of OccNet, and is used as the benchmark configuration in this paper. Bottom: F-score vs. shape vector dimension ($\lvert\mathbf{\Theta}\rvert + \lvert\mathbf{Z}\rvert$ for LDIF). LDIF achieves similar reconstruction accuracy to OccNet at the same dimensionality, and can use additional dimensions to further improve accuracy.
  • Figure 5: Representation consistency. Example shape decompositions produced by our model trained multi-class on 3D-R$^2$N$^2$. Shape elements are depicted by their support ellipsoids and colored consistently by index. Note that the shape element shown in brown is used to represent the right-front leg of the chairs, tables, desks, and sofas, as well as the front-right wheel of the cars.
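The shape vector dimension in the bottom panel of Figure 4 is the total size of the per-element analytic parameters $\mathbf{\Theta}$ plus the latent codes $\mathbf{Z}$. The worked example below rests on three assumptions made for illustration:

  • $N$ counts shape elements.
  • $M$ is the latent length per element.
  • Each $\mathbf{\theta_i}$ holds 10 values: a scale, a 3D center, three radii, and three rotation angles.

Any parameter sharing between elements (for example, symmetric pairs) is ignored.

```latex
\lvert\mathbf{\Theta}\rvert + \lvert\mathbf{Z}\rvert
  = N \cdot 10 + N \cdot M
  = N\,(10 + M),
\qquad
N = M = 32 \;\Rightarrow\; 32 \cdot 42 = 1344.
```

Under these assumptions, increasing $M$ at fixed $N$ (one curve in Figure 4) adds $N$ dimensions per unit increase in $M$.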