POC-SLT: Partial Object Completion with SDF Latent Transformers

Faezeh Zakeri, Raphael Braun, Lukas Ruppert, Henrik P. A. Lensch

TL;DR

This work proposes a transformer operating on the latent space representing Signed Distance Fields (SDFs), where instead of a monolithic volume, the SDF of an object is partitioned into smaller high-resolution patches leading to a sequence of latent codes.

Abstract

3D geometric shape completion hinges on representation learning and a deep understanding of geometric data. Without profound insights into the three-dimensional nature of the data, this task remains unattainable. Our work addresses this challenge of 3D shape completion given partial observations by proposing a transformer operating on the latent space representing Signed Distance Fields (SDFs). Instead of a monolithic volume, the SDF of an object is partitioned into smaller high-resolution patches leading to a sequence of latent codes. The approach relies on a smooth latent space encoding learned via a variational autoencoder (VAE), trained on millions of 3D patches. We employ an efficient masked autoencoder transformer to complete partial sequences into comprehensive shapes in latent space. Our approach is extensively evaluated on partial observations from ShapeNet and the ABC dataset where only fractions of the objects are given. The proposed POC-SLT architecture compares favorably with several baseline state-of-the-art methods, demonstrating a significant improvement in 3D shape completion, both qualitatively and quantitatively.
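The partitioning step described above can be illustrated with a minimal sketch. It assumes a dense cubic SDF grid and non-overlapping tiles. The $32^3$ tile size follows the Figure 1 caption, while the $128^3$ grid resolution, the sphere test shape, and the function name `partition_sdf` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def partition_sdf(sdf: np.ndarray, tile: int = 32) -> np.ndarray:
    """Split a cubic SDF grid of shape (R, R, R) into non-overlapping tiles.

    Returns an array of shape (n**3, tile, tile, tile) with n = R // tile.
    Tile (ix, iy, iz) is stored at sequence index (ix * n + iy) * n + iz.
    """
    r = sdf.shape[0]
    if sdf.shape != (r, r, r) or r % tile != 0:
        raise ValueError("expected a cubic grid whose side is a multiple of tile")
    n = r // tile
    # Split every axis into (tile index, offset inside tile), then move the
    # three tile indices to the front: (n, n, n, tile, tile, tile).
    blocks = sdf.reshape(n, tile, n, tile, n, tile).transpose(0, 2, 4, 1, 3, 5)
    return blocks.reshape(n ** 3, tile, tile, tile)

# Toy input (assumption): SDF of a sphere of radius 0.5 on a 128^3 grid over [-1, 1]^3.
axis = np.linspace(-1.0, 1.0, 128)
x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
sdf = np.sqrt(x**2 + y**2 + z**2) - 0.5

tiles = partition_sdf(sdf)
print(tiles.shape)  # (64, 32, 32, 32)
```

Each of the 64 tiles would then be encoded into one latent code, so the object becomes a sequence of 64 tokens in this toy setting.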

Paper Structure

This paper contains 41 sections, 6 equations, 10 figures, 13 tables.

Figures (10)

  • Figure 1: Architecture Overview. The SDF is partitioned into tiles of $32^3$ samples. For each tile, a latent code is generated by a variational autoencoder (P-VAE) resulting in the potentially partial input stream to our SDF-Latent-Transformer. It is trained as a Masked Autoencoder to generate a completed series of tokens which are finally translated back to SDF tiles using the P-VAE decoder.
  • Figure 2: P-VAE (left): On millions of patches, we train a smooth embedding space for SDFs with a variational autoencoder (P-VAE) that samples the noise according to the predicted variance before passing the estimated latent to the decoder. SDF-Latent-Transformer (SLT) (right): Performs shape completion on input sequences consisting of SDF-Patches (cyan) in latent space. During training, some of the input patches are masked and substituted with a trainable shared vector (blue). The utilized masking schemes Random, Half, Octant and Slice are visualized on the right. 3D positional encoding is added before a TransformerEncoder propagates the information from the remaining patches to all the masked patches completing the 3D shape.
  • Figure 3: Completion of ShapeNet [ShapeNet] objects from the bottom half. Comparison to AutoSDF [AutoSDF_Mittal_2022_CVPR] and AnchorFormer [Chen2023AnchorFormer]. Our SLT completes these objects more plausibly than AutoSDF. The density of the points completed by AnchorFormer varies drastically across the completed regions.
  • Figure 4: Completion of ABC [ABC_Koch_2019_CVPR] objects from the bottom half (left) and octant (right). The SLT learned to complete partially symmetric objects quite successfully.
  • Figure 5: Completion of out-of-distribution objects from Objaverse [Deitke2023Objaverse] using the SLT trained on ShapeNet [ShapeNet]. The last three columns show completions of scanned real-world objects.
  • ...and 5 more figures
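
The masking step described in the Figure 2 caption (masked patches replaced by a shared vector) can be sketched as follows. This is a hedged illustration, not the authors' implementation: the grid size `n = 4`, the latent dimension `d = 8`, the 50% random ratio, and the exact geometry of each scheme (which half is hidden, which octant or slice stays visible) are assumptions, and the shared vector is a plain array here rather than a trained parameter.

```python
import numpy as np

def make_mask(n: int, scheme: str, rng: np.random.Generator, ratio: float = 0.5) -> np.ndarray:
    """Return a boolean (n, n, n) array; True marks a masked (hidden) tile."""
    mask = np.zeros((n, n, n), dtype=bool)
    h = n // 2
    if scheme == "random":
        # Hide a random subset of tiles (ratio is an assumption).
        hidden = rng.permutation(n ** 3)[: int(ratio * n ** 3)]
        mask.flat[hidden] = True
    elif scheme == "half":
        # Hide one half of the grid along the second axis (axis choice is an assumption).
        mask[:, h:, :] = True
    elif scheme == "octant":
        # Keep a single octant visible and hide the rest (assumed interpretation).
        mask[:] = True
        mask[:h, :h, :h] = False
    elif scheme == "slice":
        # Keep a single slab of tiles visible (assumed interpretation).
        mask[:] = True
        mask[:, h, :] = False
    else:
        raise ValueError(f"unknown masking scheme: {scheme}")
    return mask

def apply_mask(latents: np.ndarray, mask: np.ndarray, mask_token: np.ndarray) -> np.ndarray:
    """Replace the latent codes of masked tiles with one shared vector."""
    out = latents.copy()
    out[mask.reshape(-1)] = mask_token  # broadcasts (d,) over all masked rows
    return out

n, d = 4, 8
rng = np.random.default_rng(0)
latents = rng.normal(size=(n ** 3, d))  # stand-in for P-VAE latent codes
mask_token = np.zeros(d)                # stand-in for the trainable shared vector
half_mask = make_mask(n, "half", rng)
masked_latents = apply_mask(latents, half_mask, mask_token)
```

The flattening order of the mask matches a C-order layout of the tile grid, so the sequence of latent codes must use the same ordering. In a trained model, 3D positional encodings would be added to `masked_latents` before the transformer encoder, as the caption describes.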