
DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction

Jaehyeok Shim, Kyungdon Joo

TL;DR

DITTO combines point and grid latents to exploit the strengths of each, namely the stability of grid latents and the detail-rich capability of point latents, achieving high-fidelity 3D reconstruction and surpassing previous state-of-the-art methods on object- and scene-level datasets.

Abstract

We propose a novel concept of dual and integrated latent topologies (DITTO for short) for implicit 3D reconstruction from noisy and sparse point clouds. Most existing methods rely on a single latent type, such as point or grid latents. In contrast, DITTO uses both point and grid latents (i.e., dual latents) to exploit their complementary strengths: the stability of grid latents and the detail-rich capability of point latents. Concretely, DITTO consists of a dual latent encoder and an integrated implicit decoder. The encoder is built from dual latent layers, its key building block, each of which refines both latents in parallel while maintaining their distinct shapes and enabling recursive interaction between them. Notably, a newly proposed dynamic sparse point transformer within the dual latent layer refines the point latents. The integrated implicit decoder then systematically combines the refined latents, achieving high-fidelity 3D reconstruction and surpassing previous state-of-the-art methods on object- and scene-level datasets, especially on thin and detailed structures.
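To make the two latent topologies concrete, the following is a minimal sketch of how a single query location could read from a grid latent and from a point latent. It is an illustration under stated assumptions, not the authors' decoder: trilinear interpolation for the grid latent, inverse-distance k-nearest-neighbor averaging for the point latent, and plain concatenation of the two results are all assumptions, and every function name here is hypothetical. The abstract does not specify how the integrated implicit decoder actually combines the latents.

```python
import math

def trilinear_grid_latent(grid, q):
    """Interpolate a feature vector from a cubic grid latent at query q in [0, 1]^3.

    grid[i][j][k] is the feature vector (list of floats) stored at a grid vertex.
    Assumes at least 2 vertices per axis.
    """
    res = len(grid)
    # Map the query to continuous vertex coordinates, clamped to the grid.
    coords = [min(max(c, 0.0), 1.0) * (res - 1) for c in q]
    base = [min(int(math.floor(c)), res - 2) for c in coords]
    frac = [c - b for c, b in zip(coords, base)]
    dim = len(grid[0][0][0])
    out = [0.0] * dim
    # Blend the 8 surrounding vertices with trilinear weights.
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((frac[0] if dx else 1 - frac[0])
                     * (frac[1] if dy else 1 - frac[1])
                     * (frac[2] if dz else 1 - frac[2]))
                feat = grid[base[0] + dx][base[1] + dy][base[2] + dz]
                for d in range(dim):
                    out[d] += w * feat[d]
    return out

def knn_point_latent(points, feats, q, k=2):
    """Inverse-distance average of the features of the k points nearest to q."""
    nearest = sorted(range(len(points)), key=lambda i: math.dist(points[i], q))[:k]
    weights = [1.0 / (math.dist(points[i], q) + 1e-8) for i in nearest]
    total = sum(weights)
    dim = len(feats[0])
    return [sum(w * feats[i][d] for w, i in zip(weights, nearest)) / total
            for d in range(dim)]

def dual_query_feature(grid, points, feats, q, k=2):
    """Concatenate grid and point features for one query (assumed fusion, for illustration)."""
    return trilinear_grid_latent(grid, q) + knn_point_latent(points, feats, q, k)
```

The sketch shows the trade-off the abstract describes: the grid lookup is defined everywhere in the volume, which makes it stable, while the point lookup follows the input samples and can therefore carry detail that is finer than the grid resolution.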

Paper Structure (14 sections, 7 equations, 9 figures, 5 tables)

This paper contains 14 sections, 7 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Scene-level 3D reconstruction comparison on the Synthetic Rooms dataset peng2020convolutional. DITTO maximizes the benefits of both grid and point latents, thereby improving 3D surface reconstruction performance. We particularly focus on refining features based on point latents along with grid latents and integrating them (namely, dual and integrated latent topologies). This advancement enhances the ability to restore complex structures precisely, such as thin and intricate geometries, which posed challenges for previous methods.
  • Figure 2: Overview of DITTO. The DITTO architecture consists of the proposed dual latent encoder and integrated implicit decoder (IID) modules. In the encoder, DITTO receives a point cloud $\mathcal{P}$ and generates point latents $\mathcal{C}$ and grid latents $\mathbf{T}$ using shallow FKAConv layers boulch2020fkaconv. These latents are refined in a U-shaped network composed of the proposed dual latent layers (DLL) to produce refined point and grid latents, $\tilde{\mathcal{C}}$ and $\tilde{\mathbf{T}}$, respectively. The IID estimates the occupancy at arbitrary query locations. The mesh can be obtained by applying the Marching Cubes algorithm lorensen1998marching to occupancies estimated on a regular grid.
  • Figure 3: Conceptual comparison of DITTO. We compare the concept of implicit 3D reconstruction methods in terms of latent representations: (a) encoders and (b) decoders. In (b), the image of green chairs represents grid features.
  • Figure 4: Overview of our proposed DLL. The input consists of $\mathcal{C}$ and $\mathbf T$, while the output comprises $\mathcal{C}'$ and $\mathbf T'$, representing the point and grid latents of the current layer, respectively. $\bar{\mathbf T}$ and $\bar{\mathbf T}'$ denote the input and output grid latents, respectively, used to establish a dense skip connection between layers.
  • Figure 5: Conceptual illustration of the dynamic sparse point transformer (DSPT). We visualize DSPT in 2D for clarity, but it operates in 3D by sequentially processing the x-, y-, and z-axes.
  • ...and 4 more figures
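
The Figure 2 caption notes that a mesh is obtained by running Marching Cubes on occupancies estimated on a regular grid. A minimal sketch of that grid-evaluation step follows. The occupancy function here is a toy sphere standing in for a learned decoder, and the names `occupancy_grid` and `toy_sphere_occupancy`, the grid bounds, and the resolution are all hypothetical choices for illustration.

```python
def occupancy_grid(occupancy_fn, resolution, lo=-0.5, hi=0.5):
    """Evaluate occupancy_fn on a resolution^3 grid of points spanning [lo, hi]^3."""
    step = (hi - lo) / (resolution - 1)
    axis = [lo + i * step for i in range(resolution)]
    # Nested lists indexed as volume[ix][iy][iz].
    return [[[occupancy_fn((x, y, z)) for z in axis] for y in axis] for x in axis]

def toy_sphere_occupancy(q, radius=0.3):
    """Stand-in for a learned decoder: 1.0 inside a sphere at the origin, 0.0 outside."""
    return 1.0 if sum(c * c for c in q) <= radius * radius else 0.0

# Dense occupancy volume; a real decoder would be queried at the same locations.
volume = occupancy_grid(toy_sphere_occupancy, resolution=9)
```

In practice such a volume would be converted to an array and passed to a Marching Cubes implementation, for example `skimage.measure.marching_cubes` from scikit-image with an iso-level such as 0.5, to extract the surface mesh.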