LaGeM: A Large Geometry Model for 3D Representation Learning and Diffusion

Biao Zhang, Peter Wonka

TL;DR

A novel hierarchical autoencoder that maps 3D models into a highly compressed latent space, paired with a cascaded diffusion framework in which each stage is conditioned on the previous one, designed to tackle the challenges of large-scale datasets and diffusion-based generative modeling.

Abstract

This paper introduces a novel hierarchical autoencoder that maps 3D models into a highly compressed latent space. The hierarchical autoencoder is specifically designed to tackle the challenges arising from large-scale datasets and generative modeling using diffusion. Unlike previous approaches that only work on regular image or volume grids, our hierarchical autoencoder operates on unordered sets of vectors. Each level of the autoencoder controls a different geometric level of detail. We show that the model can represent a wide range of 3D models while faithfully representing high-resolution geometric details. Training the new architecture takes 0.70x the time and 0.58x the memory of the baseline. We also explore how the new representation can be used for generative modeling. Specifically, we propose a cascaded diffusion framework in which each stage is conditioned on the previous stage. Our design extends existing cascaded designs for image and volume grids to vector sets.
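The abstract states that the autoencoder operates on unordered sets of vectors and compresses them level by level, but it does not give the layer equations. As a rough illustration only, the sketch below assumes a VecSet-style cross-attention in which a smaller set of query vectors attends to the input set. It omits learned key/value projections, multiple heads, and the self-attention and decoding path a real U-Net-style model would need. The names `cross_attention_pool` and `encode_hierarchy`, the random queries, and the 256 to 64 to 16 set sizes are all hypothetical.

```python
import math
import random


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_pool(queries, inputs):
    """Map an unordered set of input vectors to len(queries) output vectors.

    Each output is a softmax-weighted average of all inputs, weighted by the
    scaled dot product with one query. Keys and values are the raw inputs
    (no learned projections), a simplification assumed for this sketch.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, x)) for x in inputs])
        outputs.append([sum(w * x[c] for w, x in zip(weights, inputs)) for c in range(dim)])
    return outputs


def encode_hierarchy(inputs, query_sets):
    """Apply one pooling step per level; each level sees the previous level's output."""
    levels = []
    current = inputs
    for queries in query_sets:
        current = cross_attention_pool(queries, current)
        levels.append(current)
    return levels


rng = random.Random(0)
dim = 4
points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(256)]
# Hypothetical queries (random stand-ins for learned ones): 256 -> 64 -> 16 vectors.
query_sets = [
    [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)] for n in (64, 16)
]
levels = encode_hierarchy(points, query_sets)

# Reordering the input set leaves the outputs unchanged (up to float rounding).
shuffled = points[:]
rng.shuffle(shuffled)
levels_shuffled = encode_hierarchy(shuffled, query_sets)
```

Because each output is a weighted sum over every input, the result does not depend on input order, which is what an encoder over unordered vector sets requires; a grid-based encoder relies on a fixed spatial layout instead.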

Paper Structure

This paper contains 29 sections, 15 equations, 14 figures, and 5 tables.

Figures (14)

  • Figure 1: Autoencoders. We show different autoencoder architectures here, including AE (autoencoder), U-Net, VAE [kingma2013auto], NVAE [vahdat2020nvae], VecSet [vecset], and the proposed LaGeM. VAE and NVAE are designed for image data, while VecSet and LaGeM are designed for geometry (distance function) data. In the top row, VAE and VecSet use a single-scale latent to represent the data. NVAE and LaGeM both use multi-scale latents. All previous works (VAE, NVAE, and VecSet) apply KL divergence in the bottleneck to regularize the latent space; in this work, we apply standardization in the bottleneck instead.
  • Figure 2: Pipeline. We propose a U-Net-style transformer for autoencoding, which yields a hierarchical latent space containing several levels of latents. To train generative diffusion models in this latent space, we propose cascaded latent diffusion models.
  • Figure 3: Geometry Autoencoder. The VecSet [vecset] design can be seen as a special case of the proposed LaGeM network with only one level.
  • Figure 4: LaGeM architecture. We show an illustration with 3 levels of latents.
  • Figure 5: Multiresolution Features
  • ...and 9 more figures
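Figure 1 contrasts the KL-divergence regularizer used by VAE, NVAE, and VecSet with the standardization LaGeM applies in the bottleneck. The caption does not say over which axis the statistics are taken, so the sketch below assumes a parameter-free standardization of each latent vector across its channels (similar to a LayerNorm without learnable scale and shift). The names `standardize` and `bottleneck` and the `eps` value are hypothetical.

```python
import math


def standardize(latent, eps=1e-6):
    """Shift one latent vector to zero mean and scale it to unit variance.

    Assumption: statistics are computed across the channels of each vector
    (like a LayerNorm with no learnable scale or shift). The value of eps
    is a hypothetical choice that guards against division by zero.
    """
    n = len(latent)
    mean = sum(latent) / n
    var = sum((x - mean) ** 2 for x in latent) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std for x in latent]


def bottleneck(latent_set):
    # Deterministic: no reparameterized sampling and no KL term in the loss.
    return [standardize(z) for z in latent_set]


latent_set = [[3.0, 5.0, 7.0, 9.0], [-2.0, 0.0, 2.0, 100.0]]
codes = bottleneck(latent_set)
# An affine change of the encoder output (scale 2, shift 5) barely moves the codes.
rescaled_codes = bottleneck([[2.0 * x + 5.0 for x in z] for z in latent_set])
```

Unlike a KL-regularized bottleneck, this version is deterministic and adds no loss term; whatever scale the encoder produces, each code vector passed on has zero mean and (up to `eps`) unit variance.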