Single Image to High-Quality 3D Object via Latent Features

Huanning Dong, Yinuo Huang, Fan Li, Ping Kuang

TL;DR

The paper addresses the challenge of generating high-fidelity 3D objects from a single image quickly. It introduces LatentDreamer, which maps signed distance function (SDF) geometries to latent features via a pre-trained SDF Autoencoder to ease optimization, uses six-view priors from diffusion models, and employs a self-attention-based latent feature initialization network (LIN) for quick latent initialization, followed by coarse-to-fine refinement and view-aware texture generation. A random geometry generator enables open-world generalization without public 3D data, and ablations show that LIN, SDF fixation, and mesh refinement are each critical. The result is a fast (about 70 seconds) and robust pipeline that delivers high-quality textured meshes and competitive quantitative performance, with strong potential for text-guided and interactive 3D content creation.

Abstract

3D assets are essential in the digital age. While automatic 3D generation, such as image-to-3D, has made significant strides in recent years, it often struggles to achieve fast, detailed, and high-fidelity generation simultaneously. In this work, we introduce LatentDreamer, a novel framework for generating 3D objects from single images. The key to our approach is a pre-trained variational autoencoder that maps 3D geometries to latent features, which greatly reduces the difficulty of 3D generation. Starting from latent features, the pipeline of LatentDreamer generates coarse geometries, refined geometries, and realistic textures sequentially. The 3D objects generated by LatentDreamer exhibit high fidelity to the input images, and the entire generation process can be completed within a short time (typically in 70 seconds). Extensive experiments show that with only a small amount of training, LatentDreamer demonstrates competitive performance compared to contemporary approaches.
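
For readers unfamiliar with the geometry representation that the autoencoder operates on, the following is a minimal sketch of a signed distance function (SDF) sampled on a regular grid. It is only an illustration: the sphere shape, the grid resolution, and the names `sphere_sdf` and `sample_grid` are assumptions made for this example and are not taken from the paper.

```python
import math

def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    return math.dist(point, center) - radius

def sample_grid(sdf, resolution=3, extent=1.5):
    """Evaluate an SDF on a resolution^3 grid spanning [-extent, extent] on each axis."""
    step = 2.0 * extent / (resolution - 1)
    coords = [-extent + i * step for i in range(resolution)]
    return [[[sdf((x, y, z)) for z in coords] for y in coords] for x in coords]

# Hypothetical usage: a coarse 3x3x3 grid of distances around a unit sphere.
grid = sample_grid(sphere_sdf, resolution=3)
```

A dense grid of such values is one common way to store a shape before compressing it; how LatentDreamer discretizes and encodes its SDFs is described in the paper itself, not here.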

Paper Structure

This paper contains 18 sections, 12 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Result of our method. Our method can generate high-quality textured 3D objects from single images in 70 seconds. Input images (along with text prompts), normal maps of the generated objects, and multi-view color images are shown.
  • Figure 2: Overview of LatentDreamer. (Left) We pre-train an SDF autoencoder (SAE) to decode SDFs from latent features. Concurrently, we train an end-to-end latent feature initialization network (LIN) to predict latent features from multi-view normals for rapid initialization. All training data is generated using a random geometry generator. (Right) For a single image input, LatentDreamer sequentially generates multi-view priors, initialized latent features, a coarse mesh, a refined mesh, and a realistic mesh texture, with the entire process taking only 70 seconds to complete.
  • Figure 3: View-aware loss. We assign different weights to pixels based on the direction of their normals. In this figure, brighter points indicate higher weights for those pixels.
  • Figure 4: Comparison in single-image-to-3D generation. Input images, meshes, and rendered RGB results are shown. LGM represents objects as 3D Gaussians, from which we obtain its rendering results. The other rendering results are derived from the textured mesh. Zoom in for details.
  • Figure 5: Qualitative results on GSO. The 3D objects generated by our method exhibit significantly better fidelity and are free from geometric flattening and distortion.
  • ...and 3 more figures
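
The view-aware loss described in the Figure 3 caption weights pixels by the direction of their normals. The sketch below shows one plausible way to do this, assuming the weight is the clamped cosine between a pixel's normal and a direction pointing toward the camera. The caption does not give the exact weighting function, so this choice, along with the names `view_aware_weights` and `weighted_l2`, is hypothetical.

```python
import math

def view_aware_weights(normals, view_dir=(0.0, 0.0, 1.0)):
    """Per-pixel weights: cosine between each normal and view_dir, clamped at 0.

    ASSUMPTION: the clamped-cosine form is illustrative, not the paper's formula.
    """
    v_len = math.sqrt(sum(c * c for c in view_dir))
    v = tuple(c / v_len for c in view_dir)
    weights = []
    for n in normals:
        n_len = math.sqrt(sum(c * c for c in n))
        if n_len == 0.0:
            weights.append(0.0)  # background or invalid pixel gets no weight
            continue
        cos = sum(a * b for a, b in zip(n, v)) / n_len
        weights.append(max(0.0, cos))
    return weights

def weighted_l2(pred, target, weights):
    """Weighted mean squared error over pixels (scalar intensities for brevity)."""
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * (p - t) ** 2 for p, t, w in zip(pred, target, weights)) / total

# Hypothetical usage: normals facing the camera, sideways, and away from it.
w = view_aware_weights([(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 0.0, -1.0)])
loss = weighted_l2([1.0, 5.0, 5.0], [0.0, 0.0, 0.0], w)
```

Under this assumed weighting, pixels whose surface faces the camera dominate the loss, while grazing or back-facing pixels contribute little, which matches the caption's description of brighter points carrying higher weights.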