Single Image to High-Quality 3D Object via Latent Features
Huanning Dong, Yinuo Huang, Fan Li, Ping Kuang
TL;DR
The paper addresses the challenge of generating high-fidelity 3D objects quickly from a single image. It introduces LatentDreamer, which maps SDF geometries to latent features via a pre-trained SDF Autoencoder to ease optimization, uses six-view priors from diffusion models, and employs a self-attention-based LIN for quick latent initialization, followed by coarse-to-fine refinement and view-aware texture generation. A random geometry generator enables open-world generalization without public 3D data, and ablations show that LIN, SDF fixation, and mesh refinement are each critical to the result. The pipeline is fast (about 70 seconds) and robust, delivers high-quality textured meshes with competitive quantitative performance, and shows strong potential for text-guided and interactive 3D content creation.
Abstract
3D assets are essential in the digital age. While automatic 3D generation, such as image-to-3D, has made significant strides in recent years, it often struggles to achieve fast, detailed, and high-fidelity generation simultaneously. In this work, we introduce LatentDreamer, a novel framework for generating 3D objects from single images. The key to our approach is a pre-trained variational autoencoder that maps 3D geometries to latent features, which greatly reduces the difficulty of 3D generation. Starting from latent features, the LatentDreamer pipeline sequentially generates coarse geometries, refined geometries, and realistic textures. The 3D objects generated by LatentDreamer exhibit high fidelity to the input images, and the entire generation process completes in a short time (typically about 70 seconds). Extensive experiments show that, with only a small amount of training, LatentDreamer achieves performance competitive with contemporary approaches.
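For readers unfamiliar with the geometry representation that the autoencoder encodes, a signed distance field (SDF) stores, at each point in space, the distance to the nearest surface, negative inside the object and positive outside. The sketch below only illustrates that representation on a simple sphere; it is not LatentDreamer's code, and the function name `sphere_sdf_grid`, the grid resolution, and the radius are assumptions chosen for illustration.

```python
import numpy as np

def sphere_sdf_grid(resolution=32, radius=0.5):
    """Sample the signed distance to a sphere on a regular grid in [-1, 1]^3.

    Hypothetical helper for illustration; not part of LatentDreamer.
    """
    axis = np.linspace(-1.0, 1.0, resolution)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    # Negative inside the surface, zero on it, positive outside.
    return np.sqrt(x**2 + y**2 + z**2) - radius

sdf = sphere_sdf_grid()
# Fraction of grid points that lie inside the sphere.
inside_fraction = float((sdf < 0).mean())
```

A dense grid like this grows cubically with resolution, which suggests why a compact latent encoding of the SDF can make downstream optimization easier.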
