G3DR: Generative 3D Reconstruction in ImageNet
Pradyumna Reddy, Ismail Elezi, Jiankang Deng
TL;DR
G3DR tackles single-view 3D reconstruction on large, diverse datasets by combining a diffusion-based RGB-D generator with a triplane-based 3D decoder and a novel depth-regularization scheme that preserves geometry. It uses CLIP to supervise novel views and a multi-resolution sampling strategy to improve texture realism without increasing model size. The method achieves state-of-the-art geometry and competitive perceptual quality on ImageNet, with strong results on fine-grained datasets and in text-conditioned generation, while needing only about half the training time of prior methods. This approach offers a scalable path toward high-fidelity 3D content generation across broad categories for applications in VR/AR, gaming, and film production.
Abstract
We introduce a novel 3D generative method, Generative 3D Reconstruction (G3DR) in ImageNet, capable of generating diverse and high-quality 3D objects from single images, addressing the limitations of existing methods. At the heart of our framework is a novel depth regularization technique that enables the generation of scenes with high geometric fidelity. G3DR also leverages a pretrained language-vision model, such as CLIP, to enable reconstruction in novel views and improve the visual realism of generations. Additionally, G3DR uses a simple but effective sampling procedure to further improve the quality of generations. G3DR offers diverse and efficient 3D asset generation based on class or text conditioning. Despite its simplicity, G3DR is able to beat state-of-the-art methods, improving over them by up to 22% in perceptual metrics and 90% in geometry scores, while needing only half of the training time. Code is available at https://github.com/preddy5/G3DR
