High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior
Nan Huang, Ting Zhang, Yuhui Yuan, Dong Chen, Shanghang Zhang
TL;DR
This work tackles the scarcity of diverse 3D data for robotics by enabling high-quality 3D creation from a single image. It introduces Customize-It-3D, a two-stage pipeline that uses a subject-specific diffusion prior learned from multi-modal cues to guide both geometry and texture, first through shading-aware NeRF optimization and then through texture-enhanced projection onto a point cloud. Key contributions include multi-modal DreamBooth for subject-specific priors, shading-mode-guided NeRF supervision, depth regularization, and a mesh-to-point-cloud refinement with texture enhancement. Experimental results on RealFusion and a new benchmark demonstrate state-of-the-art geometry and texture fidelity, highlighting the method's potential to expand robotics 3D asset datasets from minimal input.
Abstract
In this paper, we address a critical bottleneck in robotics, the scarcity of diverse 3D data, by presenting a novel two-stage approach for generating high-quality 3D models from a single image. The method is motivated by the need to scale up 3D asset creation efficiently, particularly for robotics datasets, where the variety of object types is currently limited compared to general image datasets. Unlike previous methods that rely primarily on general diffusion priors, which often struggle to align with the reference image, our approach leverages subject-specific prior knowledge. By incorporating subject-specific priors in both geometry and texture, we ensure precise alignment between the generated 3D content and the reference object. Specifically, we introduce a shading-mode-aware prior into the NeRF optimization process to enhance the geometry, and we then refine the texture of the coarse outputs to achieve superior quality. Extensive experiments demonstrate that our method significantly outperforms prior approaches.
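The TL;DR lists depth regularization among the contributions, but this summary does not give its form. As an illustration only, the sketch below uses a negative Pearson correlation between a rendered depth map and a reference depth estimate, a scale- and shift-invariant choice that is common in single-image 3D pipelines. The loss form, the function name `depth_regularization_loss`, and its arguments are assumptions, not the authors' stated implementation.

```python
import numpy as np


def depth_regularization_loss(rendered_depth, reference_depth, mask):
    """Hypothetical depth regularizer: negative Pearson correlation.

    rendered_depth:  (H, W) depth rendered from the 3D representation.
    reference_depth: (H, W) depth estimated from the input image
                     (assumed to come from some monocular estimator).
    mask:            (H, W) foreground mask; only these pixels are compared.

    Returns a value in [-1, 1]; -1 means the two maps agree up to an
    affine rescaling with positive scale.
    """
    fg = mask.astype(bool)
    d_r = rendered_depth[fg].astype(np.float64)
    d_e = reference_depth[fg].astype(np.float64)

    # Center both signals so the loss ignores a constant depth offset.
    d_r = d_r - d_r.mean()
    d_e = d_e - d_e.mean()

    # Normalize by the product of norms so the loss ignores depth scale.
    denom = np.sqrt((d_r ** 2).sum() * (d_e ** 2).sum()) + 1e-8
    return -float((d_r * d_e).sum() / denom)


# Toy usage: a reference that is an affine rescaling of the rendered depth.
rendered = np.arange(16, dtype=np.float64).reshape(4, 4)
reference = 2.0 * rendered + 3.0
mask = np.ones((4, 4), dtype=bool)
loss_aligned = depth_regularization_loss(rendered, reference, mask)
loss_flipped = depth_regularization_loss(rendered, -reference, mask)
```

Because monocular depth estimates are typically reliable only up to an unknown scale and shift, a correlation-based term constrains the relative depth ordering without forcing absolute values to match.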
