
High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior

Nan Huang, Ting Zhang, Yuhui Yuan, Dong Chen, Shanghang Zhang

TL;DR

This work tackles the scarcity of diverse 3D data for robotics by enabling high-quality 3D creation from a single image. It introduces Customize-It-3D, a two-stage pipeline that uses a subject-specific diffusion prior, learned from multi-modal cues, to guide both geometry and texture: a coarse stage optimizes a NeRF in a shading-aware manner, and a refinement stage projects enhanced textures onto a point cloud. Key contributions include multi-modal DreamBooth for subject-specific priors, shading-mode guided NeRF supervision, depth regularization, and a mesh-to-point-cloud refinement with texture enhancement. Experimental results on RealFusion and a new benchmark demonstrate state-of-the-art geometry and texture fidelity, highlighting the method's potential to expand robotics 3D asset datasets from minimal input.

Abstract

In this paper, we address the critical bottleneck in robotics caused by the scarcity of diverse 3D data by presenting a novel two-stage approach for generating high-quality 3D models from a single image. This method is motivated by the need to efficiently expand 3D asset creation, particularly for robotics datasets, where the variety of object types is currently limited compared to general image datasets. Unlike previous methods that primarily rely on general diffusion priors, which often struggle to align with the reference image, our approach leverages subject-specific prior knowledge. By incorporating subject-specific priors in both geometry and texture, we ensure precise alignment between the generated 3D content and the reference object. Specifically, we introduce a shading mode-aware prior into the NeRF optimization process, enhancing the geometry and refining texture in the coarse outputs to achieve superior quality. Extensive experiments demonstrate that our method significantly outperforms prior approaches.

Paper Structure

This paper contains 10 sections, 6 equations, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: We propose a two-stage framework for 3D creation from a reference image with a subject-specific diffusion prior (Sec. \ref{subsec:joint prior}). At the coarse stage, we optimize a NeRF to reconstruct the geometry of the reference image in a shading-aware manner (Sec. \ref{subsec: coarse stage}). We then build point clouds with enhanced texture from the coarse stage, and jointly optimize the texture of invisible points and a learnable deferred renderer to generate realistic and view-consistent textures (Sec. \ref{subsec:refine stage}).
  • Figure 2: An example of a Lego castle. Point cloud building results from (1) depth images \cite{Tang_2023_ICCV} and (2) our method.
  • Figure 3: Qualitative comparison on image-to-3D generation. We compare Customize-It-3D to RealFusion \cite{melas2023realfusion}, Make-it-3D \cite{Tang_2023_ICCV}, Magic123 \cite{qian2023magic123}, and DreamGaussian \cite{tang2023dreamgaussian} for creating 3D objects from a single unposed image (the leftmost column).
  • Figure 4: Comparison with Magic123 in the coarse stage.
  • Figure 5: The effect of multi-modal DreamBooth.
  • ...and 2 more figures