RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D

Lingteng Qiu, Guanying Chen, Xiaodong Gu, Qi Zuo, Mutian Xu, Yushuang Wu, Weihao Yuan, Zilong Dong, Liefeng Bo, Xiaoguang Han

TL;DR

RichDreamer introduces a generalizable Normal-Depth diffusion prior that supplies robust 3D geometry from text prompts; it is trained on LAION-2B and fine-tuned on Objaverse to retain real-world diversity. To address appearance ambiguities, it adds a depth-conditioned albedo diffusion model that regularizes the albedo and improves relighting. The method integrates with NeRF and DMTet representations via Score Distillation Sampling to optimize geometry, while physically-based rendering and the learned albedo prior improve texture realism. Empirical results show state-of-the-art geometry and textured-model generation, strong generalization, and favorable user-study rankings across prompts.

Abstract

Lifting 2D diffusion for 3D generation is a challenging problem due to the lack of a geometric prior and the complex entanglement of materials and lighting in natural images. Existing methods have shown promise by first creating the geometry through score-distillation sampling (SDS) applied to rendered surface normals, followed by appearance modeling. However, relying on a 2D RGB diffusion model to optimize surface normals is suboptimal due to the distribution discrepancy between natural images and normal maps, leading to instability in optimization. In this paper, recognizing that normal and depth information effectively describe scene geometry and can be automatically estimated from images, we propose to learn a generalizable Normal-Depth diffusion model for 3D generation. We achieve this by training on the large-scale LAION dataset together with generalizable image-to-depth and normal prior models. To alleviate the mixed illumination effects in the generated materials, we introduce an albedo diffusion model to impose data-driven constraints on the albedo component. Our experiments show that, when integrated into existing text-to-3D pipelines, our models significantly enhance detail richness, achieving state-of-the-art results. Our project page is https://aigc3d.github.io/richdreamer/.
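
For readers unfamiliar with SDS, the gradient it uses is commonly written as shown below (this is the form introduced in DreamFusion). The notation is an assumption made for illustration and is not taken from this paper's own equations, which this page does not reproduce: θ denotes the 3D representation's parameters, g a differentiable renderer, c a camera pose, y the text prompt, ε_φ a frozen noise-prediction diffusion model, and w(t) a timestep weighting.

```latex
% Sketch of the standard SDS gradient; symbols are illustrative assumptions.
\begin{align}
  x   &= g(\theta, c), \\
  x_t &= \alpha_t\, x + \sigma_t\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I), \\
  \nabla_\theta \mathcal{L}_{\mathrm{SDS}}
      &= \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,
         \bigl( \epsilon_\phi(x_t;\, y,\, t) - \epsilon \bigr)\,
         \frac{\partial x}{\partial \theta} \right].
\end{align}
```

In the setting the abstract describes, x would presumably be the rendered normal and depth maps and ε_φ the Normal-Depth diffusion model, rather than an RGB render scored by an RGB diffusion model.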

Paper Structure

This paper contains 56 sections, 10 equations, 25 figures, 2 tables.

Figures (25)

  • Figure 1: 3D Generation Results and Applications of RichDreamer. RichDreamer can generate highly detailed and diverse 3D content from free-form user prompts. Our method achieves this by first generating the object geometry based on a generalizable Normal-Depth diffusion model, followed by modeling the physically-based rendering (PBR) materials. Notably, the diverse crocodile-themed objects at the bottom highlight the generalization ability of our method. Abbreviated text prompts are shown beside the corresponding objects (full prompts can be found in the supplementary materials).
  • Figure 2: Overview of the proposed RichDreamer. We introduce a generalizable Normal-Depth diffusion model that is trained on the LAION-2B dataset with normal and depth predicted by MiDaS [ranftl2020towards], followed by fine-tuning on a synthetic dataset. Our model can be combined with the DMTet and NeRF representations to enhance geometry generation. To alleviate the ambiguity in appearance modeling, we propose an albedo diffusion model to impose a data-driven prior on the albedo component.
  • Figure 2: Ablation study for geometry generation.
  • Figure 3: Visual comparison between our method and existing methods.
  • Figure 4: User study for text-to-3D.
  • ...and 20 more figures