UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation

Zexiang Liu, Yangguang Li, Youtian Lin, Xin Yu, Sida Peng, Yan-Pei Cao, Xiaojuan Qi, Xiaoshui Huang, Ding Liang, Wanli Ouyang

TL;DR

UniDream tackles relightable text-to-3D generation by disentangling lighting from texture with three components: an albedo-normal aligned multi-view diffusion model (AN-MVM), a Transformer-based reconstruction model (TRM), and SDS-based refinement. The framework first builds a coherent 3D prior, refines geometry and albedo texture, and finally learns PBR materials through SDS with a Stable Diffusion model while keeping the albedo fixed, which enables relighting under varied illumination. Key contributions include AN-MVM with multi-view and multi-domain attention, TRM for geometry priors, SDS-driven refinement, and BRDF parameter learning for relightable PBR materials. The reported results show improvements in albedo fidelity, surface smoothness, relighting realism, and alignment with text prompts over prior text-to-3D methods.

Abstract

Recent advances in text-to-3D generation have significantly improved the conversion of textual descriptions into imaginative 3D objects with well-formed geometry and fine textures. Despite these developments, a prevalent limitation arises from the use of RGB data in diffusion or reconstruction models, which often yields models with baked-in lighting and shadow effects that detract from their realism and limit their usability in applications that demand accurate relighting. To bridge this gap, we present UniDream, a text-to-3D generation framework that incorporates unified diffusion priors. Our approach consists of three main components: (1) a dual-phase training process that produces albedo-normal aligned multi-view diffusion and reconstruction models, (2) a progressive generation procedure for geometry and albedo textures based on Score Distillation Sampling (SDS) using the trained reconstruction and diffusion models, and (3) an application of SDS with a Stable Diffusion model that finalizes PBR generation while keeping the albedo fixed. Extensive evaluations demonstrate that UniDream surpasses existing methods in generating 3D objects with clearer albedo textures, smoother surfaces, enhanced realism, and superior relighting capabilities.
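
For readers unfamiliar with SDS, the gradient it optimizes is usually written as below. This is the standard formulation from the SDS literature, given here as background only; the paper's own three equations are not part of this excerpt and may use different notation. The symbols are assumptions for illustration: $\theta$ are the 3D representation parameters, $x = g(\theta)$ is a rendered view, $y$ is the text prompt, $\epsilon_\phi$ is the frozen diffusion model's noise prediction, and $w(t)$ is a timestep weighting.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\,\epsilon}\!\left[
      w(t)\,\bigl(\epsilon_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad
x_t = \alpha_t\, x + \sigma_t\, \epsilon,
\quad \epsilon \sim \mathcal{N}(0, I)
```

In components (2) and (3) of the abstract, the diffusion prior supplying $\epsilon_\phi$ would be the albedo-normal multi-view model and the Stable Diffusion model, respectively.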

Paper Structure

This paper contains 15 sections, 3 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Comparison with baselines. UniDream presents clear albedo textures, completely smooth surfaces, and advanced relighting capabilities. The 'Albedo' column demonstrates the albedo and normal properties of the 3D objects generated using our method. Meanwhile, the 'Relighting-I' and 'Relighting-II' columns demonstrate the effect of relighting on the generated PBR materials under white and purple lighting conditions, respectively.
  • Figure 2: Comparison of UniDream with other methods. (a) The existing RGB-based text-to-3D generation process; (b) UniDream's multi-stage generation process.
  • Figure 3: Overview of UniDream. Left: the multi-view diffusion model generates multi-view images from the input text. Middle: the reconstruction model first derives a 3D prior from four-view albedo maps, and the multi-view diffusion model then performs SDS optimization starting from this prior to generate a 3D object with albedo texture. Right: a Stable Diffusion model is used to generate the PBR materials.
  • Figure 4: Illustrative overview of our method’s capabilities. We show 3D objects generated by our method in terms of three properties: albedo, PBR, and normal.
  • Figure 5: Comparison of multi-view results generated by MVDream (row 1) and UniDream (rows 2-3).
  • ...and 3 more figures
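
The relighting columns described in Figure 1 (white and purple light) rely on the albedo being free of baked-in shading. A minimal sketch of that idea follows, assuming a purely diffuse (Lambertian) surface and a single directional light. This is a simplification for illustration, not the paper's PBR/BRDF pipeline; the function name `lambertian_relight` and the light colours are hypothetical.

```python
import numpy as np

def lambertian_relight(albedo, normals, light_dir, light_color):
    """Shade a lighting-free albedo map under one directional light (diffuse only).

    albedo:      (H, W, 3) array of base colours in [0, 1]
    normals:     (H, W, 3) array of unit surface normals
    light_dir:   length-3 vector pointing from the surface toward the light
    light_color: length-3 RGB intensity of the light
    """
    l = np.asarray(light_dir, dtype=np.float64)
    l = l / np.linalg.norm(l)
    # Cosine term between normal and light direction, clamped at zero
    # so that surfaces facing away from the light receive no energy.
    n_dot_l = np.clip(normals @ l, 0.0, None)          # shape (H, W)
    return albedo * np.asarray(light_color, dtype=np.float64) * n_dot_l[..., None]

# Toy 2x2 "object": grey albedo, all normals facing +z.
albedo = np.full((2, 2, 3), 0.5)
normals = np.zeros((2, 2, 3))
normals[..., 2] = 1.0

white = lambertian_relight(albedo, normals, (0, 0, 1), (1.0, 1.0, 1.0))
purple = lambertian_relight(albedo, normals, (0, 0, 1), (0.6, 0.2, 0.8))
```

Because the same albedo is reused for both lights, only the light colour changes the result. If shadows or highlights were baked into the texture, as with RGB-based generation, they would persist under every new light.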