EucliDreamer: Fast and High-Quality Texturing for 3D Models with Stable Diffusion Depth

Cindy Le, Congrui Hetang, Chendi Lin, Ang Cao, Yihui He

TL;DR

EucliDreamer tackles the challenge of texturing 3D meshes from textual prompts by introducing depth-conditioned diffusion via Stable Diffusion depth into the Score Distillation Sampling loop. The approach uses a differentiable renderer and a hash-grid texture representation to iteratively refine textures, achieving higher quality and faster convergence than prior SDS-based methods. Through extensive ablations, a user study, and Objaverse benchmarking, the work demonstrates improved realism, diverse artistic styles, and reduced inference time, while highlighting limitations such as dependence on visible surfaces and lighting handling. The results indicate depth conditioning is a key factor for practical, high-quality diffusion-based 3D texturing with potential for broader adoption and future extensions to scenes and animation.

Abstract

This paper presents a novel method to generate textures for 3D models given text prompts and 3D meshes. Additional depth information is taken into account to perform the Score Distillation Sampling (SDS) process with depth-conditional Stable Diffusion. We ran our model on the open-source dataset Objaverse and conducted a user study to compare the results with those of various 3D texturing methods. We show that our model generates more satisfactory results and can produce various art styles for the same object. In addition, it takes less time to generate textures of comparable quality. We also conduct thorough ablation studies of how different factors affect generation quality, including sampling steps, guidance scale, negative prompts, data augmentation, elevation range, and alternatives to SDS.
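
To make the SDS idea concrete, here is a minimal toy sketch in Python, not the paper's implementation. Everything in it is an assumption made for illustration: the texture is a plain per-pixel array rather than a hash grid, the "renderer" is a visibility mask, `toy_noise_predictor` is a hypothetical stand-in for the depth-conditioned Stable Diffusion model (it acts like an ideal denoiser whose only plausible image is `1 - depth`), and the weighting `1 - alpha_bar` is just one commonly used choice.

```python
import numpy as np

def toy_noise_predictor(x_t, alpha_bar, depth):
    """Hypothetical stand-in for a depth-conditioned denoiser.

    Acts like an ideal noise predictor whose data distribution is a single
    image determined by the depth map (nearer pixels are brighter).
    """
    target = 1.0 - depth  # assumption: toy "depth-consistent" image
    return (x_t - np.sqrt(alpha_bar) * target) / np.sqrt(1.0 - alpha_bar)

def render(texture, mask):
    """Toy differentiable 'renderer': visible texels pass through unchanged."""
    return texture * mask

def sds_step(texture, depth, mask, rng, lr=0.1):
    """One SDS update: noise the render, predict the noise, step on the residual."""
    alpha_bar = rng.uniform(0.02, 0.98)        # random noise level (timestep)
    eps = rng.standard_normal(texture.shape)   # noise actually injected
    x = render(texture, mask)
    x_t = np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = toy_noise_predictor(x_t, alpha_bar, depth)
    weight = 1.0 - alpha_bar                   # assumed weighting w(t)
    grad = weight * (eps_hat - eps) * mask     # d(render)/d(texture) = mask
    return texture - lr * grad

def optimize_texture(depth, mask, steps=500, seed=0):
    """Run the toy SDS loop from a flat gray texture."""
    rng = np.random.default_rng(seed)
    texture = np.full(depth.shape, 0.5)
    for _ in range(steps):
        texture = sds_step(texture, depth, mask, rng)
    return texture

depth = np.array([[0.0, 0.25], [0.75, 1.0]])
mask = np.array([[1.0, 1.0], [1.0, 0.0]])      # bottom-right texel is never visible
texture = optimize_texture(depth, mask)
```

In this toy setting the visible texels drift toward the image the denoiser considers plausible for the given depth, while the masked texel receives no gradient and keeps its initial value, which mirrors the dependence on visible surfaces noted in the TL;DR.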

Paper Structure (39 sections, 16 figures, 1 table)

This paper contains 39 sections, 16 figures, 1 table.

Figures (16)

  • Figure 1: A showcase of objects textured by EucliDreamer.
  • Figure 2: Problematic textures that we generated using existing diffusion-based methods. (a) The flower model has blue leaves and flower patterns over the flower buds due to an incorrect understanding of the model. (b) The house has a different color and style for each wall. (c) The sign has shadows baked into the texture, which should only result from rendering. (d) The cart has an oversaturated bright green color and does not look pleasant.
  • Figure 3: Our method. A renderer transforms the 3D model into a 2D view and extracts layers of information. The SDS loss then loops back to update a hash grid. An additional Stable Diffusion depth layer is added alongside the RGB color layer.
  • Figure 4: Comparisons with previous texturing methods. Four objects from Objaverse (deitke2022objaverse) are used for illustration. The rendering performance of the first three methods (CLIPMesh, Latent-Paint, and Text2Tex) is discussed in chen2023text2tex. Overall, the examples show that Text2Tex (chen2023text2tex) and our method clearly outperform the baseline methods (Mohammad_Khalid_2022; metzer2022latentnerf) in clarity and level of detail.
  • Figure 5: Objects textured by EucliDreamer in various styles, by different input text prompts.
  • ...and 11 more figures