EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior

Zhipeng Hu; Minda Zhao; Chaoyi Zhao; Xinyue Liang; Lincheng Li; Zeng Zhao; Changjie Fan; Xiaowei Zhou; Xin Yu

TL;DR

EfficientDreamer tackles the Janus problem in text-to-3D generation by introducing an orthogonal-view diffusion prior that generates a single image composed of four mutually consistent orthogonal-view sub-images. A 3D synthesis fusion network then blends this prior with a pre-trained 2D diffusion prior under a dynamic Score Distillation Sampling strategy that progressively favors geometry first and texture later. The orthogonal-view model is trained on composite images rendered from Objaverse, and the full method is evaluated against state-of-the-art text-to-3D approaches, showing superior geometric consistency, photorealistic textures, and results preferred in a user study. The two-stage geometry-then-texture optimization, together with ablation studies, demonstrates the value of orthogonal-view guidance for robust 3D content creation.
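The idea of shifting emphasis between two guidance signals during optimization can be illustrated with a toy sketch. It is not the paper's 3D synthesis fusion network: the linear schedule, the function names `blend_weight` and `fused_gradient`, and the use of plain arrays as stand-ins for score-distillation gradients are all assumptions made for illustration.

```python
import numpy as np


def blend_weight(step: int, total_steps: int) -> float:
    """Weight on the orthogonal-view prior at a given step.

    Hypothetical linear schedule (an assumption, not taken from the paper):
    starts at 1.0 (geometry-focused) and decays to 0.0 (texture-focused).
    """
    if total_steps <= 1:
        return 0.0
    # Clamp the step into [0, total_steps - 1] before normalizing.
    t = min(max(step, 0), total_steps - 1) / (total_steps - 1)
    return 1.0 - t


def fused_gradient(grad_ortho: np.ndarray, grad_2d: np.ndarray,
                   step: int, total_steps: int) -> np.ndarray:
    """Convex combination of two guidance gradients (illustrative stand-ins)."""
    w = blend_weight(step, total_steps)
    return w * grad_ortho + (1.0 - w) * grad_2d


if __name__ == "__main__":
    g_ortho = np.ones(3)    # stand-in for the orthogonal-view prior's gradient
    g_2d = np.zeros(3)      # stand-in for the 2D diffusion prior's gradient
    for step in (0, 5, 10):
        print(step, fused_gradient(g_ortho, g_2d, step, total_steps=11))
```

Under this assumed schedule, early steps follow the orthogonal-view signal almost entirely and late steps follow the 2D prior; any monotone schedule could be substituted.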

Abstract

While image diffusion models have made significant progress in text-driven 3D content creation, they often fail to accurately capture the intended meaning of text prompts, especially for view information. This limitation leads to the Janus problem, where multi-faced 3D models are generated under the guidance of such diffusion models. In this paper, we propose a robust, high-quality 3D content generation pipeline that exploits orthogonal-view image guidance. First, we introduce a novel 2D diffusion model that generates an image consisting of four orthogonal-view sub-images based on the given text prompt. Then, the 3D content is created using this diffusion model. Notably, the generated orthogonal-view image provides strong geometric structure priors and thus improves 3D consistency. As a result, it effectively resolves the Janus problem and significantly enhances the quality of 3D content creation. Additionally, we present a 3D synthesis fusion network that can further improve the details of the generated 3D content. Both quantitative and qualitative evaluations demonstrate that our method surpasses previous text-to-3D techniques. Project page: https://efficientdreamer.github.io.
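The "image consisting of four orthogonal-view sub-images" can be pictured as a simple tiling. The sketch below assumes a 2x2 grid and a fixed view order; the abstract specifies neither, and `compose_views` and `split_views` are hypothetical helper names, not part of any released code.

```python
import numpy as np


def compose_views(views):
    """Tile four H x W x C view images into one 2H x 2W x C composite.

    Assumed layout (not stated in the source): views[0] top-left,
    views[1] top-right, views[2] bottom-left, views[3] bottom-right.
    """
    if len(views) != 4:
        raise ValueError("expected exactly four views")
    top = np.concatenate(views[:2], axis=1)      # side by side along width
    bottom = np.concatenate(views[2:], axis=1)
    return np.concatenate([top, bottom], axis=0)  # stack along height


def split_views(composite):
    """Inverse of compose_views: cut the composite back into four sub-images."""
    h, w = composite.shape[0] // 2, composite.shape[1] // 2
    return [composite[:h, :w], composite[:h, w:],
            composite[h:, :w], composite[h:, w:]]


if __name__ == "__main__":
    # Four dummy 64x64 RGB "renders", each filled with a distinct value.
    views = [np.full((64, 64, 3), i, dtype=np.uint8) for i in range(4)]
    composite = compose_views(views)
    print(composite.shape)
```

Splitting the composite recovers the four sub-images unchanged, so each one can be compared against a rendering of the 3D scene from the matching camera pose.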

Paper Structure (16 sections, 4 equations, 8 figures, 3 tables)

Introduction
Related Work
3D Reconstruction with Neural Fields
Text-to-image Generation
Text-to-3D Generation
Methodology
Orthogonal-view Diffusion Model
Text-to-3D via the 3D Synthesis Fusion Network
Implementation Details
Experiments
Comparison with State-of-the-arts
Comparison with Perp-Neg
Coarse-to-fine Two-stage Optimization Results
User Study
Ablation Study
...and 1 more section

Figures (8)

Figure 1: The given prompt is "A pig wearing a backpack, high quality." Left: The pre-trained Stable Diffusion model struggles to generate images that follow specific view instructions, and therefore runs into the Janus problem in text-driven 3D generation. Right: EfficientDreamer leverages our newly introduced orthogonal-view diffusion model, enabling the generation of 3D-consistent images depicting the same scene from multiple orthogonal viewpoints.
Figure 2: Overview of EfficientDreamer, which involves two key steps. First, we train an orthogonal-view diffusion model on images rendered from the Objaverse dataset. Second, we optimize the 3D scene representation by leveraging both the newly introduced orthogonal-view diffusion model and the pre-trained text-to-image diffusion model. To ensure high-fidelity and robust 3D creation, we employ a dynamic 3D synthesis strategy.
Figure 3: Comparison between the orthogonal-view diffusion model and the pre-trained Stable Diffusion model with additional viewpoint instructions. The pre-trained Stable Diffusion model struggles to generate images that follow specific view instructions, while the orthogonal-view diffusion model can generate composite images from orthogonal views.
Figure 4: Comparison with other text-to-3D methods. We render each 3D model from two views. Our method outperforms other techniques by generating higher-fidelity 3D models without encountering the Janus problem.
Figure 5: Comparison between the Perp-Neg method and our approach: Our method offers a more comprehensive solution to the Janus problem compared to Perp-Neg.
...and 3 more figures
