ART3D: 3D Gaussian Splatting for Text-Guided Artistic Scenes Generation

Pengzhi Li, Chengshuai Tang, Qinxuan Huang, Zhiheng Li

TL;DR

ART3D tackles 3D artistic scene generation from text by combining diffusion-based image synthesis with 3D Gaussian splatting. It introduces an image semantic transfer module to align artistic inputs with realistic depth cues, builds a view-consistent point cloud map, and uses a depth consistency module to ensure cross-view coherence. The final 3D scenes are rendered via 3D Gaussian splatting, which is trained with supervision from projected views while ignoring unreliable inpainted regions. Quantitative and qualitative experiments show better content and structural consistency than the baselines, highlighting the method's potential for high-quality AI-assisted art.
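
To illustrate what "ignoring unreliable inpainted regions" can mean during training, here is a minimal sketch of a masked reconstruction loss. It is not the paper's loss: the L1 form, the function name `masked_l1_loss`, and the grayscale list-of-lists image layout are assumptions made only for illustration.

```python
def masked_l1_loss(rendered, target, valid_mask):
    """Mean absolute error over pixels where valid_mask is True.

    rendered, target: 2D lists (H x W) of grayscale intensities.
    valid_mask: 2D list of bools; False marks pixels treated as
    unreliable (for example inpainted regions) that are skipped.
    NOTE: the L1 form is an assumption, not taken from the paper.
    """
    total = 0.0
    count = 0
    for r_row, t_row, m_row in zip(rendered, target, valid_mask):
        for r, t, m in zip(r_row, t_row, m_row):
            if m:
                total += abs(r - t)
                count += 1
    # If every pixel is masked out, define the loss as zero.
    return total / count if count else 0.0


rendered = [[0.2, 0.9], [0.5, 0.1]]
target = [[0.0, 0.0], [0.5, 0.3]]
mask = [[True, False], [True, True]]  # top-right pixel is excluded
loss = masked_l1_loss(rendered, target, mask)
```

The large error at the excluded pixel (0.9 vs 0.0) does not affect the result, so only the trusted pixels drive the optimization. Plain lists keep the sketch dependency-free; an actual training loop would use tensors and a differentiable renderer.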

Abstract

In this paper, we explore the existing challenges in 3D artistic scene generation by introducing ART3D, a novel framework that combines diffusion models and 3D Gaussian splatting techniques. Our method effectively bridges the gap between artistic and realistic images through an innovative image semantic transfer algorithm. By leveraging depth information and an initial artistic image, we generate a point cloud map, addressing domain differences. Additionally, we propose a depth consistency module to enhance 3D scene consistency. Finally, the 3D scene serves as initial points for optimizing Gaussian splats. Experimental results demonstrate ART3D's superior performance in both content and structural consistency metrics when compared to existing methods. ART3D significantly advances the field of AI in art creation by providing an innovative solution for generating high-quality 3D artistic scenes.
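
The step of turning an image plus depth into a point cloud can be sketched with standard pinhole back-projection. This is a generic illustration under stated assumptions, not the authors' implementation: the intrinsics `fx`, `fy`, `cx`, `cy`, the function name `backproject`, and the list-based image layout are all hypothetical.

```python
def backproject(depth, colors, fx, fy, cx, cy):
    """Lift each pixel with positive depth to a colored 3D point.

    depth:  2D list (H x W) of depth values along the camera z axis.
    colors: 2D list (H x W) of per-pixel colors (any value type).
    fx, fy, cx, cy: assumed pinhole intrinsics (focal lengths and
    principal point, in pixels).
    Returns a list of ((x, y, z), color) tuples in camera coordinates.
    """
    points = []
    for v, (d_row, c_row) in enumerate(zip(depth, colors)):
        for u, (z, color) in enumerate(zip(d_row, c_row)):
            if z <= 0:
                continue  # skip pixels without a valid depth
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append(((x, y, z), color))
    return points


depth = [[2.0, 2.0], [0.0, 4.0]]
colors = [["red", "green"], ["blue", "white"]]
cloud = backproject(depth, colors, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Points such as these, with their colors, are the kind of data that could seed the positions and colors of Gaussian splats before optimization.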

Paper Structure (19 sections, 7 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Pipeline of ART3D. First, we take textual descriptions or reference images as input and use a semantic transfer algorithm to obtain accurate 3D depth information. We then construct a point cloud and enhance multi-view consistency through consistency and alignment algorithms. Finally, we render high-quality 3D artistic scenes using 3D Gaussian splatting.
  • Figure 2: Effectiveness of our depth consistency module. (a) and (b) show consecutive frames, while (c) and (d) show slices of the depth values over time (x-t slices) taken along the red line in (a) and (b). A smoother x-t slice indicates more consistent depth. Our approach significantly enhances depth consistency.
  • Figure 3: Visualization results of our method. (a) shows the inputs, which can be text only or a combination of a reference image and text. (b) is a novel-view image rendered from the generated 3D artistic scene, showing stylistic consistency. (c) shows 3D artistic scenes generated by our method along a predefined camera trajectory. Our approach generates structurally consistent and diverse 3D artistic scenes with high accuracy and quality.
  • Figure 4: Qualitative comparison with 3D scene generation methods. LucidDreamer (Chung et al., 2023) and Text2Room (Höllein et al., 2023) perform poorly on artistic images because of the gap between the artistic and realistic domains. As highlighted by the red circles, they struggle to obtain accurate 3D information, which leads to structural errors caused by inaccurate depth or misalignment and, in turn, to unsatisfactory 3D scenes. In contrast, our method excels in the artistic domain.
  • Figure 5: (a) The depth map estimated directly from the artistic image lacks many details. (b) Our image semantic transfer algorithm produces a more accurate depth map. This addresses the challenge of obtaining the precise 3D information needed to generate 3D artistic scenes.
  • ...and 1 more figure
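
The x-t slices described in the Figure 2 caption can be illustrated with a small sketch: take the same image row from the depth map of each frame, stack those rows over time, and measure how much the values jump between consecutive frames. The roughness score below is a hypothetical measure added only for illustration (the figure itself is a qualitative comparison), and the names `xt_slice` and `temporal_roughness` are assumptions.

```python
def xt_slice(depth_frames, row):
    """Stack one fixed image row from every frame's depth map.

    depth_frames: list of 2D lists (T frames, each H x W).
    Returns a T x W list: one depth row per time step.
    """
    return [list(frame[row]) for frame in depth_frames]


def temporal_roughness(slice_xt):
    """Mean absolute depth change between consecutive time steps.

    Lower values mean a smoother x-t slice. This score is a
    hypothetical illustration, not a metric from the paper.
    """
    diffs = [
        abs(cur - prev)
        for prev_row, cur_row in zip(slice_xt, slice_xt[1:])
        for prev, cur in zip(prev_row, cur_row)
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


# Two toy sequences of 2 x 2 depth maps, sampled along row 1.
steady = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 4.1]]]
flicker = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [5.0, 2.0]]]
steady_score = temporal_roughness(xt_slice(steady, row=1))
flicker_score = temporal_roughness(xt_slice(flicker, row=1))
```

Here the flickering sequence yields the higher score, matching the caption's reading that a smoother x-t slice indicates more consistent depth.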