3DBonsai: Structure-Aware Bonsai Modeling Using Conditioned 3D Gaussian Splatting

Hao Wu, Hao Wang, Ruochong Li, Xuran Ma, Hui Xiong

TL;DR

3DBonsai tackles the problem of generating complex 3D bonsai by introducing a trainable 3D Space Colonization Algorithm to create structure priors and two structure-aware 3D Gaussian Splatting pipelines (fine-structure and coarse-structure) guided by 2D diffusion. The method couples 3D priors with SDS-based text guidance and depth-informed multi-view consistency, enabling high-fidelity, structure-aware 3D bonsai generation. A Chinese-style bonsai dataset supports training and benchmarking, and extensive experiments show improvements in both perceptual quality and 3D consistency, with favorable CLIP scores and human preferences. The work establishes a new benchmark for structure-aware 3D bonsai generation and highlights remaining challenges in rendering extremely complex forms due to view-limited diffusion guidance.

Abstract

Recent advancements in text-to-3D generation have shown remarkable results by leveraging 3D priors in combination with 2D diffusion. However, previous methods utilize 3D priors that lack detailed and complex structural information, limiting them to generating simple objects and presenting challenges for creating intricate structures such as bonsai. In this paper, we propose 3DBonsai, a novel text-to-3D framework for generating 3D bonsai with complex structures. Technically, we first design a trainable 3D space colonization algorithm to produce bonsai structures, which are then enhanced through random sampling and point cloud augmentation to serve as the 3D Gaussian priors. We introduce two bonsai generation pipelines with distinct structural levels: fine structure conditioned generation, which initializes 3D Gaussians using a 3D structure prior to produce detailed and complex bonsai, and coarse structure conditioned generation, which employs a multi-view structure consistency module to align 2D and 3D structures. Moreover, we have compiled a unified 2D and 3D Chinese-style bonsai dataset. Our experimental results demonstrate that 3DBonsai significantly outperforms existing methods, providing a new benchmark for structure-aware 3D bonsai generation.
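To make the idea of a structure prior concrete, the following is a minimal sketch of the classic, non-trainable space colonization algorithm, which grows a branch skeleton toward a cloud of attraction points. It is not the authors' trainable variant: the abstract gives no details of that formulation, so the function names (`sample_attractors`, `grow_skeleton`) and the parameters (`segment_length`, `influence_radius`, `kill_distance`) are illustrative assumptions. The resulting node list could, in principle, be sampled into a point cloud to initialize 3D Gaussians.

```python
import math
import random


def sample_attractors(n, radius, center, rng):
    """Sample n attraction points uniformly inside a sphere (a stand-in crown volume)."""
    points = []
    while len(points) < n:
        offset = tuple(rng.uniform(-radius, radius) for _ in range(3))
        if math.hypot(*offset) <= radius:  # rejection sampling keeps points in the sphere
            points.append(tuple(c + o for c, o in zip(center, offset)))
    return points


def grow_skeleton(attractors, root=(0.0, 0.0, 0.0), segment_length=0.1,
                  influence_radius=2.0, kill_distance=0.15, max_iters=50):
    """Classic space colonization (hypothetical parameter names, not the paper's).

    Returns (nodes, parents): node positions and, for each node, the index of
    its parent node (-1 for the root).
    """
    nodes, parents = [root], [-1]
    attractors = list(attractors)
    for _ in range(max_iters):  # max_iters also guards against stalled growth
        if not attractors:
            break
        # Each attractor pulls on its nearest node, if that node is within range.
        pulls = {}
        for a in attractors:
            idx = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], a))
            d = math.dist(nodes[idx], a)
            if 0.0 < d <= influence_radius:
                unit = tuple((ac - nc) / d for ac, nc in zip(a, nodes[idx]))
                acc = pulls.get(idx, (0.0, 0.0, 0.0))
                pulls[idx] = tuple(s + u for s, u in zip(acc, unit))
        grew = False
        for idx, direction in pulls.items():
            norm = math.hypot(*direction)
            if norm < 1e-9:  # opposing pulls cancel out; skip this node
                continue
            # Grow one fixed-length segment along the averaged pull direction.
            new_node = tuple(nc + segment_length * dc / norm
                             for nc, dc in zip(nodes[idx], direction))
            nodes.append(new_node)
            parents.append(idx)
            grew = True
        if not grew:
            break
        # Remove attractors that a branch has reached.
        attractors = [a for a in attractors
                      if all(math.dist(a, n) > kill_distance for n in nodes)]
    return nodes, parents
```

For example, `grow_skeleton(sample_attractors(40, 0.5, (0.0, 0.0, 1.0), random.Random(0)))` grows a skeleton from the origin toward a crown centered one unit above it. A shorter `segment_length` or a denser attractor cloud generally yields finer branching, at the cost of more iterations.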

Paper Structure

This paper contains 16 sections, 8 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: 3DBonsai Framework. The 3D SCA module leverages our 2D Bonsai dataset to generate 3D structure priors. In structure-aware 3D Gaussian splatting, we implement two pipelines: (a) the fine-structure 3DGS (blue lines) uses the 3D structure from the 3D SCA as initialization to create 3D bonsai that closely align with the structure prior; (b) the coarse-structure 3DGS (orange lines) starts with the text-to-3D diffusion result as the initial state, followed by a 2D-3D structure consistency module that ensures 3D consistency by perceptually aligning the result with the 3D structure prior.
  • Figure 2: Qualitative comparisons. (a) Qualitative comparisons for image-to-3D between our 3DBonsai (coarse structure) and DreamFusion [poole2022dreamfusion], Magic3D [lin2023magic3d], Fantasia3D [chen2023fantasia3d], ProlificDreamer [wang2024prolificdreamer], GaussianDreamer [yi2023-gaussiandreamer], and DreamGaussian [tang2023-dreamgaussian]. The comparison results include the two best baselines and our generation results through the fine and coarse structure pipelines. (b) Qualitative comparisons between the two pipelines of 3DBonsai and GaussianDreamer [yi2023-gaussiandreamer] and DreamGaussian [tang2023-dreamgaussian].
  • Figure 3: Ablation study over structures of varying complexity. "L" denotes segment length, "S" represents the radius of the initial point cloud range, and "D" indicates the initial point cloud density.