CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians

Chongjian Ge, Chenfeng Xu, Yuanfeng Ji, Chensheng Peng, Masayoshi Tomizuka, Ping Luo, Mingyu Ding, Varun Jampani, Wei Zhan

TL;DR

CompGS is a novel generative framework that employs 3D Gaussian Splatting for efficient, compositional text-to-3D content generation. It optimizes across objects of varying scales by dynamically adjusting the spatial parameters of each entity, which improves fine-grained detail, particularly in smaller entities.

Abstract

Recent breakthroughs in text-guided image generation have significantly advanced the field of 3D generation. While generating a single high-quality 3D object is now feasible, generating multiple objects with reasonable interactions within a 3D space, a.k.a. compositional 3D generation, presents substantial challenges. This paper introduces CompGS, a novel generative framework that employs 3D Gaussian Splatting (GS) for efficient, compositional text-to-3D content generation. To achieve this goal, two core designs are proposed: (1) 3D Gaussian Initialization with 2D Compositionality: We transfer well-established 2D compositionality to initialize the Gaussian parameters on an entity-by-entity basis, ensuring both consistent 3D priors for each entity and reasonable interactions among multiple entities; (2) Dynamic Optimization: We propose a dynamic strategy to optimize 3D Gaussians using the Score Distillation Sampling (SDS) loss. CompGS first automatically decomposes the 3D Gaussians into distinct entity parts, enabling optimization at both the entity and composition levels. Additionally, CompGS optimizes across objects of varying scales by dynamically adjusting the spatial parameters of each entity, enhancing the generation of fine-grained details, particularly in smaller entities. Qualitative comparisons and quantitative evaluations on T3Bench demonstrate the effectiveness of CompGS in generating compositional 3D objects with superior image quality and semantic alignment over existing methods. CompGS can also be easily extended to controllable 3D editing, facilitating scene generation. We hope CompGS will provide new insights into compositional 3D generation. Project page: https://chongjiange.github.io/compgs.html.
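
The abstract describes the scale handling only at a high level, so the following is a minimal sketch of one way to read "dynamically adjusting the spatial parameters of each entity": temporarily zoom a small entity's Gaussians to a fixed-size cube for entity-level optimization, then map them back into the composed scene. The function names `normalize_entity` and `restore_entity`, the `target_extent` parameter, and the use of linear (not log-space) scales are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def normalize_entity(centers, scales, target_extent=1.0):
    """Zoom one entity so its bounding box fills a cube of side target_extent.

    centers: (N, 3) Gaussian means; scales: (N, 3) linear per-axis scales.
    Hypothetical helper: the paper's exact transform is not given here.
    """
    lo, hi = centers.min(axis=0), centers.max(axis=0)
    origin = (lo + hi) / 2.0              # bounding-box center
    extent = float((hi - lo).max())       # longest bounding-box side
    factor = target_extent / extent if extent > 0 else 1.0
    # Means are shifted and scaled; scales are only scaled.
    return (centers - origin) * factor, scales * factor, (origin, factor)


def restore_entity(centers, scales, transform):
    """Invert normalize_entity, placing the entity back in the scene."""
    origin, factor = transform
    return centers / factor + origin, scales / factor


# A small entity spanning 0.1 units is zoomed to span 1.0 units.
small_centers = np.array([[0.0, 0.0, 0.0], [0.1, 0.05, 0.02]])
small_scales = np.full((2, 3), 0.01)
zoomed_centers, zoomed_scales, transform = normalize_entity(small_centers, small_scales)
back_centers, back_scales = restore_entity(zoomed_centers, zoomed_scales, transform)
```

Under this reading, a small entity would occupy roughly the same rendered volume as a large one while its SDS loss is computed, which is one plausible way for small entities to receive comparably detailed supervision.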

Paper Structure

This paper contains 23 sections, 3 equations, 14 figures, 5 tables, 1 algorithm.

Figures (14)

  • Figure 1: Illustration of compositional 3D generation and CompGS. All the contents are generated by CompGS. Top row: CompGS is capable of generating either a single object (e.g., a butterfly) or compositional objects with reasonable interactions (e.g., the rightmost figure in the top row). Middle row: Beyond text-to-3D generation, CompGS can be easily extended to 3D editing by progressively adding objects. The colored texts (e.g., 'a branch', 'a pinecone', 'a rat' in the rightmost figure) denote the parts added relative to the previous asset. Bottom row: CompGS achieves compositional text-to-3D by transferring 2D compositionality to initialize 3D Gaussians. CompGS is further trained with dynamic SDS optimization to produce plausible results.
  • Figure 2: Overall pipeline of CompGS. Given a compositional prompt $V$, we first use an LLM to decompose it into entity-level prompts $\{v_l\}$, guiding the segmentation of each entity from the compositional image generated by T2I models. The segmented images initialize entity-level 3D Gaussians via image-to-3D models (TripoSR, Tochilkin et al., 2024; LRM, Hong et al., 2023). CompGS employs a dynamic optimization strategy, alternating between composition-level optimization of $\theta$ and entity-level optimization of $\{\theta_l\}$. For entity-level optimization, CompGS dynamically maintains volume consistency to refine the details of each object, particularly the small ones.
  • Figure 3: Qualitative comparisons between CompGS and other text-to-3D models on $\mathrm{T}^3$Bench (multiple objects track). Compared to others, CompGS is better at generating highly composed, high-quality 3D content that strictly aligns with the given texts. Watch the animations by clicking them (not all PDF readers support playing animations; best viewed in Acrobat or Foxit Reader).
  • Figure 4: More samples generated by CompGS. Four views are shown. CompGS can generate high-quality content with reasonable interactions given two, three, or more entities.
  • Figure 5: 3D editing examples of CompGS. More examples can be found in Appendix \ref{sec:3d_editing}.
  • ...and 9 more figures
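
The Figure 2 caption says that optimization alternates between the composition-level parameters $\theta$ and the entity-level parameters $\{\theta_l\}$, but it does not give the schedule. As a rough illustration only, the sketch below generates one possible schedule: even steps target the whole composition and odd steps cycle through the entities in round-robin order. The function name `alternating_schedule`, the 1:1 ratio, and the round-robin ordering are all assumptions, not details taken from the paper.

```python
from typing import Iterator, Optional, Tuple


def alternating_schedule(
    num_steps: int, num_entities: int
) -> Iterator[Tuple[str, Optional[int]]]:
    """Yield one optimization target per training step.

    Hypothetical schedule: even steps optimize the composition (theta),
    odd steps optimize a single entity (theta_l), visited round-robin.
    """
    entity = 0
    for step in range(num_steps):
        if step % 2 == 0:
            yield ("composition", None)
        else:
            yield ("entity", entity)
            entity = (entity + 1) % num_entities


# Six steps for a prompt that was decomposed into two entities.
schedule = list(alternating_schedule(num_steps=6, num_entities=2))
for level, index in schedule:
    print(level if index is None else f"{level} {index}")
```

In a real training loop, each yielded target would select which Gaussians are rendered and which text prompt ($V$ or $v_l$) conditions the SDS loss for that step.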