GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians

Shuyi Jiang, Qihao Zhao, Hossein Rahmani, De Wen Soh, Jun Liu, Na Zhao

TL;DR

GaussianBlock addresses entangled latent representations in neural 3D reconstruction by introducing a semantically aware hybrid representation that couples flexibly editable superquadric primitives with high-fidelity 3D Gaussians. A novel attention-guided centering (AC) loss derived from 2D priors enforces semantic disentanglement of the primitives, while dynamic splitting/fusion and a binding inheritance strategy maintain a tight connection between Gaussians and their associated primitives. The optimization proceeds in two stages: the first stage minimizes $\mathcal{L}_{first} = \mathcal{L}_{rec} + \gamma \mathcal{L}_{AC}$ to refine the primitives, and the second stage minimizes $\mathcal{L}_{second} = \mathcal{L}_{rgb} + \mathcal{L}_{pos}$ to bind and refine the Gaussians, enabling precise editing without sacrificing fidelity. Empirical results on DTU, Nerfstudio, BlendedMVS, and related benchmarks demonstrate state-of-the-art part-level decomposition and competitive fidelity with direct editability of components, supporting practical, building-block style 3D editing.
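The two stage objectives stated above can be written out as a minimal sketch. It only shows how the named terms combine; the individual loss values are assumed to be computed elsewhere, and the class names, function names, and the short descriptions of each term in the comments are illustrative assumptions rather than the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class StageOneTerms:
    """Hypothetical container for the first-stage loss terms."""
    rec: float  # L_rec: reconstruction loss
    ac: float   # L_AC: attention-guided centering loss


@dataclass
class StageTwoTerms:
    """Hypothetical container for the second-stage loss terms."""
    rgb: float  # L_rgb: assumed here to be an image-space color loss
    pos: float  # L_pos: assumed here to be a position regularizer


def first_stage_loss(terms: StageOneTerms, gamma: float) -> float:
    """L_first = L_rec + gamma * L_AC, used while refining primitives."""
    return terms.rec + gamma * terms.ac


def second_stage_loss(terms: StageTwoTerms) -> float:
    """L_second = L_rgb + L_pos, used while binding and refining Gaussians."""
    return terms.rgb + terms.pos
```

In this sketch, setting `gamma` to zero reduces the first stage to pure reconstruction, which is the natural baseline for an ablation over the AC-loss weight.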

Abstract

Recently, with the development of Neural Radiance Fields and Gaussian Splatting, 3D reconstruction techniques have achieved remarkably high fidelity. However, the latent representations learnt by these methods are highly entangled and lack interpretability. In this paper, we propose a novel part-aware compositional reconstruction method, called GaussianBlock, that enables semantically coherent and disentangled representations, allowing for precise and physical editing akin to building blocks, while simultaneously maintaining high fidelity. Our GaussianBlock introduces a hybrid representation that leverages the advantages of both primitives, known for their flexible actionability and editability, and 3D Gaussians, which excel in reconstruction quality. Specifically, we achieve semantically coherent primitives through a novel attention-guided centering loss derived from 2D semantic priors, complemented by a dynamic splitting and fusion strategy. Furthermore, we utilize 3D Gaussians that hybridize with primitives to refine structural details and enhance fidelity. Additionally, a binding inheritance strategy is employed to strengthen and maintain the connection between the two. Our reconstructed scenes are evidenced to be disentangled, compositional, and compact across diverse benchmarks, enabling seamless, direct and precise editing while maintaining high quality.

Paper Structure (19 sections, 11 equations, 9 figures, 4 tables, 3 algorithms)

Figures (9)

  • Figure 1: Most related works face at least one of three common limitations: fidelity, editability, and semantically coherent part-aware disentanglement. Our method addresses all three simultaneously. The underlined text and masks are the semantic tracing keywords and results. Best viewed in color and with marks.
  • Figure 2: The framework of our pipeline. In the first stage, the superquadric blocks are optimized under the guidance of the reconstruction loss $\mathcal{L}_{rec}$ and the attention-guided centering loss $\mathcal{L}_{AC}$. Using the point $P_k$ and bounding-box $B_k$ prompts obtained from soft dual rasterization, the attention maps $A_k$ from the last layer of the pretrained decoder $D$ are clustered, and outliers are encouraged to move towards the centroid. Meanwhile, a superquadric splitting and fusion strategy further enhances semantic coherence and compactness during optimization. In the second stage, Gaussians are bound to the triangles of the superquadrics using a localized parameterization and an inheritance strategy during optimization and densification, with a global mapping transform.
  • Figure 3: Qualitative results. Our method demonstrates fine-grained, semantically coherent part-aware decomposition.
  • Figure 4: Editing Results. Our reconstructed scenes can be edited seamlessly and precisely.
  • Figure 5: Ablation studies on (a) experiment settings, (b) AC-loss weight $\gamma$, (c) splitting threshold $\beta$, and (d) regularization hyper-parameter $\epsilon_{pos}$.
  • ...and 4 more figures