InfoGaussian: Structure-Aware Dynamic Gaussians through Lightweight Information Shaping

Yunchao Zhang, Guandao Yang, Leonidas Guibas, Yanchao Yang

TL;DR

This paper tackles the challenge of editing complex 3D scenes represented by dense 3D Gaussian Splatting by capturing inter-Gaussian correlations. It introduces mutual information shaping through activation-level constraints in the attribute-decoding network, enabling coherent, object-level edits with a lightweight training pipeline that touches only a small subset of Gaussians. The approach combines a contrastive MI loss, 2D-to-3D mask labeling, and a smoothness regularizer to learn a well-structured tangent space that preserves correlations over successive edits. Empirically, it yields significant gains in 3D segmentation and editing tasks while maintaining low computational and memory overhead, demonstrating practical impact for open-world 3D editing and reconstruction.

Abstract

3D Gaussians, as a low-level scene representation, typically involve thousands to millions of Gaussians. This makes it difficult to control the scene in ways that reflect the underlying dynamic structure, where the number of independent entities is typically much smaller. In particular, it can be challenging to animate and move objects in the scene, which requires coordination among many Gaussians. To address this issue, we develop a mutual information shaping technique that enforces movement resonance between correlated Gaussians in a motion network. Such correlations can be learned from putative 2D object masks in different views. By approximating the mutual information with the Jacobians of the motions, our method ensures consistent movements of the Gaussians composing different objects under various perturbations. In particular, we develop an efficient contrastive training pipeline with lightweight optimization to shape the motion network, avoiding the need for re-shaping throughout the motion sequence. Notably, our training only touches a small fraction of all Gaussians in the scene yet attains the desired compositional behavior according to the underlying dynamic structure. The proposed technique is evaluated on challenging scenes and demonstrates significant performance improvement in promoting consistent movements and 3D object segmentation while inducing low computation and memory requirements.
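The contrastive idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes each sampled Gaussian already comes with a flattened Jacobian vector (its motion differentiated with respect to the network parameters) and a 3D mask label, and it uses an InfoNCE-style loss over plain cosine similarities as a stand-in for the mutual information approximation. The names `cosine`, `contrastive_jacobian_loss`, and `temperature` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def contrastive_jacobian_loss(jacobians, labels, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in for the paper's MI objective).

    jacobians: list of flattened per-Gaussian Jacobian vectors (assumed given).
    labels:    list of mask labels, one per Gaussian.
    Gaussians sharing a label are positives; all others are negatives.
    """
    n = len(jacobians)
    per_anchor = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-object partner contributes nothing
        logits = {j: cosine(jacobians[i], jacobians[j]) / temperature for j in others}
        log_denom = math.log(sum(math.exp(z) for z in logits.values()))
        # Average negative log-probability of picking a positive for this anchor.
        per_anchor.append(
            -sum(logits[j] - log_denom for j in positives) / len(positives)
        )
    return sum(per_anchor) / len(per_anchor) if per_anchor else 0.0
```

Under these assumptions the loss is small when Gaussians with the same label have aligned Jacobians and large otherwise, which matches the "movement resonance" the abstract describes. Only the sampled Gaussians enter the loss, in keeping with the claim that training touches a small fraction of the scene.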

Paper Structure (24 sections, 19 equations, 14 figures, 2 tables)

This paper contains 24 sections, 19 equations, 14 figures, 2 tables.

Figures (14)

  • Figure 1: The proposed mutual information shaping of the attribute decoding network, built on 3D Gaussian Splatting [kerbl20233d], captures the underlying structure of the scene while maintaining the correlations after consecutive parameter changes driven by user-selected Gaussians (see the section labeled sec:edit). It enables efficient scene editing by perturbing the network parameters, including re-colorization, segmentation, and object removal.
  • Figure 2: Perturbing the attribute decoding network using the Jacobian of a Gaussian in the bulldozer, without (a) or with (b) MI shaping [Xu_2023_CVPR], and then moving Gaussians according to the similarity of their Jacobians to that of the selected Gaussian.
  • Figure 3: The training pipeline of our correlation shaping: (a) We use SAM [kirillov2023segany] to generate 2D masks and a pre-trained zero-shot tracker [cheng2023tracking] to associate masks across views. Each Gaussian receives its 3D mask label from the 2D mask of the pixel to which it contributes most during rendering, taken over all views. (b) We use the labeled 3D masks as supervision for contrastive learning that shapes the mutual information. After shaping, the Jacobians are consistently distributed in the tangent space.
  • Figure 4: Top: qualitative results of open-vocabulary segmentation [ye2023gaussian]. Bottom: gallery of relevance maps when perturbing a single Gaussian (highlighted by the purple circle).
  • Figure 5: 3D object removal on the bear and kitchen scenes. Compared to Gaussian Grouping, our method removes the object with a better-fitting boundary and less distortion of irrelevant areas.
  • ...and 9 more figures
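
The labeling step described in the Figure 3(a) caption can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' code: it assumes the renderer can export, for each Gaussian, records of the form (Gaussian id, mask id of a pixel it contributed to, contribution weight), and that mask ids are already consistent across views thanks to the tracker. The function name `label_gaussians` and the record layout are hypothetical.

```python
def label_gaussians(contributions):
    """Assign each Gaussian the mask id of its highest-contribution pixel.

    contributions: iterable of (gaussian_id, mask_id, weight) records pooled
    over all views (an assumed export format, not a real library API).
    Returns a dict mapping gaussian_id -> mask_id.
    """
    best = {}  # gaussian_id -> (best weight seen so far, its mask id)
    for gaussian_id, mask_id, weight in contributions:
        if gaussian_id not in best or weight > best[gaussian_id][0]:
            best[gaussian_id] = (weight, mask_id)
    return {gid: mask_id for gid, (_, mask_id) in best.items()}


# Toy records: Gaussian 0 is seen in three pixels, Gaussian 1 in one.
records = [
    (0, "bulldozer", 0.2),
    (0, "ground", 0.7),
    (1, "bulldozer", 0.9),
    (0, "bulldozer", 0.5),
]
labels_3d = label_gaussians(records)
```

Taking a single maximum over all views, rather than voting per view, follows the wording of the caption; Gaussians that never contribute to any pixel simply receive no label in this sketch.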