Interpretable Single-View 3D Gaussian Splatting using Unsupervised Hierarchical Disentangled Representation Learning
Yuyang Zhang, Baao Xie, Hu Zhu, Qi Wang, Huanting Guo, Xin Jin, Wenjun Zeng
TL;DR
3DisGS tackles the interpretability gap in single-view 3D Gaussian Splatting by introducing a hierarchical disentangled representation learning (DRL) framework that discovers coarse- and fine-grained 3D semantics without supervision. It combines a dual-branch reconstruction (geometry via point clouds, appearance via triplane Gaussians) with DRL-based encoder-adapters that learn orthogonal latent factors. Mutual-information and style-guided modules keep the reconstructions view-consistent. Experiments on ShapeNet and CO3D show effective 3D disentanglement with competitive reconstruction quality and efficiency, enabling semantic edits of both geometry and appearance. The work points toward controllable, semantically aware 3D reconstruction from a single view, with possible extensions to modeling environmental effects.
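The TL;DR mentions orthogonal latent factors but does not state the loss used to obtain them. As a generic illustration only (not 3DisGS's objective), one common way to encourage orthogonality is a Gram-matrix penalty on factor directions. The sketch below assumes each semantic factor is a column of a direction matrix. It also shows a single-factor latent edit of the kind that disentanglement makes meaningful. The function names `orthogonality_penalty` and `edit_latent` are hypothetical.

```python
import numpy as np


def orthogonality_penalty(directions: np.ndarray) -> float:
    """Squared Frobenius norm of (D^T D - I) for column-normalized D.

    directions: (latent_dim, num_factors), one column per (hypothetical) factor.
    Returns 0.0 when all factor directions are mutually orthogonal.
    """
    # Normalize columns so the penalty depends on angles, not magnitudes.
    norms = np.linalg.norm(directions, axis=0, keepdims=True)
    d = directions / np.maximum(norms, 1e-12)
    gram = d.T @ d                        # pairwise cosine similarities
    off_diag = gram - np.eye(gram.shape[0])
    return float(np.sum(off_diag ** 2))


def edit_latent(z: np.ndarray, directions: np.ndarray, factor: int, step: float) -> np.ndarray:
    """Move latent z by `step` along one normalized factor direction."""
    d = directions[:, factor] / np.linalg.norm(directions[:, factor])
    return z + step * d


if __name__ == "__main__":
    ortho = np.eye(4)[:, :3]              # three orthogonal directions in R^4
    print(orthogonality_penalty(ortho))   # 0.0
    collinear = np.array([[1.0, 2.0], [0.0, 0.0]])
    print(orthogonality_penalty(collinear))  # 2.0: the two columns coincide
    print(edit_latent(np.zeros(4), ortho, factor=1, step=0.5))
```

When the directions are orthogonal, a step along one factor leaves the projections onto the other factors unchanged. This is the property that makes a single-factor edit interpretable.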
Abstract
3D Gaussian Splatting (3DGS) has recently marked a significant advance in 3D reconstruction, delivering both rapid rendering and high-quality results. However, existing 3DGS methods make it difficult to understand the underlying 3D semantics, which limits model controllability and interpretability. To address this, we propose an interpretable single-view 3DGS framework, termed 3DisGS, that discovers both coarse- and fine-grained 3D semantics via hierarchical disentangled representation learning (DRL). Specifically, the model uses a dual-branch architecture, consisting of a point cloud initialization branch and a triplane-Gaussian generation branch, to achieve coarse-grained disentanglement by separating 3D geometry from visual appearance features. DRL-based encoder-adapters then discover fine-grained semantic representations within each modality. To our knowledge, this is the first work to achieve unsupervised interpretable 3DGS. Evaluations show that our model achieves 3D disentanglement while preserving fast, high-quality reconstruction.
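The abstract pairs a point cloud branch (geometry) with a triplane-Gaussian branch (appearance). The sketch below shows, under stated assumptions, how such branches are commonly connected. Features are queried from three axis-aligned feature planes at each point's position using bilinear sampling and are summed. A head then maps these features to per-Gaussian attributes. The plane layout, summation, and linear head with fixed activations are assumptions, not details from the abstract. Rotations and all learned networks are omitted. `bilinear_sample`, `query_triplane`, and `decode_gaussians` are hypothetical names.

```python
import numpy as np


def bilinear_sample(plane: np.ndarray, u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Sample plane (C, H, W) at coords u (width axis), v (height axis) in [-1, 1].

    Uses the align-corners convention: -1 maps to index 0, +1 to the last index.
    Requires H, W >= 2. Returns (N, C).
    """
    _, H, W = plane.shape
    u, v = np.clip(u, -1.0, 1.0), np.clip(v, -1.0, 1.0)
    x = (u + 1.0) * 0.5 * (W - 1)
    y = (v + 1.0) * 0.5 * (H - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx, wy = x - x0, y - y0
    # Corner features, each of shape (C, N).
    top = plane[:, y0, x0] * (1 - wx) + plane[:, y0, x0 + 1] * wx
    bottom = plane[:, y0 + 1, x0] * (1 - wx) + plane[:, y0 + 1, x0 + 1] * wx
    return (top * (1 - wy) + bottom * wy).T


def query_triplane(planes: dict, points: np.ndarray) -> np.ndarray:
    """Sum features from the xy, xz and yz planes at points (N, 3) in [-1, 1]^3."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return (bilinear_sample(planes["xy"], x, y)
            + bilinear_sample(planes["xz"], x, z)
            + bilinear_sample(planes["yz"], y, z))


def decode_gaussians(features, points, weight, bias):
    """Hypothetical linear head: (N, C) -> offset(3), log-scale(3), opacity(1), rgb(3)."""
    out = features @ weight + bias                       # (N, 10)
    centers = points + 0.01 * np.tanh(out[:, 0:3])       # small offsets from the point cloud
    scales = np.exp(out[:, 3:6])                         # positive scales
    opacity = 1.0 / (1.0 + np.exp(-out[:, 6:7]))         # in (0, 1)
    rgb = 1.0 / (1.0 + np.exp(-out[:, 7:10]))            # in (0, 1)
    return centers, scales, opacity, rgb


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C = 8
    planes = {k: rng.standard_normal((C, 32, 32)) for k in ("xy", "xz", "yz")}
    pts = rng.uniform(-1, 1, size=(100, 3))              # stand-in for the geometry branch
    feats = query_triplane(planes, pts)
    centers, scales, opacity, rgb = decode_gaussians(
        feats, pts, 0.1 * rng.standard_normal((C, 10)), np.zeros(10))
    print(feats.shape, centers.shape, opacity.shape, rgb.shape)
```

In this arrangement, the point cloud fixes where the Gaussians are placed, and the triplane supplies what they look like. That split is one simple way to read the abstract's coarse-grained separation of geometry from appearance.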
