Binary-Gaussian: Compact and Progressive Representation for 3D Gaussian Segmentation

An Yang, Chenyu Liu, Jun Du, Jianqing Gao, Jia Pan, Jinshui Hu, Baocai Yin, Bing Yin, Cong Liu

TL;DR

The paper tackles memory-heavy semantic segmentation in 3D Gaussian Splatting by replacing per-Gaussian continuous category features with compact binary codes in a coarse-to-fine, multi-granularity framework. A progressive contrastive learning scheme aligns features across granularity levels, while a virtual-negative mechanism and opacity tuning address representation and rendering-semantic gaps. Additional strategies like mask-balanced sampling further boost fine-grained segmentation quality. Empirical results demonstrate state-of-the-art performance with dramatically reduced memory usage and fast inference across real-world datasets. The approach enables precise, scalable 3D panoptic segmentation suitable for real-time applications.

Abstract

3D Gaussian Splatting (3D-GS) has emerged as an efficient 3D representation and a promising foundation for semantic tasks like segmentation. However, existing 3D-GS-based segmentation methods typically rely on high-dimensional category features, which introduce substantial memory overhead. Moreover, fine-grained segmentation remains challenging due to label space congestion and the lack of stable multi-granularity control mechanisms. To address these limitations, we propose a coarse-to-fine binary encoding scheme for per-Gaussian category representation, which compresses each feature into a single integer via the binary-to-decimal mapping, drastically reducing memory usage. We further design a progressive training strategy that decomposes panoptic segmentation into a series of independent sub-tasks, reducing inter-class conflicts and thereby enhancing fine-grained segmentation capability. Additionally, we fine-tune opacity during segmentation training to address the incompatibility between photometric rendering and semantic segmentation, which often leads to foreground-background confusion. Extensive experiments on multiple benchmarks demonstrate that our method achieves state-of-the-art segmentation performance while significantly reducing memory consumption and accelerating inference.
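
The binary-to-decimal mapping mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the threshold of 0.5, the most-significant-bit-first ordering, and the helper names `pack_bits` and `unpack_bits` are all assumptions made for illustration. It only shows how a per-Gaussian vector of binary values might be stored as one integer and recovered later.

```python
import numpy as np

def pack_bits(bits):
    """Map an (N, B) array of 0/1 values to N integers (MSB first; ordering is an assumption)."""
    bits = np.asarray(bits, dtype=np.int64)
    # Bit weights 2^(B-1), ..., 2^1, 2^0
    weights = 2 ** np.arange(bits.shape[1] - 1, -1, -1, dtype=np.int64)
    return bits @ weights

def unpack_bits(codes, num_bits):
    """Inverse of pack_bits: recover the (N, B) binary array from N integers."""
    codes = np.asarray(codes, dtype=np.int64)
    shifts = np.arange(num_bits - 1, -1, -1, dtype=np.int64)
    return (codes[:, None] >> shifts) & 1

# Hypothetical continuous features for two Gaussians, four dimensions each.
features = np.array([[0.9, 0.1, 0.8, 0.7],
                     [0.2, 0.6, 0.4, 0.3]])

# Binarize with a fixed threshold (an assumption, not taken from the paper).
bits = (features > 0.5).astype(np.int64)

codes = pack_bits(bits)              # one integer per Gaussian
restored = unpack_bits(codes, 4)     # back to the binary vectors
```

Under these assumptions, a B-bit code is stored as one integer instead of B floating-point values, which is the kind of saving the abstract refers to.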

Paper Structure

This paper contains 26 sections, 11 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Overview of Binary-Gaussian. Our method builds upon pre-trained 3D Gaussians by augmenting each Gaussian with multi-granularity binary features. During training, we keep the opacity values learnable and optimize the multi-level binary features $\bar{\mathbf{F}}^{1:L}$ on the rendered 2D projections using a progressive contrastive learning strategy.
  • Figure 2: Illustration of Progressive Contrastive Learning. Fine-grained segmentation at each level is built on the coarser segmentation from the preceding level. Each positive group identified at the coarse level forms an independent, gradient-isolated sub-optimization task at the finer level.
  • Figure 3: Opacity inconsistency between visual and segmentation rendering.
  • Figure 4: Segmentation results without the virtual negative across different granularities.
  • Figure 5: Comparison with baseline models on the LERF-Mask dataset. The top two rows illustrate object-specific segmentation results for coarse and fine-grained targets. The bottom three rows present panoptic segmentation results across different levels of granularity. The prohibition symbol indicates that Click-Gaussian lacks support for intermediate granularity.
  • ...and 1 more figure