Binary-Gaussian: Compact and Progressive Representation for 3D Gaussian Segmentation
An Yang, Chenyu Liu, Jun Du, Jianqing Gao, Jia Pan, Jinshui Hu, Baocai Yin, Bing Yin, Cong Liu
TL;DR
The paper tackles memory-heavy semantic segmentation in 3D Gaussian Splatting by replacing per-Gaussian continuous category features with compact binary codes in a coarse-to-fine, multi-granularity framework. A progressive contrastive learning scheme aligns features across granularity levels, while a virtual-negative mechanism and opacity tuning address gaps in the representation and the mismatch between rendering and semantics. Additional strategies such as mask-balanced sampling further improve fine-grained segmentation quality. Empirical results on real-world datasets show state-of-the-art performance with substantially lower memory usage and fast inference. The approach enables precise, scalable 3D panoptic segmentation suitable for real-time applications.
Abstract
3D Gaussian Splatting (3D-GS) has emerged as an efficient 3D representation and a promising foundation for semantic tasks such as segmentation. However, existing 3D-GS-based segmentation methods typically rely on high-dimensional category features, which introduce substantial memory overhead. Moreover, fine-grained segmentation remains challenging due to label space congestion and the lack of stable multi-granularity control mechanisms. To address these limitations, we propose a coarse-to-fine binary encoding scheme for per-Gaussian category representation, which compresses each feature into a single integer via a binary-to-decimal mapping, drastically reducing memory usage. We further design a progressive training strategy that decomposes panoptic segmentation into a series of independent sub-tasks, reducing inter-class conflicts and thereby enhancing fine-grained segmentation capability. Additionally, we fine-tune opacity during segmentation training to address the incompatibility between photometric rendering and semantic segmentation, which often leads to foreground-background confusion. Extensive experiments on multiple benchmarks demonstrate that our method achieves state-of-the-art segmentation performance while significantly reducing memory consumption and accelerating inference.
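The binary-to-decimal mapping mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`pack_bits`, `unpack_bits`, `coarse_prefix`), the 4-bit code length, and the convention that the coarsest level occupies the most significant bit are all assumptions made for illustration. The sketch only shows how a per-Gaussian binary code can be stored as one integer, and how a coarser grouping could be read back by keeping a prefix of the bits.

```python
import numpy as np


def pack_bits(bits: np.ndarray) -> np.ndarray:
    """Pack an (N, K) array of 0/1 values into N integers.

    Assumption: column 0 is the coarsest level and maps to the
    most significant bit.
    """
    bits = np.asarray(bits, dtype=np.int64)
    k = bits.shape[1]
    # Weights 2^(K-1), ..., 2^1, 2^0 for the binary-to-decimal mapping.
    weights = 1 << np.arange(k - 1, -1, -1, dtype=np.int64)
    return bits @ weights


def unpack_bits(codes: np.ndarray, k: int) -> np.ndarray:
    """Recover the (N, K) bit array from N integer codes."""
    codes = np.asarray(codes, dtype=np.int64)
    shifts = np.arange(k - 1, -1, -1, dtype=np.int64)
    return (codes[:, None] >> shifts) & 1


def coarse_prefix(codes: np.ndarray, k: int, level: int) -> np.ndarray:
    """Keep only the first `level` bits, i.e. a coarser grouping."""
    return np.asarray(codes, dtype=np.int64) >> (k - level)


# Three hypothetical Gaussians, each with a 4-bit category code.
bits = np.array([[1, 0, 1, 1],
                 [1, 0, 0, 0],
                 [0, 1, 1, 0]])
codes = pack_bits(bits)
print(codes.tolist())                       # [11, 8, 6]
print(coarse_prefix(codes, 4, 2).tolist())  # [2, 2, 1]
```

Under these assumptions, the first two Gaussians share the same 2-bit prefix and therefore fall into the same coarse group, while their full 4-bit codes still separate them at the finest level. Each Gaussian stores one integer rather than a high-dimensional feature vector, which is the source of the memory saving the abstract describes.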
