Contrastive Gaussian Clustering: Weakly Supervised 3D Scene Segmentation
Myrna C. Silva, Mahtab Dahaghin, Matteo Toso, Alessio Del Bue
TL;DR
This work addresses the difficulty of obtaining reliable 3D scene segmentation with limited 3D annotations by embedding a 3D segmentation feature field into a 3D Gaussian Splatting representation. It introduces a contrastive clustering objective on rendered 3D features and a spatial-similarity regularization to learn cross-view-consistent segmentation from inconsistent 2D masks, enabling both 2D novel-view segmentation and 3D scene partitioning. The approach outperforms state-of-the-art methods on open-vocabulary segmentation benchmarks, achieving higher mIoU and boundary alignment while maintaining real-time rendering capabilities. This yields a practical, scalable path for 3D scene understanding in cluttered or open-domain environments, with potential for integration with language-enabled prompts and hierarchical segmentation in future work.
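The summary above does not spell out the contrastive clustering objective. Below is a minimal sketch of one plausible form, assuming a prototype-style InfoNCE loss. This is an assumption and not necessarily the authors' formulation. Each rendered pixel feature is pulled toward the mean feature of its own 2D mask and pushed away from the means of the other masks in the same view. The function name `contrastive_cluster_loss` and the temperature `tau` are hypothetical, and the spatial-similarity regularization is not modeled here.

```python
import math
from collections import defaultdict


def normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def contrastive_cluster_loss(features, mask_ids, tau=0.1):
    """Hypothetical prototype-style InfoNCE loss over one view.

    features: rendered per-pixel feature vectors for one view.
    mask_ids: 2D mask label of each pixel; labels only need to be
              consistent within this view, not across views.
    Illustrative assumption: not the paper's exact loss.
    """
    feats = [normalize(f) for f in features]

    # Group pixel features by their 2D mask label.
    groups = defaultdict(list)
    for f, m in zip(feats, mask_ids):
        groups[m].append(f)

    # Prototype of each mask: mean feature, re-normalized to unit length.
    centroids = {}
    for m, fs in groups.items():
        dim = len(fs[0])
        mean = [sum(f[d] for f in fs) / len(fs) for d in range(dim)]
        centroids[m] = normalize(mean)

    # Cross-entropy of each pixel against its own mask prototype,
    # computed with a numerically stable log-sum-exp.
    total = 0.0
    for f, m in zip(feats, mask_ids):
        logits = {k: dot(f, c) / tau for k, c in centroids.items()}
        mx = max(logits.values())
        log_denominator = mx + math.log(sum(math.exp(v - mx) for v in logits.values()))
        total += log_denominator - logits[m]
    return total / len(feats)
```

Because the loss only compares pixels within a single view, mask labels may change arbitrarily from one view to the next without affecting it. That is the property that makes training on inconsistent 2D masks possible in a setup of this kind.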
Abstract
We introduce Contrastive Gaussian Clustering, a novel approach capable of providing segmentation masks from any viewpoint and of enabling 3D segmentation of the scene. Recent works in novel-view synthesis have shown how to model the appearance of a scene via a cloud of 3D Gaussians, and how to generate accurate images from a given viewpoint by projecting the Gaussians onto the image plane and $\alpha$-blending their colors. Following this example, we train a model that also includes a segmentation feature vector for each Gaussian. These feature vectors can then be used in two ways: for 3D scene segmentation, by clustering the Gaussians according to their feature vectors, and to generate 2D segmentation masks, by projecting the Gaussians onto a plane and $\alpha$-blending their segmentation features. Using a combination of contrastive learning and spatial regularization, our method can be trained on inconsistent 2D segmentation masks and still learn to generate segmentation masks that are consistent across all views. Moreover, the resulting model is extremely accurate, improving the IoU of the predicted masks by $+8\%$ over the state of the art. Code and trained models will be released soon.
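The following minimal pure-Python sketch illustrates the two uses of the per-Gaussian feature vectors described above. `alpha_blend_features` applies the standard 3D Gaussian Splatting front-to-back compositing rule, $F = \sum_i f_i \alpha_i \prod_{j<i} (1 - \alpha_j)$, to feature vectors in place of colors for the Gaussians overlapping a single pixel. It assumes their depths and projected opacities $\alpha_i$ are already known. `kmeans` stands in for the 3D clustering step. The abstract does not name the clustering algorithm, so k-means with a naive first-k initialization is purely illustrative. Both function names are hypothetical.

```python
def alpha_blend_features(depths, alphas, features):
    """Front-to-back alpha compositing of per-Gaussian features for one pixel.

    depths:   view-space depth of each Gaussian overlapping the pixel.
    alphas:   projected opacity of each Gaussian at this pixel (assumed given).
    features: segmentation feature vector of each Gaussian.
    """
    order = sorted(range(len(depths)), key=lambda i: depths[i])  # nearest first
    out = [0.0] * len(features[0])
    transmittance = 1.0
    for i in order:
        w = alphas[i] * transmittance  # alpha_i * prod_{j<i} (1 - alpha_j)
        for d in range(len(out)):
            out[d] += w * features[i][d]
        transmittance *= 1.0 - alphas[i]
        if transmittance < 1e-4:  # early stop, similar to the reference 3DGS rasterizer
            break
    return out


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=10):
    """Illustrative k-means over Gaussian feature vectors (a stand-in, not the paper's method)."""
    centers = [list(p) for p in points[:k]]  # naive deterministic initialization
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every Gaussian.
        for n, p in enumerate(points):
            labels[n] = min(range(k), key=lambda c: sq_dist(p, centers[c]))
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

The same compositing weights used for color rendering are reused for the features, so a rendered feature map is differentiable with respect to every Gaussian's feature vector. This is what lets a 2D loss on rendered features train the 3D features that are later clustered.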
