
Segment Any 3D Gaussians

Jiazhong Cen, Jiemin Fang, Chen Yang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian

TL;DR

SAGA addresses the challenge of promptable 3D segmentation by augmenting 3D Gaussian Splatting with scale-gated Gaussian affinity features, learned through scale-aware contrastive learning that distills segmentation capabilities from SAM. The approach enables real-time, multi-granularity 3D segmentation from 2D prompts and supports open-vocabulary results through a vote-based CLIP integration, while staying efficient thanks to a lightweight scale gate and explicit per-Gaussian features. Extensive experiments on NVOS, SPIn-NeRF, and 3D-OVS demonstrate state-of-the-art performance and real-time inference, with ablations validating the contributions of local feature smoothing and feature norm regularization. Overall, SAGA provides a simple, effective pathway to promptable segmentation in 3D-GS, opening avenues for faster 3D scene understanding and open-vocabulary applications.
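The vote-based CLIP integration can be pictured with a small sketch. It assumes each 3D segment has already been rendered into several views and that CLIP has scored the segment's mask in each view against a set of candidate text labels; those scores are plain inputs here (no CLIP model is called), and `vote_label` is a hypothetical helper, not the authors' implementation.

```python
from collections import Counter
from typing import Dict, Sequence


def vote_label(view_scores: Sequence[Dict[str, float]]) -> str:
    """Assign an open-vocabulary label to one 3D segment by majority vote.

    Each entry of `view_scores` maps candidate text labels to a CLIP-style
    similarity for the segment in one rendered view (assumed precomputed).
    """
    if not view_scores:
        raise ValueError("need at least one view")
    # Each view votes for its highest-scoring label.
    votes = [max(scores, key=scores.get) for scores in view_scores]
    counts = Counter(votes)
    best = max(counts.values())
    # Break ties between equally voted labels by their total similarity.
    tied = [label for label, count in counts.items() if count == best]
    totals = {label: sum(s.get(label, 0.0) for s in view_scores) for label in tied}
    return max(tied, key=totals.get)


if __name__ == "__main__":
    views = [
        {"chair": 0.31, "table": 0.27},
        {"chair": 0.22, "table": 0.29},
        {"chair": 0.33, "table": 0.25},
    ]
    print(vote_label(views))  # two of three views vote "chair"
```

Voting over views makes the label robust to a single view where occlusion or an unusual viewpoint misleads CLIP; the tie-break rule here is an illustrative choice.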

Abstract

This paper presents SAGA (Segment Any 3D GAussians), a highly efficient 3D promptable segmentation method based on 3D Gaussian Splatting (3D-GS). Given 2D visual prompts as input, SAGA can segment the corresponding 3D target represented by 3D Gaussians within 4 ms. This is achieved by attaching a scale-gated affinity feature to each 3D Gaussian, endowing it with a new property for multi-granularity segmentation. Specifically, a scale-aware contrastive training strategy is proposed to learn the scale-gated affinity features. It 1) distills the segmentation capability of the Segment Anything Model (SAM) from 2D masks into the affinity features and 2) employs a soft scale gate mechanism that handles multi-granularity ambiguity in 3D segmentation by adjusting the magnitude of each feature channel according to a specified 3D physical scale. Evaluations demonstrate that SAGA achieves real-time multi-granularity segmentation with quality comparable to state-of-the-art methods. As one of the first methods addressing promptable segmentation in 3D-GS, SAGA's simplicity and effectiveness pave the way for future advancements in this field. Our code will be released.
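The soft scale gate and prompt-based selection can be illustrated with a toy sketch. It assumes the gate is a per-channel affine function of the scale followed by a sigmoid, that gating multiplies each feature channel by its gate value, and that a prompt is resolved to one query feature compared with every Gaussian by cosine similarity against a fixed threshold. `scale_gate`, `segment`, the weights, and the threshold are hypothetical choices for illustration, not the paper's exact formulation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def scale_gate(scale: float, weights: Sequence[float], biases: Sequence[float]) -> Vector:
    """Hypothetical soft scale gate: one weight in (0, 1) per feature channel.

    Assumed form: a per-channel affine function of the scale followed by a
    sigmoid. In a trained model these parameters would be learned.
    """
    return [1.0 / (1.0 + math.exp(-(w * scale + b))) for w, b in zip(weights, biases)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def segment(
    features: Sequence[Sequence[float]],
    query: Sequence[float],
    scale: float,
    weights: Sequence[float],
    biases: Sequence[float],
    threshold: float = 0.9,
) -> List[int]:
    """Indices of Gaussians whose gated feature is similar to the gated query."""
    gate = scale_gate(scale, weights, biases)
    gated_query = [g * x for g, x in zip(gate, query)]
    selected = []
    for i, feature in enumerate(features):
        gated = [g * x for g, x in zip(gate, feature)]
        if cosine(gated, gated_query) >= threshold:
            selected.append(i)
    return selected


if __name__ == "__main__":
    # Toy features: channel 0 separates two parts of one object,
    # channel 1 separates that object from another one.
    feats = [[1.0, 1.0], [-1.0, 1.0], [0.0, -1.0]]  # part A, part B, other object
    w, b = [-10.0, 0.0], [5.0, 5.0]  # channel 0 is damped as the scale grows
    print(segment(feats, feats[0], 0.0, w, b))  # fine scale: [0]
    print(segment(feats, feats[0], 1.0, w, b))  # coarse scale: [0, 1]
```

With the same prompt, a small scale keeps the part-level channel active and selects only part A, while a larger scale suppresses that channel so both parts of the object are selected and the other object is not. This mirrors, in miniature, how gating channel magnitudes by scale can resolve multi-granularity ambiguity.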

Paper Structure (43 sections, 15 equations, 11 figures, 4 tables)

Figures (11)

  • Figure 1: SAGA performs promptable multi-granularity segmentation within milliseconds. Prompts are marked by points.
  • Figure 2: The architecture of SAGA. Left: SAGA attaches a Gaussian affinity feature to each 3D Gaussian, and a soft scale gate adjusts the magnitudes of its channels to handle multi-granularity ambiguity. Right: SAGA distills the segmentation ability of SAM into the affinity features attached to the 3D Gaussians of the 3D-GS model through scale-aware contrastive learning.
  • Figure 3: Qualitative results of SAGA across different scenes. We provide both the targets segmented via 2D point prompts and the "segment everything" results.
  • Figure 4: SAGA can maintain the high-frequency texture details captured by 3D-GS. We reveal the inherent structure of these details by shrinking the Gaussians by 60%.
  • Figure 5: Ablation study on the effects of local feature smoothing (Smooth) and feature norm regularization (Feature Norm). Outliers are primarily eliminated through local feature smoothing. Feature norm regularization helps the features of inner Gaussians align better with those of the surface Gaussians.
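Figure 5 credits local feature smoothing with removing most outliers. As a rough illustration, assume smoothing means averaging each Gaussian's affinity feature with those of its k nearest Gaussians in 3D; the paper's exact neighbourhood and weighting may differ, `knn_smooth` and `k` are hypothetical, and feature norm regularization is not modeled here.

```python
import math
from typing import List, Sequence


def knn_smooth(
    centers: Sequence[Sequence[float]],
    features: Sequence[Sequence[float]],
    k: int = 3,
) -> List[List[float]]:
    """Average each Gaussian's feature with those of its k nearest Gaussians.

    Neighbours are found by brute-force Euclidean distance between Gaussian
    centers; a Gaussian is at distance 0 from itself, so it is normally among
    its own neighbours. The O(N^2) search is for illustration only.
    """
    n = len(centers)
    k = max(1, min(k, n))
    smoothed: List[List[float]] = []
    for i in range(n):
        # Sort all Gaussians by distance to Gaussian i and keep the k closest.
        order = sorted(range(n), key=lambda j: math.dist(centers[i], centers[j]))
        neighbours = order[:k]
        dim = len(features[i])
        smoothed.append([sum(features[j][c] for j in neighbours) / k for c in range(dim)])
    return smoothed


if __name__ == "__main__":
    centers = [[0.0, 0, 0], [0.1, 0, 0], [0.2, 0, 0], [5.0, 0, 0], [5.1, 0, 0]]
    features = [[1.0], [-1.0], [1.0], [-1.0], [-1.0]]  # index 1 is an outlier
    print(knn_smooth(centers, features, k=3))
```

In this toy scene the outlier at index 1 moves from -1 to about 1/3, taking the sign of its spatial neighbours, while the distant pair keeps a negative value.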