Click-Gaussian: Interactive Segmentation to Any 3D Gaussians

Seokhun Choi, Hyeonseop Song, Jaechul Kim, Taehyeong Kim, Hoseok Do

TL;DR

Click-Gaussian addresses the challenge of real-time, fine-grained interactive segmentation of 3D Gaussians by lifting SAM-derived 2D masks into two-level (coarse and fine) 3D feature fields. It pairs a contrastive learning framework and several regularizations with Global Feature-guided Learning (GFL), which provides view-consistent supervision by aggregating global feature candidates across the scene and aligning Gaussian features to global clusters. The approach runs in 10 ms per click, 15 to 130 times faster than prior methods, while delivering superior coarse- and fine-level segmentation accuracy on the real-world LERF-Mask and SPIn-NeRF datasets. This enables rapid, precise 3D scene manipulation, including open-vocabulary localization via CLIP and text-based editing, with applications in real-time 3D editing and content creation.
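To make the single-click interaction concrete, here is a minimal NumPy sketch of how a click might select Gaussians once features are learned. It assumes each Gaussian carries one feature vector per granularity level and that a click is resolved by comparing the feature rendered at the clicked pixel against every Gaussian's feature with cosine similarity. The function name `select_gaussians`, the toy feature values, and the 0.9 threshold are hypothetical choices for illustration, not details taken from the paper.

```python
import numpy as np


def select_gaussians(gaussian_feats: np.ndarray,
                     query_feat: np.ndarray,
                     threshold: float = 0.9) -> np.ndarray:
    """Return a boolean mask over Gaussians whose feature matches the clicked feature.

    gaussian_feats: (N, D) per-Gaussian features at one level (coarse or fine).
    query_feat:     (D,) feature rendered at the clicked pixel.
    threshold:      hypothetical cosine-similarity cutoff (an assumption).
    """
    # L2-normalize so that the dot product equals cosine similarity.
    g = gaussian_feats / np.linalg.norm(gaussian_feats, axis=1, keepdims=True)
    q = query_feat / np.linalg.norm(query_feat)
    # A single matrix-vector product scores all N Gaussians at once.
    return (g @ q) >= threshold


if __name__ == "__main__":
    # Toy features for three Gaussians at two levels (hypothetical values).
    coarse = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
    fine = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    # Stand-in for the rendered feature under the click: Gaussian 0's own feature.
    print(select_gaussians(coarse, coarse[0]))  # coarse level selects Gaussians 0 and 1
    print(select_gaussians(fine, fine[0]))      # fine level selects only Gaussian 0
```

Because selection in this sketch reduces to one normalized matrix-vector product, it needs no clustering or post-processing step at click time, which fits the paper's emphasis on avoiding post-processing; real latency depends on scene size and hardware.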

Abstract

Interactive segmentation of 3D Gaussians opens up opportunities for real-time manipulation of 3D scenes, thanks to the real-time rendering capability of 3D Gaussian Splatting. However, current methods rely on time-consuming post-processing to deal with noisy segmentation output. They also struggle to provide detailed segmentation, which is important for fine-grained manipulation of 3D scenes. In this study, we propose Click-Gaussian, which learns distinguishable feature fields at two levels of granularity, enabling segmentation without time-consuming post-processing. We examine the challenges posed by inconsistently learned feature fields, which arise because 2D segmentations are obtained independently of the 3D scene. 3D segmentation accuracy deteriorates when the 2D segmentation results across views, which are the primary cues for 3D segmentation, conflict with one another. To overcome these issues, we propose Global Feature-guided Learning (GFL). GFL constructs clusters of global feature candidates from noisy 2D segments across views, which smooths out noise when training the features of 3D Gaussians. Our method runs in 10 ms per click, 15 to 130 times as fast as previous methods, while also significantly improving segmentation accuracy. Our project page is available at https://seokhunchoi.github.io/Click-Gaussian
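The GFL idea of clustering per-segment features gathered from many views can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each SAM mask in each view contributes one global feature candidate (the mean rendered feature inside the mask), the candidates are clustered, and each mask's mean feature is pulled toward its nearest cluster center. Plain k-means is swapped in purely for simplicity; the paper's actual clustering algorithm, loss form, and hyperparameters may differ, and all function names here are hypothetical.

```python
import numpy as np


def mask_mean_features(feature_map: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Mean rendered feature inside each 2D mask of one view (one candidate per mask).

    feature_map: (H, W, D) rendered feature map F for a view.
    masks:       (K, H, W) boolean SAM masks M for the same view.
    Returns (K', D); empty masks are skipped, so at least one must be non-empty.
    """
    return np.stack([feature_map[m].mean(axis=0) for m in masks if m.any()])


def kmeans(x: np.ndarray, k: int, iters: int = 20, seed: int = 0):
    """Plain k-means, used here only as a simple stand-in clustering step."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every candidate to every center: shape (n, k).
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave a center unchanged if its cluster is empty
                centers[j] = x[labels == j].mean(axis=0)
    # Final assignment against the updated centers.
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return centers, d.argmin(axis=1)


def global_alignment_loss(mask_feats: np.ndarray, centers: np.ndarray) -> float:
    """Mean squared distance from each mask feature to its nearest global center."""
    d = ((mask_feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return float(d.min(axis=1).mean())
```

Because the centers are computed from candidates pooled across all views, a mask whose local features disagree with other views is pulled toward a scene-wide consensus, which is the intuition behind GFL smoothing out noisy 2D segments.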

Paper Structure

This paper contains 36 sections, 13 equations, 17 figures, and 3 tables.

Figures (17)

  • Figure 1: We present Click-Gaussian, a swift and precise method for interactive segmentation of 3D Gaussians using two-level granularity feature fields derived from 2D segmentation masks. Once trained, it enables users to select and segment desired objects at coarse and fine levels with a single click, completing the process within 10 ms.
  • Figure 2: Overview of the proposed method. i) Our approach augments pre-trained 3D Gaussians with two-level granularity features $\mathbf{f}_i$. ii) These features are trained through contrastive learning, utilizing 2D rendered feature maps $\mathbf{F}$ and their corresponding SAM-generated masks $M$. iii) To address inconsistencies in mask signals across views, we introduce a Global Feature-guided Learning approach. For clarity, Global Feature-guided Learning at the fine level is omitted from the illustration.
  • Figure 3: Comparison with baselines on the LERF-Mask dataset. The results are displayed in three rows per scene (Teatime, Ramen, and Figurines, in that order). For each scene, the first two rows show coarse- and fine-level segmentation results, respectively, and the third row shows PCA visualizations of each model's finest-level feature field. Our approach demonstrates superior segmentation at both coarse and fine levels. Red and yellow boxes indicate noisy and under-segmented results, respectively.
  • Figure 4: Comparison with Gau-Group and GARField. Our approach extracts Gaussians in more detail and more cleanly, up to 130 times faster than the other baselines.
  • Figure 5: Comparison of automatic segmentation of everything on novel views. Our method produces more accurate and fine-grained results than the baselines. For detailed experimental procedures, please refer to the supplementary materials. Gau-Group, which cannot differentiate levels, produces identical coarse and fine segmentation results.
  • ...and 12 more figures
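Figure 2 states that the two-level features are trained by contrastive learning between rendered feature maps F and SAM-generated masks M. The sketch below shows one generic way to write such a loss in NumPy: a supervised-contrastive (InfoNCE-style) objective over sampled pixels, in which pixels from the same mask are positives and all other pixels are negatives. The loss form, the temperature of 0.1, and the function name `pixel_contrastive_loss` are assumptions for illustration; the paper's actual objective, including its regularizations and per-level handling, may differ.

```python
import numpy as np


def pixel_contrastive_loss(feats: np.ndarray, mask_ids: np.ndarray,
                           temperature: float = 0.1) -> float:
    """Supervised-contrastive loss over sampled pixels (generic sketch, not the paper's).

    feats:    (P, D) rendered features F at P sampled pixels.
    mask_ids: (P,) index of the SAM mask M each pixel belongs to.
    At least one pair of pixels must share a mask.
    """
    z = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    eye = np.eye(len(z), dtype=bool)
    sim = np.where(eye, -np.inf, sim)  # never contrast a pixel with itself
    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    # Positives: other pixels that fall inside the same mask.
    pos = (mask_ids[:, None] == mask_ids[None, :]) & ~eye
    has_pos = pos.any(axis=1)  # anchors with at least one positive
    pos_log_prob = np.where(pos, log_prob, 0.0).sum(axis=1)
    mean_pos = pos_log_prob[has_pos] / pos.sum(axis=1)[has_pos]
    return float(-mean_pos.mean())
```

In an actual training loop this would run in an autodiff framework so that gradients flow back through the rasterizer to the per-Gaussian features; NumPy is used here only to keep the sketch dependency-free.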