Click-Gaussian: Interactive Segmentation to Any 3D Gaussians
Seokhun Choi, Hyeonseop Song, Jaechul Kim, Taehyeong Kim, Hoseok Do
TL;DR
Click-Gaussian addresses real-time, fine-grained interactive segmentation of 3D Gaussians by lifting SAM-derived 2D masks into 3D feature fields at two levels of granularity (coarse and fine). It introduces Global Feature-guided Learning (GFL), which provides view-consistent supervision by aggregating global feature candidates across the scene and aligning Gaussian features to the resulting global clusters, and combines it with contrastive learning and several regularization terms. The method takes 10 ms per click, 15 to 130 times faster than prior methods, while delivering higher coarse and fine segmentation accuracy on the real-world LERF-Mask and SPIn-NeRF datasets. This enables rapid, precise 3D scene manipulation, including open-vocabulary localization via CLIP and text-based editing, with applications in real-time 3D editing and content creation.
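To illustrate why a learned feature field can make a click cheap, here is a minimal sketch of click-based selection. It is not the authors' implementation: the function names, the cosine-similarity test, and the threshold value of 0.9 are all assumptions made for illustration. The idea shown is only that, once each Gaussian carries a distinguishable feature, one click reduces to a similarity comparison with no post-processing step.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def select_by_click(gaussian_features, clicked_feature, threshold=0.9):
    """Return indices of Gaussians whose feature is similar to the clicked one.

    Hypothetical helper: the threshold and the similarity measure are
    illustrative assumptions, not values taken from the paper.
    """
    return [
        i for i, feat in enumerate(gaussian_features)
        if cosine(feat, clicked_feature) >= threshold
    ]

# Toy scene: two Gaussians on one object, one on another.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
selected = select_by_click(features, clicked_feature=[1.0, 0.0])
```

In this toy scene the first two Gaussians are selected and the third is not. Under the same assumption, a coarse and a fine feature field would each be queried this way to obtain the two granularity levels.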
Abstract
Interactive segmentation of 3D Gaussians opens a great opportunity for real-time manipulation of 3D scenes thanks to the real-time rendering capability of 3D Gaussian Splatting. However, current methods require time-consuming post-processing to deal with noisy segmentation output. They also struggle to provide detailed segmentation, which is important for fine-grained manipulation of 3D scenes. In this study, we propose Click-Gaussian, which learns distinguishable feature fields of two-level granularity, facilitating segmentation without time-consuming post-processing. We examine the challenges that stem from inconsistently learned feature fields, which arise because the 2D segmentations are obtained independently for each view of a 3D scene. 3D segmentation accuracy deteriorates when the 2D segmentation results across views, the primary cues for 3D segmentation, are in conflict. To overcome these issues, we propose Global Feature-guided Learning (GFL). GFL constructs clusters of global feature candidates from noisy 2D segments across the views, which smooths out noise when training the features of 3D Gaussians. Our method runs in 10 ms per click, 15 to 130 times as fast as previous methods, while also significantly improving segmentation accuracy. Our project page is available at https://seokhunchoi.github.io/Click-Gaussian
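The GFL idea described in the abstract can be sketched roughly as follows. This is an illustrative approximation, not the paper's procedure: the cluster labels are assumed to be given by some clustering step that is not shown, and the nearest-center assignment and the loss form (one minus cosine similarity) are assumptions. The sketch shows only how averaging noisy per-view candidates into cluster centers yields a smoother target for the Gaussian features.

```python
import math

def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is zero)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def cluster_centers(candidates, labels):
    """Average the feature candidates within each cluster, then normalize.

    `labels` is assumed to come from a clustering step not shown here.
    """
    sums, counts = {}, {}
    for feat, lab in zip(candidates, labels):
        acc = sums.setdefault(lab, [0.0] * len(feat))
        for i, x in enumerate(feat):
            acc[i] += x
        counts[lab] = counts.get(lab, 0) + 1
    return {
        lab: normalize([x / counts[lab] for x in acc])
        for lab, acc in sums.items()
    }

def alignment_loss(gaussian_features, centers):
    """Mean of (1 - cosine similarity) to the most similar cluster center.

    Assumption: each feature is pulled toward its nearest center.
    """
    total = 0.0
    for feat in gaussian_features:
        f = normalize(feat)
        best = max(sum(a * b for a, b in zip(f, c)) for c in centers.values())
        total += 1.0 - best
    return total / len(gaussian_features)

# Two noisy candidates of one object seen from two views, plus another object.
candidates = [[1.0, 0.1], [1.0, -0.1], [0.0, 1.0]]
labels = [0, 0, 1]
centers = cluster_centers(candidates, labels)
loss = alignment_loss([[1.0, 0.0], [0.0, 2.0]], centers)
```

Here the two conflicting candidates of the first object average to a single clean direction, so features already aligned with the cluster centers incur zero loss, while a feature lying between clusters is penalized.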
