Lifting by Gaussians: A Simple, Fast and Flexible Method for 3D Instance Segmentation
Rohan Chacko, Nicolai Haeni, Eldar Khaliullin, Lin Sun, Douglas Lee
TL;DR
Open-world 3D instance segmentation of existing 3D Gaussian Splatting (3DGS) fields is challenging due to the lack of 3D foundation models and the heavy training requirements of existing methods. Lifting by Gaussians (LBG) tackles this by attaching 2D SAM masks and 2D foundation-model features (CLIP, DINOv2) to 3D Gaussians via a per-pixel max-contributor assignment, then incrementally merging fragments across frames with geometric and semantic cues and a hierarchical object–part–subpart decomposition. The approach requires no per-scene training and is agnostic to the Gaussian parameterization. It yields higher-quality 3D assets and processes scenes substantially faster than prior methods, while remaining competitive in 2D novel-view mask rendering. This enables rapid 3D scene understanding suitable for AR/VR, robotics, and large-scale 3D reconstruction. Possible extensions include lifting additional 2D features into 3D and refining small-object segmentation.
Abstract
We introduce Lifting by Gaussians (LBG), a novel approach for open-world instance segmentation of 3D Gaussian Splatted Radiance Fields (3DGS). Recently, 3DGS fields have emerged as a highly efficient and explicit alternative to Neural Field-based methods for high-quality Novel View Synthesis. Our 3D instance segmentation method lifts 2D segmentation masks from SAM (or alternatives such as FastSAM), together with features from CLIP and DINOv2, and fuses them directly onto 3DGS (or similar Gaussian radiance fields such as 2DGS). Unlike previous approaches, LBG requires no per-scene training, allowing it to operate seamlessly on any existing 3DGS reconstruction. LBG is not only an order of magnitude faster and simpler than existing approaches; it is also highly modular, enabling 3D semantic segmentation of existing 3DGS fields without requiring a specific parametrization of the 3D Gaussians. Furthermore, our technique achieves superior results for 2D semantic novel view synthesis and 3D asset extraction while maintaining flexibility and efficiency. We further introduce a novel approach to evaluate individually segmented 3D assets from 3D radiance field segmentation methods.
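To make the lifting step concrete, the following is a minimal NumPy sketch of a per-pixel max-contributor assignment for a single frame. It is an illustration under stated assumptions, not the authors' implementation: the dense `weights` array (per-pixel blending weight of each Gaussian), the function name `lift_masks_to_gaussians`, and the majority vote used to resolve Gaussians that dominate pixels from several masks are all hypothetical choices introduced here.

```python
import numpy as np

def lift_masks_to_gaussians(weights, pixel_labels, num_labels):
    """Assign a 2D mask label to each Gaussian for one frame.

    weights:      (P, G) float array, assumed blending weight of Gaussian g
                  at pixel p (hypothetical dense layout, for illustration).
    pixel_labels: (P,) int array, 2D mask id per pixel, -1 for unmasked.
    num_labels:   number of distinct mask ids in this frame.
    Returns a (G,) int array with one label per Gaussian, -1 if no pixel voted.
    """
    num_gaussians = weights.shape[1]

    # For every pixel, find the Gaussian that contributes the most.
    top_gaussian = weights.argmax(axis=1)

    # Ignore unmasked pixels and pixels that no Gaussian touches.
    valid = (pixel_labels >= 0) & (weights.max(axis=1) > 0)

    # Each valid pixel casts one vote: (its top Gaussian, its mask id).
    votes = np.zeros((num_gaussians, num_labels), dtype=np.int64)
    np.add.at(votes, (top_gaussian[valid], pixel_labels[valid]), 1)

    # Assumption: resolve conflicts with a simple majority vote.
    labels = votes.argmax(axis=1)
    labels[votes.sum(axis=1) == 0] = -1
    return labels

# Toy frame: 4 pixels, 3 Gaussians, 2 masks; the last pixel is unmasked.
weights = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.5, 0.3],
])
pixel_labels = np.array([0, 0, 1, -1])
labels = lift_masks_to_gaussians(weights, pixel_labels, num_labels=2)
print(labels.tolist())
```

In this toy frame the first Gaussian dominates the two pixels of mask 0, the second dominates the single pixel of mask 1, and the third never dominates a masked pixel, so it stays unlabeled. A real rasterizer would expose only the top contributor per pixel rather than a dense weight matrix, and the cross-frame merging with geometric and semantic cues is not covered by this sketch.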
