
GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting

Umangi Jain, Ashkan Mirzaei, Igor Gilitschenski

TL;DR

This work introduces GaussianCut, a new method for interactive multiview segmentation of scenes represented as 3D Gaussians that achieves competitive performance with state-of-the-art approaches for 3D segmentation without requiring any additional segmentation-aware training.

Abstract

We introduce GaussianCut, a new method for interactive multiview segmentation of scenes represented as 3D Gaussians. Our approach allows for selecting the objects to be segmented by interacting with a single view. It accepts intuitive user input, such as point clicks, coarse scribbles, or text. Using 3D Gaussian Splatting (3DGS) as the underlying scene representation simplifies the extraction of objects of interest, which are treated as a subset of the scene's Gaussians. Our key idea is to represent the scene as a graph and use the graph-cut algorithm to minimize an energy function that partitions the Gaussians into foreground and background. To achieve this, we construct a graph based on scene Gaussians and devise a segmentation-aligned energy function on the graph to combine user inputs with scene properties. To obtain an initial coarse segmentation, we leverage 2D image/video segmentation models and then refine these coarse estimates using our graph construction. Our empirical evaluations show the adaptability of GaussianCut across a diverse set of scenes. GaussianCut achieves competitive performance with state-of-the-art approaches for 3D segmentation without requiring any additional segmentation-aware training.
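
The abstract does not spell out the energy terms, so the following is only a minimal sketch of the general technique: a binary foreground/background labeling of Gaussians solved as a minimum s-t cut. It assumes a per-Gaussian foreground probability (for example derived from the coarse 2D masks), unary costs of -log p and -log(1 - p), and a Potts pairwise term on k-nearest-neighbour pairs of Gaussian centers weighted by spatial proximity. The names `segment_gaussians` and `min_cut_source_side` and the parameters `lam`, `sigma`, and `k` are hypothetical, and GaussianCut's actual energy may differ. A pure-Python Edmonds-Karp max-flow keeps the example self-contained; a scene with millions of Gaussians would need a dedicated graph-cut solver.

```python
import math
from collections import deque


def min_cut_source_side(n, cap, s, t):
    """Edmonds-Karp max flow on a dense capacity matrix.

    Returns the nodes reachable from s in the final residual graph,
    i.e. the source side of a minimum s-t cut.
    """
    flow = [[0.0] * n for _ in range(n)]

    def bfs_parents():
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = bfs_parents()
        if parent[t] == -1:
            break  # no augmenting path left: the flow is maximal
        # Bottleneck residual capacity along the shortest augmenting path.
        bottleneck, v = math.inf, t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v] - flow[u][v])
            v = u
        # Push flow along the path (skew-symmetric bookkeeping).
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u

    parent = bfs_parents()
    return {v for v in range(n) if parent[v] != -1}


def segment_gaussians(centers, fg_prob, lam=1.0, sigma=1.0, k=2, eps=1e-6):
    """Hypothetical binary energy over Gaussians, minimized by an s-t cut.

    Unary:    D_i(fg) = -log p_i,  D_i(bg) = -log(1 - p_i)
    Pairwise: lam * exp(-d_ij^2 / (2 sigma^2)) on kNN pairs (Potts model).
    Source side of the cut = foreground.
    """
    m = len(centers)
    s, t = m, m + 1
    cap = [[0.0] * (m + 2) for _ in range(m + 2)]

    for i, p in enumerate(fg_prob):
        p = min(max(p, eps), 1.0 - eps)
        cap[s][i] = -math.log(1.0 - p)  # paid if i ends up background (s->i cut)
        cap[i][t] = -math.log(p)        # paid if i ends up foreground (i->t cut)

    # Symmetric kNN neighbourhood over Gaussian centers; each pair weighted once.
    pairs = set()
    for i in range(m):
        others = sorted((j for j in range(m) if j != i),
                        key=lambda j: math.dist(centers[i], centers[j]))
        pairs.update(frozenset((i, j)) for j in others[:k])
    for pair in pairs:
        i, j = tuple(pair)
        d = math.dist(centers[i], centers[j])
        w = lam * math.exp(-d ** 2 / (2 * sigma ** 2))
        cap[i][j] += w
        cap[j][i] += w

    source_side = min_cut_source_side(m + 2, cap, s, t)
    return sorted(v for v in source_side if v < m)


# Two well-separated clusters; Gaussian 2 has an ambiguous coarse score.
centers = [(0.0, 0, 0), (0.1, 0, 0), (0.2, 0, 0), (5.0, 0, 0), (5.1, 0, 0), (5.2, 0, 0)]
fg_prob = [0.9, 0.9, 0.4, 0.1, 0.2, 0.1]
print(segment_gaussians(centers, fg_prob))           # [0, 1, 2]
print(segment_gaussians(centers, fg_prob, lam=0.0))  # [0, 1]
```

With the pairwise term switched off (`lam=0.0`), Gaussian 2 follows its own coarse score and is labeled background; with spatial smoothing, its close foreground neighbours pull it into the foreground. This is the kind of refinement a graph cut can add on top of a coarse per-Gaussian estimate.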

Paper Structure

This paper contains 31 sections, 6 equations, 13 figures, 12 tables.

Figures (13)

  • Figure 1: Our method, GaussianCut, enables interactive selection of one or more objects. Given an optimized 3D Gaussian Splatting model of a scene and user inputs (clicks, scribbles, or text) on any viewpoint, GaussianCut partitions the set of Gaussians into foreground and background.
  • Figure 2: Overall pipeline of GaussianCut. User input from any viewpoint is passed to a video segmentation model to produce multi-view masks. We rasterize every view and track the contribution of each Gaussian to masked and unmasked pixels. The Gaussians are then formulated as nodes in an undirected graph, which we partition with an adapted graph cut. Red edges mark the edges that graph cut removes to partition the graph.
  • Figure 3: Visualization results of different objects in the following scenes: truck from Tanks and Temples knapitsch2017tanks, kitchen from Mip-NeRF 360 barron2022mip, tools from Shiny wizadwongsa2021nex.
  • Figure 4: Qualitative comparison: 3D segmentation results of GaussianCut using text on the 360-garden barron2022mip scene. Compared to ISRF goel2023interactive, SA3D cen2024segment, and SAGD hu2024semantic, GaussianCut's segments contain finer details. The graph cut component of GaussianCut also recovers fine details (such as the decorations on the plant) that are missed by coarse splatting.
  • Figure 5: Limitation of LangSplat on the Trex and Leaves scenes from the NVOS benchmark. In the top row, parts of the T-rex cannot be extracted. In the bottom row, background leaves are selected along with the front leaf.
  • ...and 8 more figures
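
The Figure 2 caption states that every view is rasterized while tracking each Gaussian's contribution to masked and unmasked pixels. A minimal sketch of that bookkeeping follows, assuming the contribution is the standard 3DGS alpha-blending weight w_i = alpha_i * prod_{j<i}(1 - alpha_j) and that the rasterizer can report, per pixel, a front-to-back list of (Gaussian id, alpha). That data layout and the names `blend_weights` and `foreground_fraction` are hypothetical; the summary does not say how GaussianCut stores or normalizes these counts.

```python
from collections import defaultdict


def blend_weights(front_to_back_alphas):
    """Standard 3DGS alpha-blending weights: w_i = alpha_i * prod_{j<i} (1 - alpha_j)."""
    weights, transmittance = [], 1.0
    for alpha in front_to_back_alphas:
        weights.append(alpha * transmittance)
        transmittance *= 1.0 - alpha
    return weights


def foreground_fraction(views):
    """Accumulate each Gaussian's blending weight on masked vs unmasked pixels.

    `views` (hypothetical layout) is a list of (pixels, mask) pairs: pixels maps a
    pixel coordinate to its front-to-back list of (gaussian_id, alpha), and mask maps
    the same coordinate to True (object) or False (background). Returns
    gaussian_id -> fraction of its total contribution that fell on masked pixels.
    """
    fg, bg = defaultdict(float), defaultdict(float)
    for pixels, mask in views:
        for pixel, hits in pixels.items():
            weights = blend_weights([alpha for _, alpha in hits])
            target = fg if mask[pixel] else bg
            for (gid, _), w in zip(hits, weights):
                target[gid] += w
    fractions = {}
    for gid in set(fg) | set(bg):
        total = fg.get(gid, 0.0) + bg.get(gid, 0.0)
        fractions[gid] = fg.get(gid, 0.0) / total if total > 0 else 0.5
    return fractions


# One toy view with two pixels; Gaussian 1 straddles the mask boundary.
view = (
    {(0, 0): [(0, 0.5), (1, 0.5)], (0, 1): [(1, 0.5), (2, 0.5)]},
    {(0, 0): True, (0, 1): False},
)
fracs = foreground_fraction([view])
print(fracs[0], round(fracs[1], 3), fracs[2])  # 1.0 0.333 0.0
```

Weighting by the blending weight rather than by a simple hit count means Gaussians hidden behind others contribute little to a pixel's vote, which matches how they affect the rendered image.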