Gradient-Driven 3D Segmentation and Affordance Transfer in Gaussian Splatting Using 2D Masks

Joji Joseph, Bharadwaj Amrutur, Shalabh Bhatnagar

TL;DR

This paper introduces a novel voting-based method that extends 2D segmentation models to 3D Gaussian splats. The method relies on masked gradients: gradients are filtered by the input 2D masks and then used as votes to achieve accurate segmentation.

Abstract

3D Gaussian Splatting has emerged as a powerful 3D scene representation technique, capturing fine details with high efficiency. In this paper, we introduce a novel voting-based method that extends 2D segmentation models to 3D Gaussian splats. Our approach leverages masked gradients, where gradients are filtered by input 2D masks, and these gradients are used as votes to achieve accurate segmentation. As a byproduct, we discovered that inference-time gradients can also be used to prune Gaussians, resulting in up to 21% compression. Additionally, we explore few-shot affordance transfer, allowing annotations from 2D images to be effectively transferred onto 3D Gaussian splats. The robust yet straightforward mathematical formulation underlying this approach makes it a highly effective tool for numerous downstream applications, such as augmented reality (AR), object editing, and robotics. The project code and additional resources are available at https://jojijoseph.github.io/3dgs-segmentation.
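
The masked-gradient voting idea can be illustrated with a toy example. The sketch below is not the authors' implementation. It assumes standard front-to-back alpha compositing, so that the gradient of a pixel's color with respect to a Gaussian's color equals that Gaussian's blending weight, and it assumes a Gaussian is assigned to the object when its accumulated in-mask votes exceed its out-of-mask votes. The data layout and the function names `blend_weights`, `vote`, and `segment` are hypothetical.

```python
def blend_weights(alphas):
    """Front-to-back compositing weights: w_k = alpha_k * prod_{j<k} (1 - alpha_j)."""
    weights, transmittance = [], 1.0
    for alpha in alphas:
        weights.append(alpha * transmittance)
        transmittance *= 1.0 - alpha
    return weights


def vote(views, num_gaussians):
    """Accumulate masked-gradient votes over all views.

    Each view is a list of pixels. Each pixel is (mask_bit, splats), where
    splats is a front-to-back list of (gaussian_id, alpha) pairs.
    """
    fg = [0.0] * num_gaussians  # votes from pixels inside the 2D mask
    bg = [0.0] * num_gaussians  # votes from pixels outside the 2D mask
    for pixels in views:
        for mask_bit, splats in pixels:
            weights = blend_weights([alpha for _, alpha in splats])
            for (gaussian_id, _), w in zip(splats, weights):
                # Assumption: d(pixel color) / d(Gaussian color) = w, so the
                # masked gradient is w inside the mask and 0 outside it.
                if mask_bit:
                    fg[gaussian_id] += w
                else:
                    bg[gaussian_id] += w
    return fg, bg


def segment(fg, bg):
    """Assumed decision rule: keep a Gaussian if its in-mask votes dominate."""
    return [f > b for f, b in zip(fg, bg)]


# One toy view with two pixels and three Gaussians.
views = [[
    (1, [(0, 0.5), (1, 0.5)]),  # masked pixel: Gaussian 0 in front of Gaussian 1
    (0, [(1, 0.5), (2, 0.5)]),  # unmasked pixel: Gaussian 1 in front of Gaussian 2
]]
fg, bg = vote(views, 3)
labels = segment(fg, bg)
```

In this toy scene Gaussian 1 touches both pixels, but its larger contribution falls outside the mask, so it is voted into the background. In practice the per-pixel weights would come from backpropagating through a differentiable rasterizer rather than from an explicit loop.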

Paper Structure (11 sections, 2 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Comparison of our approach with Feature-3DGS. The segmentation produced by Feature-3DGS is less clean than that produced by our method because individual Gaussians may not have features that are fully representative of the final rendered features.
  • Figure 2: (a) Example image with annotated affordances. (b) Segmentation map showing transferred affordances in a 3D scene. (c)-(f) Visualizations of different parts, with the remaining Gaussians represented as a point cloud for clarity, allowing better understanding of regions in relation to the whole scene.
  • Figure 3: The first column shows a rendered frame along with its corresponding input mask. In the second column, we present the results after extracting the 3D Gaussians that align with the generated 3D mask. The third column illustrates the rendered output from the remaining Gaussians. Notably, the background is visible instead of a blank space, as the segmentation occurs directly within the 3D space. For enhanced clarity, zooming in is recommended.
  • Figure 4: Examples of images used for affordance transfer with annotations. Note that these instances, though belonging to the same class, are different from the objects in the target frames.
  • Figure 5: Results of 2D-2D and 2D-3D affordance transfer. The labels generated during the 2D-2D affordance transfer serve as input to the 2D-3D affordance transfer. Despite not having perfectly aligned labels, voting over multiple frames makes 2D-3D affordance transfer more precise.
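
The claim in Figure 5 that voting over multiple frames compensates for imperfect per-frame labels can be sketched as a weighted majority vote per Gaussian. This is a minimal illustration, not the paper's code: the observation format, the function name `transfer_labels`, and the part names "handle" and "blade" are hypothetical, and the weights stand in for whatever per-frame vote strength the method accumulates.

```python
from collections import defaultdict


def transfer_labels(observations):
    """Pick, for each Gaussian, the label with the largest accumulated vote.

    observations: iterable of (gaussian_id, label, weight) tuples gathered
    from all frames.
    """
    tallies = defaultdict(lambda: defaultdict(float))
    for gaussian_id, label, weight in observations:
        tallies[gaussian_id][label] += weight
    return {g: max(votes, key=votes.get) for g, votes in tallies.items()}


# Gaussian 0 is mislabeled as "blade" in one frame but outvoted by two others.
observations = [
    (0, "handle", 0.5),
    (0, "blade", 0.5),
    (0, "handle", 0.25),
    (1, "blade", 0.5),
    (1, "blade", 0.5),
]
result = transfer_labels(observations)
```

Here a single misaligned 2D label does not change the outcome for Gaussian 0, because the consistent frames contribute a larger total vote.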