ArtisanGS: Interactive Tools for Gaussian Splat Selection with AI and Human in the Loop

Clement Fuji Tsang, Anita Hu, Or Perel, Carsten Kolve, Maria Shugrina

TL;DR

ArtisanGS addresses the challenge of extracting and editing objects from in-the-wild 3D Gaussian Splat scenes by delivering an interactive, AI-assisted segmentation toolkit that propagates 2D masks to 3D labels and supports extensive user corrections. The approach combines manual projection modes, robust multi-view mask tracking with memory frames, and a fast 3D aggregation step that yields a target 3D Gaussian mask without heavy scene-specific optimization. It demonstrates competitive quantitative performance against baselines, while offering significantly faster, interactive iteration and practical editing capabilities, including orientation and targeted local edits guided by a Video Diffusion Model. The work enables downstream applications such as physics simulation, object-oriented editing, and scene composition directly over 3DGS objects sourced from wild captures, broadening the practical usability of Gaussian splats in robotics and graphics. Overall, ArtisanGS provides a training-free, user-in-the-loop workflow that improves precision, editability, and speed for real-world 3DGS manipulation.

Abstract

Representations in the family of 3D Gaussian Splats (3DGS) are growing into a viable alternative to traditional graphics for an expanding number of applications, including recent techniques that facilitate physics simulation and animation. However, extracting usable objects from in-the-wild captures remains challenging, and controllable editing techniques for this representation are limited. Unlike the bulk of emerging techniques, which focus on automatic solutions or high-level editing, we introduce an interactive suite of tools centered around versatile Gaussian Splat selection and segmentation. We propose a fast AI-driven method to propagate user-guided 2D selection masks to 3DGS selections. This technique allows for user intervention in the case of errors and is further coupled with flexible manual selection and segmentation tools. Together, these allow a user to achieve virtually any binary segmentation of an unstructured 3DGS scene. We evaluate our toolset against the state of the art for Gaussian Splat selection and demonstrate its utility for downstream applications by developing a user-guided local editing approach that leverages a custom Video Diffusion Model. With flexible selection tools, users have direct control over the areas that the AI can modify. Our selection and editing tools can be used for any in-the-wild capture without additional optimization.
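To make the idea of propagating 2D selection masks to a 3D selection concrete, the following is a minimal sketch and not the authors' method. It assumes a simple pinhole camera, treats each Gaussian as its center point only, ignores occlusion and opacity, and combines views with a plain vote. The names `project_points`, `select_by_masks`, and `min_vote_ratio` are hypothetical and do not come from the paper.

```python
import numpy as np

def project_points(points, K, world_to_cam):
    """Project Nx3 world points with a pinhole model; returns (uv, depth)."""
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    cam = (world_to_cam @ homo.T).T[:, :3]
    depth = cam[:, 2]
    # Replace non-positive depths to avoid division by zero; such points
    # are filtered out later by the visibility test.
    safe = np.where(depth > 1e-8, depth, 1.0)
    uv = (K @ cam.T).T[:, :2] / safe[:, None]
    return uv, depth

def select_by_masks(centers, views, min_vote_ratio=0.5):
    """Return a boolean selection over Gaussian centers.

    views: iterable of (mask, K, world_to_cam), where mask is an HxW bool
    array, K is a 3x3 intrinsics matrix, world_to_cam is a 4x4 matrix.
    A center is selected if it lands inside the mask in at least
    min_vote_ratio of the views where it is in frame (assumed rule).
    """
    n = len(centers)
    inside = np.zeros(n)
    visible = np.zeros(n)
    for mask, K, world_to_cam in views:
        h, w = mask.shape
        uv, depth = project_points(centers, K, world_to_cam)
        px = np.floor(uv).astype(int)
        in_frame = (
            (depth > 1e-8)
            & (px[:, 0] >= 0) & (px[:, 0] < w)
            & (px[:, 1] >= 0) & (px[:, 1] < h)
        )
        visible += in_frame
        idx = np.flatnonzero(in_frame)
        inside[idx] += mask[px[idx, 1], px[idx, 0]]
    ratio = np.divide(inside, visible, out=np.zeros(n), where=visible > 0)
    return (visible > 0) & (ratio >= min_vote_ratio)
```

A practical tool would also need to account for the spatial extent of each Gaussian, its opacity, and occlusion by other splats, which this sketch deliberately leaves out.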

Paper Structure

This paper contains 33 sections, 1 equation, 8 figures, and 3 tables.

Figures (8)

  • Figure 1: 3D Capture Setups: While segmenting objects from controlled captures, like the toy suspended by wires (a), is relatively simple with existing tools, these solutions fall short on more realistic use cases (b).
  • Figure 2: Auto-Tracked Segmentation with Corrections: We propose an automatic way to project 2D user masks $S^{\scriptsize \boxplus}_i$ to a 3D selection $S^{\,\hbox{\footnotesize \mancube}}$ over 3D Gaussians, while allowing users to correct the outcome (§\ref{ssec:seg_auto}). Left: notation and selection modes supported in our design (§\ref{sec:segment}).
  • Figure 3: Manual Projection (§\ref{ssec:seg_manual}) of 2D masks $S^{\scriptsize \boxplus}_i$ to 3D, combined with different selection modes (§\ref{ssec:seg_modes}), allows flexible manual selection.
  • Figure 4: Evaluation Datasets: Annotations on both NVOS (a) and LERF-Mask (b) have inaccuracies. We suggest alternative inputs for NVOS (a).
  • Figure 5: Impact of presegmentation on the inputs to the tracker. Top: Input without presegmentation. Bottom: Input with presegmentation.
  • ...and 3 more figures