
FAST-Splat: Fast, Ambiguity-Free Semantics Transfer in Gaussian Splatting

Ola Shorinwa, Jiankai Sun, Mac Schwager

TL;DR

FAST-Splat addresses slow training/rendering and semantic ambiguity in Gaussian Splatting by introducing neural-free, single-phase semantic distillation. It attaches a semantic code to each Gaussian (ellipsoid) and uses a hash table, leveraging open-set detectors and CLIP for open-vocabulary grounding. Geometry, appearance, and semantics are optimized jointly with $\mathcal{L}_{\mathrm{sgs}} = \mathcal{L}_{\mathrm{gs}} + \mathcal{L}_{\mathrm{ce}}$. Empirically, it trains 6x to 8x faster, renders 18x to 51x faster, and uses about 6x less memory, with competitive or improved semantic segmentation and disambiguation (e.g., resolving a prompt like "tea" to the correct object). This enables precise semantic object localization and 3D masks under open-vocabulary prompts, with potential benefits for robotics and scene editing.
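To make the joint objective concrete, here is a minimal NumPy sketch of how per-Gaussian semantic codes could be alpha-composited into per-pixel class scores and combined with a photometric term. It assumes each semantic code is a K-dimensional vector of class scores, that $\mathcal{L}_{\mathrm{ce}}$ is a per-pixel cross-entropy against class labels (e.g., pseudo-labels from an open-set detector), and that $\mathcal{L}_{\mathrm{gs}}$ can be stood in for by a plain L1 loss. All function names are hypothetical, and this is an illustration, not the authors' implementation.

```python
import numpy as np

def composite(values, alphas):
    """Front-to-back alpha compositing of per-Gaussian values along one ray.

    values: (N, D) attributes of the N depth-sorted Gaussians hitting the pixel
    alphas: (N,) effective opacities in [0, 1]
    """
    # Transmittance before each Gaussian: T_i = prod_{j<i} (1 - alpha_j)
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = alphas * transmittance
    return weights @ values  # (D,)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean cross-entropy; logits: (P, K) per-pixel class scores, labels: (P,) class ids."""
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def sgs_loss(rendered_rgb, gt_rgb, rendered_logits, labels):
    # Assumption: L_gs simplified to an L1 photometric loss
    # (standard Gaussian Splatting also adds a D-SSIM term, omitted here).
    l_gs = np.mean(np.abs(rendered_rgb - gt_rgb))
    l_ce = cross_entropy(rendered_logits, labels)
    return l_gs + l_ce

# Toy example: two Gaussians on one ray, three semantic classes.
codes = np.array([[2.0, 0.1, 0.1], [0.1, 2.0, 0.1]])
alphas = np.array([0.8, 0.9])
pixel_logits = composite(codes, alphas)[None, :]  # shape (1, 3)
loss = sgs_loss(np.zeros((1, 3)), np.zeros((1, 3)), pixel_logits, np.array([0]))
```

Because the semantic codes are composited with the same weights as color, a single backward pass through one rasterizer can, in principle, update geometry, appearance, and semantics together, which is the single-phase property the summary emphasizes.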

Abstract

We present FAST-Splat for fast, ambiguity-free semantic Gaussian Splatting, which seeks to address the main limitations of existing semantic Gaussian Splatting methods: slow training and rendering speeds, high memory usage, and ambiguous semantic object localization. We take a bottom-up approach in deriving FAST-Splat, dismantling the limitations of closed-set semantic distillation to enable open-set (open-vocabulary) semantic distillation. This approach enables FAST-Splat to provide precise semantic object localization results, even when prompted with ambiguous natural-language queries from the user. Further, by fully exploiting the explicit form of the Gaussian Splatting scene representation, FAST-Splat retains the remarkable training and rendering speeds of Gaussian Splatting. Specifically, while existing semantic Gaussian Splatting methods distill semantics into a separate neural field or use neural models for dimensionality reduction, FAST-Splat directly augments each Gaussian with a specific semantic code, preserving the training, rendering, and memory-usage advantages of Gaussian Splatting over neural field methods. These Gaussian-specific semantic codes, together with a hash table, enable semantic similarity to be measured against open-vocabulary user prompts and further enable FAST-Splat to respond with unambiguous semantic object labels and 3D masks, unlike prior methods. In experiments, we demonstrate that FAST-Splat is 6x to 8x faster to train, achieves 18x to 51x faster rendering, and requires about 6x less GPU memory than the best-competing semantic Gaussian Splatting methods. Further, FAST-Splat achieves comparable or better semantic segmentation performance than existing methods. After the review period, we will provide links to the project website and the codebase.
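The query step described in the abstract (semantic codes plus a hash table, compared against an open-vocabulary prompt) can be sketched as a lookup followed by a cosine-similarity match. This is a minimal sketch under stated assumptions: the hash table is modeled as a Python dict from a hypothetical class id to a (label, text embedding) pair, the toy 3-D vectors stand in for real CLIP text embeddings, and each Gaussian's class is taken as the argmax of its semantic code. None of these names come from the paper.

```python
import numpy as np

def normalize(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Hypothetical hash table: class id -> (label, unit-norm text embedding).
# Placeholder vectors; real ones would come from a CLIP text encoder.
SEMANTIC_TABLE = {
    0: ("coffee machine", normalize([0.9, 0.3, 0.1])),
    1: ("kettle", normalize([0.2, 0.9, 0.3])),
    2: ("vase", normalize([0.1, 0.2, 0.95])),
}

def resolve_query(query_embedding, table):
    """Return (class_id, label, cosine score) of the stored class closest to the query."""
    q = normalize(query_embedding)
    best_id = max(table, key=lambda k: float(table[k][1] @ q))
    label, emb = table[best_id]
    return best_id, label, float(emb @ q)

def mask_3d(gaussian_codes, class_id):
    """Boolean mask over Gaussians whose semantic code has its maximum at class_id."""
    return np.argmax(gaussian_codes, axis=1) == class_id

# An ambiguous prompt such as "coffee" (toy embedding) maps to one concrete label.
class_id, label, score = resolve_query([0.8, 0.4, 0.1], SEMANTIC_TABLE)
```

Returning a concrete stored label, rather than a raw similarity heatmap, is what would let a system answer "coffee" with "coffee machine" and a matching set of Gaussians.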


Paper Structure

This paper contains 15 sections, 3 equations, 12 figures, 16 tables.

Figures (12)

  • Figure 1: Unlike prior semantic Gaussian Splatting methods, FAST-Splat jointly optimizes the geometric, visual, and semantic attributes of Gaussian Splatting models, achieving faster training and rendering times with effective semantics disambiguation using an efficient semantics extraction procedure.
  • Figure 2: Semantic segmentation on the Bed and Covered-Desk scenes in 3D-OVS. Overall, FAST-Splat outperforms the baselines.
  • Figure 3: RGB images rendered in R-Kitchen and R-Library. FAST-Splat achieves 18x to 51x faster rendering with at least 6x lower memory usage, while achieving competitive or better reconstruction quality.
  • Figure 4: FAST-Splat resolves language ambiguity in natural-language queries for semantic object localization, identifying the specific semantic class of each object (e.g., a coffee machine and a kettle) when prompted with ambiguous queries (e.g., "coffee" and "cooking pot," respectively). Likewise, FAST-Splat disambiguates between a "cup" and a vase and between a "fruit" and a pottedplant in R-Library.
  • Figure 5: Color-editing of a coffee machine using FAST-Splat.
  • ...and 7 more figures
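The color-editing result in Figure 5 suggests one simple way a semantic 3D mask could be used: select the Gaussians assigned to a class and overwrite their base color. The sketch below assumes colors are stored as zeroth-order spherical-harmonic (DC) coefficients with the conversion rgb = C0 * dc + 0.5 used in common Gaussian Splatting code, and that class membership is the argmax of each semantic code. The function names are hypothetical, and higher-order (view-dependent) coefficients are left untouched here, which a full edit might also need to handle.

```python
import numpy as np

SH_C0 = 0.28209479177387814  # zeroth-order real SH constant, 1 / (2 * sqrt(pi))

def rgb_to_sh_dc(rgb):
    return (np.asarray(rgb, dtype=float) - 0.5) / SH_C0

def sh_dc_to_rgb(dc):
    return SH_C0 * np.asarray(dc, dtype=float) + 0.5

def recolor(sh_dc, semantic_codes, class_id, target_rgb):
    """Return a copy of the (N, 3) DC coefficients with all Gaussians of class_id recolored."""
    mask = np.argmax(semantic_codes, axis=1) == class_id
    edited = sh_dc.copy()
    edited[mask] = rgb_to_sh_dc(target_rgb)  # broadcast the target color over masked rows
    return edited, mask

# Toy scene: three gray Gaussians, two of which belong to class 0.
sh_dc = np.zeros((3, 3))  # DC = 0 corresponds to RGB (0.5, 0.5, 0.5)
codes = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
edited, mask = recolor(sh_dc, codes, class_id=0, target_rgb=[1.0, 0.0, 0.0])
```

Since the edit only touches per-Gaussian attributes, the scene can be re-rendered immediately without retraining.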