Occam's LGS: An Efficient Approach for Language Gaussian Splatting

Jiahuan Cheng, Jan-Nico Zaech, Luc Van Gool, Danda Pani Paudel

TL;DR

Occam's LGS introduces a training-free, probabilistically grounded method for lifting high-dimensional 2D language features into 3D Gaussian Splatting representations. By modeling the forward rendering process with per-Gaussian alpha-blending weights and applying a maximum-likelihood feature uplifting, the approach achieves state-of-the-art open-vocabulary segmentation on LERF and 3D-OVS, runs the uplift two orders of magnitude faster than prior methods, and avoids feature-space compression. A simple filtering step removes noisy Gaussians, and the uncompressed features support scene editing such as object insertion without scene-specific retraining. The work combines theoretical grounding with practical efficiency, supporting near-real-time use and downstream tasks in open-world 3D understanding and manipulation.

Abstract

Gaussian Splatting is a widely adopted approach for 3D scene representation, offering efficient, high-quality reconstruction and rendering. A key reason for its success is the simplicity of representing scenes with sets of Gaussians, making it interpretable and adaptable. To enhance understanding beyond visual representation, recent approaches extend Gaussian Splatting with semantic vision-language features, enabling open-set tasks. Typically, these language features are aggregated from multiple 2D views; however, existing methods rely on cumbersome techniques, resulting in high computational costs and longer training times. In this work, we show that the complicated pipelines for language 3D Gaussian Splatting are simply unnecessary. Instead, we follow a probabilistic formulation of Language Gaussian Splatting and apply Occam's razor to the task at hand, leading to a highly efficient weighted multi-view feature aggregation technique. Doing so yields state-of-the-art results with a speed-up of two orders of magnitude without any compression, allowing for easy scene manipulation. Project Page: https://insait-institute.github.io/OccamLGS/
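The weighted multi-view feature aggregation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the aggregate is a normalized weighted average of the 2D features each Gaussian contributes to, which the text here does not spell out. The function name `aggregate_features` and the `(gaussian_id, weight, feature)` tuple layout are hypothetical, chosen only for illustration, and are not taken from the authors' code.

```python
from collections import defaultdict


def aggregate_features(observations, feature_dim):
    """Weighted average of observed 2D features, per Gaussian.

    observations: iterable of (gaussian_id, weight, feature) tuples, where
    weight is the blending weight of that Gaussian at one pixel of one view
    and feature is the 2D language feature at that pixel (list of floats).
    This layout is an assumption made for this sketch.
    """
    sums = defaultdict(lambda: [0.0] * feature_dim)  # sum of weight * feature
    totals = defaultdict(float)                      # sum of weights

    for gid, weight, feature in observations:
        acc = sums[gid]
        for k in range(feature_dim):
            acc[k] += weight * feature[k]
        totals[gid] += weight

    # Normalize; Gaussians with zero total weight get no feature at all.
    return {
        gid: [value / totals[gid] for value in sums[gid]]
        for gid in sums
        if totals[gid] > 0.0
    }


# Gaussian 0 is seen twice (two views), Gaussian 1 once.
observations = [
    (0, 0.75, [1.0, 0.0]),
    (0, 0.25, [0.0, 1.0]),
    (1, 0.50, [2.0, 2.0]),
]
features = aggregate_features(observations, feature_dim=2)
```

Because the result is a single pass of accumulation followed by a division, no optimization loop in feature space is involved, which is consistent with the training-free framing of the method.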

Paper Structure

This paper contains 26 sections, 14 equations, 14 figures, 7 tables, 1 algorithm.

Figures (14)

  • Figure 1: Occam's LGS performs training-free language feature aggregation in 3D by accurately modeling rendering of Gaussian Splatting, avoiding expensive training in feature space. This improves runtime by two orders of magnitude, while achieving SOTA performance for open-set vision-language tasks and allowing for downstream applications such as scene editing.
  • Figure 2: Overview of our method: Occam's LGS consists of two main stages: (1) Forward rendering process of 3D Gaussian Splatting to obtain alpha blending weights $w$, projected positions $x_i'$ for each Gaussian and their corresponding pixels $p_i$ in 2D views, followed by weighted aggregation of multi-view semantic features (Sec. 3.3); and (2) Filtering of noisy Gaussians that remain invisible throughout the rendering process (Sec. 3.4).
  • Figure 3: PCA visualization of semantic features on LERF Dataset. (a) Original RGB rendering. (b) PCA visualization of our rendered semantic feature maps. (c) Initial semantic feature maps extracted using CLIP from SAM segmentations.
  • Figure 4: Qualitative comparison of relevancy score visualization.
  • Figure 5: Comprehensive analysis of computational efficiency metrics across different feature dimensions, showing rendering speed (FPS), storage size, GPU memory usage, and runtime performance as Gaussian size and frame numbers vary.
  • ...and 9 more figures
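The two stages named in the Figure 2 caption can be sketched for a single pixel. The sketch assumes the standard front-to-back alpha compositing used by 3D Gaussian Splatting, where each Gaussian's weight is its opacity times the transmittance remaining in front of it, and it assumes that "invisible" means an accumulated weight of zero. The function names `blending_weights` and `visible_gaussians` and the `eps` threshold are hypothetical and not taken from the paper.

```python
def blending_weights(alphas):
    """Alpha-blending weights for Gaussians sorted front to back on one pixel.

    weight_i = alpha_i * prod_{j < i} (1 - alpha_j)
    """
    weights = []
    transmittance = 1.0  # fraction of light not yet absorbed
    for alpha in alphas:
        weights.append(alpha * transmittance)
        transmittance *= 1.0 - alpha
    return weights


def visible_gaussians(total_weights, eps=0.0):
    """Indices of Gaussians whose accumulated weight exceeds eps.

    total_weights[i] is assumed to be the weight of Gaussian i summed over
    all pixels and views; eps is an illustrative threshold.
    """
    return [gid for gid, w in enumerate(total_weights) if w > eps]


# Four Gaussians along one ray; the third is fully opaque and hides the fourth.
weights = blending_weights([0.5, 0.5, 1.0, 0.5])
kept = visible_gaussians(weights)
```

In this toy case the fourth Gaussian receives zero weight because an opaque Gaussian sits in front of it, so it would be dropped by the filter if it stayed hidden in every view.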