FMLGS: Fast Multilevel Language Embedded Gaussians for Part-level Interactive Agents

Xin Tan; Yuzhou Ji; He Zhu; Yuan Xie

TL;DR

FMLGS tackles part-level open-vocabulary localization in 3D radiance fields. It introduces a multilevel, SAM2-assisted pipeline that extracts object- and part-level features, applies semantic deviation to them, maps them consistently across views, and trains Gaussian-based features for fast, pixel-aligned querying. The method achieves state-of-the-art speed and accuracy on open-vocabulary localization and supports interactive agents that can navigate scenes and respond to natural-language prompts. Key innovations include semantic deviation to resolve language ambiguity, identity-based cross-view feature mapping, and a two-step multilevel localization strategy. These contributions enable practical applications in language-driven 3D segmentation and object inpainting, with potential impact on embodied AI and interactive scene understanding.
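The two-step multilevel localization strategy is named here but not detailed. As a rough illustration only, the sketch assumes rendered per-pixel feature maps at an object level and a part level, plus text-query embeddings of the same dimension. It first thresholds cosine similarity to find the object region, then searches for the part only inside that region. The function name `two_step_localize`, the 0.5 threshold, and the toy 2×2 maps are hypothetical and are not the paper's implementation.

```python
import numpy as np

def normalize(x, axis=-1, eps=1e-8):
    # L2-normalize along the feature axis so dot products become cosine similarities
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def two_step_localize(obj_feats, part_feats, obj_query, part_query, thresh=0.5):
    """Hypothetical two-step query.

    obj_feats, part_feats: (H, W, D) rendered per-pixel feature maps (assumed).
    obj_query, part_query: (D,) text embeddings (assumed, e.g. from a CLIP-like encoder).
    Returns a boolean (H, W) mask of the part restricted to the located object.
    """
    obj_sim = normalize(obj_feats) @ normalize(obj_query)     # (H, W) object relevancy
    obj_mask = obj_sim >= thresh                              # step 1: object region
    part_sim = normalize(part_feats) @ normalize(part_query)  # (H, W) part relevancy
    return obj_mask & (part_sim >= thresh)                    # step 2: part inside object

# Toy example: the object occupies the top row, part-like features fill the left column
D = 4
obj_q = np.array([1.0, 0.0, 0.0, 0.0])
part_q = np.array([0.0, 1.0, 0.0, 0.0])
obj_feats = np.zeros((2, 2, D))
obj_feats[0, :] = obj_q
part_feats = np.zeros((2, 2, D))
part_feats[:, 0] = part_q
mask = two_step_localize(obj_feats, part_feats, obj_q, part_q)
print(mask)
```

Restricting the part search to the object mask is what keeps a part-like feature outside the queried object (the bottom-left pixel in the toy example) from being returned.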

Abstract

Semantically interactive radiance fields have long been a promising backbone for real-world 3D applications, such as embodied AI systems that must understand and manipulate scenes. However, multi-granularity interaction remains challenging because language is ambiguous and query quality degrades for object components. In this work, we present FMLGS, an approach that supports part-level open-vocabulary queries within 3D Gaussian Splatting (3DGS). We propose an efficient pipeline, based on Segment Anything Model 2 (SAM2), for building and querying consistent object- and part-level semantics. We design a semantic deviation strategy that resolves language ambiguity among object parts by interpolating the semantic features of fine-grained targets to enrich their information. Once trained, the model can be queried in natural language for both objects and their describable parts. Comparisons with other state-of-the-art methods show that our method not only locates specified part-level targets more accurately but also ranks first in both speed and accuracy: FMLGS is 98× faster than LERF, 4× faster than LangSplat, and 2.5× faster than LEGaussians. We further integrate FMLGS into a virtual agent that interactively navigates 3D scenes, locates targets, and responds to user requests through a chat interface, demonstrating that the work can be extended to further applications.
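The abstract describes semantic deviation only as interpolating the semantic features of fine-grained targets. One possible reading, sketched here purely as an illustration, is a linear interpolation between a part's embedding and its parent object's embedding, followed by renormalization. This is not the paper's stated formulation: the helper `interpolate_part_feature`, the mixing weight `alpha`, and the toy "handle", "mug", and "door" vectors are all hypothetical.

```python
import numpy as np

def normalize(v, eps=1e-8):
    # L2-normalize so the result lies on the unit hypersphere, like CLIP-style embeddings
    return v / (np.linalg.norm(v) + eps)

def interpolate_part_feature(part_feat, obj_feat, alpha=0.3):
    """Hypothetical enrichment of a part embedding with its parent object's semantics.

    alpha (assumed hyperparameter) controls how much object context is mixed in.
    """
    mixed = (1.0 - alpha) * normalize(part_feat) + alpha * normalize(obj_feat)
    return normalize(mixed)

# Two parts sharing the same generic embedding ("handle") become distinguishable
handle = np.array([0.0, 1.0, 0.0])
mug = np.array([1.0, 0.0, 0.0])
door = np.array([0.0, 0.0, 1.0])
mug_handle = interpolate_part_feature(handle, mug)
door_handle = interpolate_part_feature(handle, door)
similarity = float(mug_handle @ door_handle)
print(similarity)
```

Without the interpolation both handles would map to the identical vector (cosine similarity 1); after mixing in object context, each handle leans toward its own parent object, which is the kind of disambiguation the strategy targets.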
