LagMemo: Language 3D Gaussian Splatting Memory for Multi-modal Open-vocabulary Multi-goal Visual Navigation

Haotian Zhou; Xiaole Wang; He Li; Fusheng Sun; Shengyu Guo; Guolei Qi; Jianghuan Xu; Huijing Zhao

TL;DR

LagMemo tackles multi-modal, open-vocabulary, multi-goal visual navigation by building a language-augmented 3D Gaussian Splatting memory during exploration. A language codebook links 3D Gaussians to language features, which supports memory-guided localization and multi-modal goal querying, while a local perception-based verification step matches and validates candidate goals during navigation. The authors also curate GOAT-Core, a high-quality core split distilled from GOAT-Bench, and report that LagMemo's memory module enables effective goal localization and that LagMemo outperforms state-of-the-art methods in multi-goal visual navigation. The system combines a 3D geometric memory with language understanding to handle diverse goal queries.

Abstract

Navigating to a designated goal using visual information is a fundamental capability for intelligent robots. Most classical visual navigation methods are restricted to single-goal, single-modality, and closed-set goal settings. To address the practical demands of multi-modal, open-vocabulary goal queries and multi-goal visual navigation, we propose LagMemo, a navigation system that leverages a language 3D Gaussian Splatting memory. During exploration, LagMemo constructs a unified 3D language memory. Given incoming task goals, the system queries the memory, predicts candidate goal locations, and integrates a local perception-based verification mechanism to dynamically match and validate goals during navigation. For fair and rigorous evaluation, we curate GOAT-Core, a high-quality core split distilled from GOAT-Bench and tailored to multi-modal open-vocabulary multi-goal visual navigation. Experimental results show that LagMemo's memory module enables effective multi-modal open-vocabulary goal localization, and that LagMemo outperforms state-of-the-art methods in multi-goal visual navigation. Project page: https://weekgoodday.github.io/lagmemo
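
The query-then-verify loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `Gaussian`, `rank_candidates`, and `navigate_with_verification` are hypothetical, the 2-D vectors stand in for real language features, cosine similarity and centroid pooling are assumed choices, and the `verify` callback is a placeholder for the local perception-based check.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class Gaussian:
    """Hypothetical memory entry: a 3D position plus an index into the codebook."""
    position: Point
    code_id: int


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(
    goal_embedding: Sequence[float],
    codebook: List[Sequence[float]],
    gaussians: List[Gaussian],
    top_k: int = 2,
) -> List[Tuple[Point, float]]:
    """Score each codebook entry against the goal, then return the centroid
    of the Gaussians attached to each of the top_k entries (assumed pooling)."""
    scored = sorted(
        ((cosine(goal_embedding, feat), cid) for cid, feat in enumerate(codebook)),
        reverse=True,
    )
    candidates = []
    for score, cid in scored[:top_k]:
        points = [g.position for g in gaussians if g.code_id == cid]
        if not points:
            continue  # no Gaussian uses this codebook entry
        centroid = tuple(sum(axis) / len(points) for axis in zip(*points))
        candidates.append((centroid, score))
    return candidates


def navigate_with_verification(
    candidates: List[Tuple[Point, float]],
    verify: Callable[[Point], bool],
) -> Optional[Point]:
    """Visit candidates in ranked order; accept the first one that the
    (placeholder) local verification step confirms."""
    for location, _score in candidates:
        if verify(location):
            return location
    return None


# Toy data: three codebook entries and four Gaussians.
codebook = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
gaussians = [
    Gaussian((0.0, 0.0, 0.0), 0),
    Gaussian((2.0, 0.0, 0.0), 0),
    Gaussian((5.0, 5.0, 0.0), 1),
    Gaussian((1.0, 1.0, 1.0), 2),
]
candidates = rank_candidates([1.0, 0.1], codebook, gaussians, top_k=2)
# Pretend local perception rejects the first candidate and confirms the second.
goal = navigate_with_verification(candidates, lambda loc: loc == (1.0, 1.0, 1.0))
```

In this sketch the memory only proposes where to look, and the verification callback decides whether the goal has actually been found, so a wrong top-ranked candidate falls through to the next one instead of ending the episode.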
