GaussNav: Gaussian Splatting for Visual Navigation
Xiaohan Lei, Min Wang, Wengang Zhou, Houqiang Li
TL;DR
GaussNav tackles Instance ImageGoal Navigation (IIN) by replacing traditional BEV maps with a 3D Gaussian Splatting-based Semantic Gaussian that preserves geometry, semantics, and texture. The method grounds the target object by rendering descriptive views of candidate instances and matching them against the goal image, which reduces IIN to a tractable point-goal task. A three-stage pipeline (Frontier Exploration, Semantic Gaussian Construction, and Gaussian Navigation) yields state-of-the-art performance on HM3D, with SPL up to $0.578$ at over 20 FPS, while ablations highlight the importance of classification, matching, and novel view synthesis. This work advances instance-level visual navigation by using differentiable rendering and a composable 3D Gaussian representation to retain the texture details needed to distinguish objects across viewpoints.
Abstract
In embodied vision, Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment. The primary challenge of IIN arises from the need to recognize the target object across varying viewpoints while ignoring potential distractors. Existing map-based navigation methods typically use Bird's Eye View (BEV) maps, which lack detailed texture representation of a scene. Consequently, while BEV maps are effective for semantic-level visual navigation, they struggle with instance-level tasks. To this end, we propose a new framework for IIN, Gaussian Splatting for Visual Navigation (GaussNav), which constructs a novel map representation based on 3D Gaussian Splatting (3DGS). The GaussNav framework enables the agent to memorize both the geometry and semantic information of the scene, as well as retain the textural features of objects. By matching renderings of similar objects with the target, the agent can accurately identify, ground, and navigate to the specified object. Our GaussNav framework demonstrates a significant performance improvement, with Success weighted by Path Length (SPL) increasing from 0.347 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset. The source code is publicly available at https://github.com/XiaohanLei/GaussNav.
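For readers unfamiliar with the headline metric, the sketch below shows how Success weighted by Path Length is conventionally computed: each successful episode contributes the ratio of the shortest-path length to the longer of the shortest-path length and the agent's actual path length, and the contributions are averaged over all episodes. This follows the standard definition of SPL and is not taken from the GaussNav code base; the function name `spl` and its argument names are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    Illustrative helper (hypothetical name and signature), following the
    standard definition: mean of S_i * l_i / max(p_i, l_i), where S_i is
    the success flag, l_i the shortest-path length from start to goal,
    and p_i the length of the path the agent actually took.
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same number of episodes")
    if len(successes) == 0:
        raise ValueError("at least one episode is required")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        denom = max(taken, shortest)
        # Failed episodes contribute 0; guard against zero-length episodes.
        if success and denom > 0:
            total += shortest / denom
    return total / len(successes)


if __name__ == "__main__":
    # Three episodes: optimal success, success with a 2x detour, failure.
    score = spl([True, True, False], [5.0, 4.0, 3.0], [5.0, 8.0, 6.0])
    print(f"SPL = {score:.3f}")
```

In this toy example the three episodes contribute 1.0, 0.5, and 0.0, so the average is 0.5. An SPL of 0.578 therefore reflects both how often the agent reaches the correct instance and how close its paths are to the shortest ones.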
