VLN-Game: Vision-Language Equilibrium Search for Zero-Shot Semantic Navigation

Bangguo Yu; Yuzhen Liu; Lei Han; Hamidreza Kasaei; Tingguang Li; Ming Cao

TL;DR

VLN-Game is a zero-shot framework for visual target navigation that handles both object names and descriptive language targets. It constructs a 3D object-centric spatial map by integrating pre-trained visual-language features with a 3D reconstruction of the physical environment.

Abstract

Following human instructions to explore and search for a specified target in an unfamiliar environment is a crucial skill for mobile service robots. Most previous work on object goal navigation has focused on a single input modality as the target, which limits its ability to handle language descriptions containing detailed attributes and spatial relationships. To address this limitation, we propose VLN-Game, a novel zero-shot framework for visual target navigation that effectively processes both object names and descriptive language targets. Specifically, our approach constructs a 3D object-centric spatial map by integrating pre-trained visual-language features with a 3D reconstruction of the physical environment. The framework then identifies the most promising areas to explore in search of potential target candidates. A game-theoretic vision-language model is employed to determine which candidate best matches the given language description. Experiments conducted on the Habitat-Matterport 3D (HM3D) dataset demonstrate that the proposed framework achieves state-of-the-art performance in both object goal navigation and language-based navigation tasks. Moreover, we show that VLN-Game can be easily deployed on real-world robots. The success of VLN-Game highlights the promising potential of using game-theoretic methods with compact vision-language models to advance decision-making capabilities in robotic systems. The supplementary video and code can be accessed via the following link: https://sites.google.com/view/vln-game.
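
The abstract does not spell out how the game-theoretic matching step works, so the following is only an illustrative sketch of one way two scorers could settle on a single target candidate. It assumes a hypothetical "generator" and "discriminator" that each assign a positive probability to every candidate, and it uses a simple anchored consensus iteration as a stand-in; the function names, the `anchor` parameter, and the update rule are assumptions for illustration, not the method used in VLN-Game.

```python
import math


def normalize(log_scores):
    """Turn unnormalized log scores into a probability distribution (softmax)."""
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


def consensus_select(gen_prior, disc_prior, anchor=0.5, iters=50):
    """Hypothetical consensus iteration between two scorers.

    gen_prior / disc_prior: strictly positive probabilities over the same
    list of candidates (assumed inputs, e.g. from two model queries).
    Each round, every player mixes (in log space) its own initial belief,
    weighted by `anchor`, with the other player's current belief.
    Returns (index of the selected candidate, joint distribution).
    """
    log_g0 = [math.log(p) for p in gen_prior]
    log_d0 = [math.log(p) for p in disc_prior]
    g, d = list(gen_prior), list(disc_prior)
    for _ in range(iters):
        new_g = normalize(
            [anchor * a + (1 - anchor) * math.log(q) for a, q in zip(log_g0, d)]
        )
        new_d = normalize(
            [anchor * a + (1 - anchor) * math.log(q) for a, q in zip(log_d0, g)]
        )
        g, d = new_g, new_d
    # Rank candidates by the product of the two settled beliefs.
    joint = normalize([math.log(a) + math.log(b) for a, b in zip(g, d)])
    best = max(range(len(joint)), key=joint.__getitem__)
    return best, joint


if __name__ == "__main__":
    # Three hypothetical candidates; the two scorers initially disagree.
    best, joint = consensus_select([0.5, 0.3, 0.2], [0.2, 0.6, 0.2])
    print(best, [round(p, 3) for p in joint])
```

The anchoring weight keeps each scorer close to its initial belief while pulling it toward the other, so the selected candidate is one that both scorers rate reasonably well rather than the favorite of only one of them.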
