TriHelper: Zero-Shot Object Navigation with Dynamic Assistance
Lingfeng Zhang, Qiang Zhang, Hao Wang, Erjia Xiao, Zixuan Jiang, Honglei Chen, Renjing Xu
TL;DR
TriHelper tackles zero-shot object navigation by integrating three dynamic assistance modules (Collision Helper, Exploration Helper, and Detection Helper) within a semantic-frontier navigation framework. The method builds semantic and frontier maps, uses a large language model (LLM) to select exploration targets, and employs a vision-language model (VLM) to verify target detections, with a Fast Marching Method-based local planner for real-time navigation. Ablation and cross-dataset experiments on HM3D and Gibson demonstrate state-of-the-art success rate (SR) and competitive Success weighted by Path Length (SPL), validating the effectiveness of targeted, modular assistance in unknown environments. The work highlights the importance of adaptive guidance for addressing collision, low exploration efficiency, and target misidentification, paving the way for more robust, deployable embodied AI in indoor navigation tasks.
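To make the notion of a frontier map concrete, here is a minimal sketch of the generic frontier idea: a frontier cell is a known-free cell that borders unexplored space. The grid encoding, the 4-connectivity rule, and the name `find_frontiers` are assumptions made for illustration; the summary above does not specify how TriHelper builds its frontier map.

```python
from typing import List, Tuple

# Hypothetical cell labels for this sketch; not taken from the paper.
UNKNOWN, FREE, OBSTACLE = -1, 0, 1


def find_frontiers(grid: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (row, col) of every free cell with a 4-connected unknown neighbour."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue  # only explored, traversable cells can be frontiers
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                in_bounds = 0 <= nr < rows and 0 <= nc < cols
                if in_bounds and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break  # one unknown neighbour is enough
    return frontiers


# Toy 3x3 occupancy grid: the right column is partly unexplored.
grid = [
    [FREE, FREE, UNKNOWN],
    [FREE, OBSTACLE, UNKNOWN],
    [FREE, FREE, FREE],
]
frontier_cells = find_frontiers(grid)
```

In a pipeline of the kind summarized above, candidate cells like these would be grouped and handed to the target-selection step; the sketch stops at detection and does not attempt to model the LLM, VLM, or planner stages.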
Abstract
Navigating toward specific objects in unknown environments without additional training, known as Zero-Shot object navigation, is a significant challenge in robotics that demands extensive auxiliary information and strategic planning. Prior work has focused on holistic solutions, overlooking the specific challenges agents encounter during navigation, such as collision, low exploration efficiency, and misidentification of targets. To address these challenges, we propose TriHelper, a novel framework that dynamically assists agents with three primary navigation challenges: collision, exploration, and detection. Specifically, the framework consists of three components: (i) Collision Helper, (ii) Exploration Helper, and (iii) Detection Helper. These components work collaboratively throughout the navigation process. Experiments on the Habitat-Matterport 3D (HM3D) and Gibson datasets demonstrate that TriHelper significantly outperforms all existing baseline methods in Zero-Shot object navigation, achieving higher success rates and exploration efficiency. Our ablation studies further underscore the effectiveness of each helper in addressing its respective challenge, notably enhancing the agent's navigation capabilities. With TriHelper, we offer a fresh perspective on advancing the object navigation task, paving the way for future research in Embodied AI and vision-based navigation.
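Success and path efficiency in object navigation are commonly reported as success rate (SR) and Success weighted by Path Length (SPL). The sketch below computes both using the standard definitions from the embodied navigation literature; the function names and the toy episode values are hypothetical and are not results from this paper.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the agent reached the target."""
    return sum(successes) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length: mean of S_i * l_i / max(p_i, l_i).

    Assumes every shortest-path length l_i is strictly positive.
    """
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        if s:  # failed episodes contribute zero
            total += l / max(p, l)
    return total / len(successes)


# Toy episodes (made-up numbers, for illustration only).
successes = [True, False, True]
shortest = [5.0, 4.0, 10.0]  # geodesic distance from start to goal
taken = [10.0, 8.0, 10.0]    # length of the path the agent actually walked

sr_value = success_rate(successes)
spl_value = spl(successes, shortest, taken)
```

SPL can never exceed SR, since each successful episode contributes at most 1; an agent that succeeds often but wanders will show a high SR alongside a noticeably lower SPL.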
