Enhancing Exploratory Capability of Visual Navigation Using Uncertainty of Implicit Scene Representation

Yichen Wang, Qiming Liu, Zhe Liu, Hesheng Wang

TL;DR

This work proposes the Navigation with Uncertainty-driven Exploration (NUE) pipeline, which uses an implicit and compact scene representation, NeRF, as a cognitive structure. It estimates the uncertainty of the NeRF and uses that uncertainty to strengthen exploration, which in turn facilitates the construction of the implicit representation.

Abstract

In visual navigation in unknown scenes, "exploration" and "exploitation" are equally crucial. Robots must first build a cognition of the environment through exploration and then use that cognitive information to search for the target. However, most existing methods for image-goal navigation prioritize target search over the generation of exploratory behavior. To address this, we propose the Navigation with Uncertainty-driven Exploration (NUE) pipeline, which uses an implicit and compact scene representation, NeRF, as a cognitive structure. We estimate the uncertainty of the NeRF and use it to augment the robot's exploratory ability, which in turn facilitates the construction of the implicit representation. Simultaneously, we extract memory information from the NeRF to enhance the robot's ability to reason about the location of the target. Finally, we combine these two abilities to produce navigational actions. Our pipeline is end-to-end, and the environmental cognitive structure is constructed online. Extensive experiments on image-goal navigation show that our pipeline enhances exploratory behavior while also enabling a natural transition from the exploration phase to the exploitation phase. As a result, our model outperforms existing memory-based cognitive navigation structures in navigation performance.
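
The summary does not say how the uncertainty of the NeRF is computed. As an illustration only, the sketch below assumes one common formulation that is not confirmed by the paper: each sample along a ray carries a color mean and an independent Gaussian variance, and the rendered variance is the compositing-weight-squared sum of the per-sample variances. The exploration rule `most_uncertain_direction` is likewise hypothetical; it simply heads toward the candidate view whose rendered uncertainty is highest on average.

```python
import math
from typing import Dict, List, Sequence, Tuple


def render_ray(densities: Sequence[float], deltas: Sequence[float],
               colors: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Volume-render one ray (single color channel) and return (mean color, variance).

    ASSUMPTION: each sample's color is an independent Gaussian with mean
    colors[i] and variance variances[i]; the paper summary does not specify this.
    """
    transmittance = 1.0      # T_i: fraction of light that reaches sample i
    mean, var = 0.0, 0.0
    for sigma, delta, c, v in zip(densities, deltas, colors, variances):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this ray segment
        w = transmittance * alpha                 # standard NeRF compositing weight
        mean += w * c
        var += (w ** 2) * v                       # variance of a weighted sum of independent samples
        transmittance *= 1.0 - alpha              # light surviving past this sample
    return mean, var


def most_uncertain_direction(uncertainty_by_direction: Dict[str, List[float]]) -> str:
    """HYPOTHETICAL exploration rule: pick the view with the highest mean rendered variance."""
    return max(uncertainty_by_direction,
               key=lambda d: sum(uncertainty_by_direction[d]) / len(uncertainty_by_direction[d]))
```

For example, `most_uncertain_direction({"left": [0.1, 0.2], "forward": [0.5, 0.4]})` returns `"forward"`, because its mean variance (0.45) exceeds that of `"left"` (0.15). Regions the representation has not yet observed well keep high variance, so a rule of this kind pushes the robot toward them, and the new observations then refine the representation.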

Paper Structure

This paper contains 19 sections, 12 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Navigation of robots in an unknown environment. Visual navigation in an unknown environment involves two phases: exploration and exploitation. The robot first explores based on uncertainty to refine its cognitive structure, then transitions to navigating toward target-related cues it detects in the environment.
  • Figure 2: The overall architecture of NUE. First, real-time image input is used for online cognitive generation and perceptual feature extraction. Second, cognitive information is extracted to generate exploratory thinking and exploitative thinking. Finally, these forms of thinking are integrated to generate navigational actions.
  • Figure 3: The network structure of NUE. The overall framework first extracts real-time perceptual features and generates cognitive signals in NeRF. Subsequently, we compress the uncertainty map to generate uncertainty features and concatenate the spatial feature map with the target image in the channel dimension for spatial feature extraction. Finally, the features are concatenated and fed into an adaptive neural controller to generate the final navigation actions.
  • Figure 4: Visualization examples of image-goal navigation. The visualized results showcase the behavioral logic of our model. Our model successfully utilizes uncertainty to explore the scene in the early stages of navigation, while also effectively leveraging cognitive information for navigation upon sighting the target.
  • Figure 5: Predicted results of the auxiliary task. The change in arrow color from yellow to red indicates the progress of navigation. During navigation, the accuracy of the auxiliary task steadily improves, especially after the target is observed.
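
The Figure 3 caption describes a data flow: compress the uncertainty map into uncertainty features, concatenate the spatial feature map with the target image along the channel dimension for spatial feature extraction, then concatenate all features and pass them to a controller. The sketch below mirrors only that flow under stated assumptions. Average pooling for compression, a per-channel global average as the spatial feature extractor, a linear softmax head standing in for the "adaptive neural controller", and a four-action output are all hypothetical choices, not details from the paper.

```python
import math
from typing import List, Sequence

Map2D = List[List[float]]
ChannelsFirst = List[Map2D]  # indexed as [channel][row][col]


def pool_uncertainty(unc: Map2D, cells: int) -> List[float]:
    """Compress an HxW uncertainty map to a cells x cells grid by average pooling (cells <= H, W)."""
    h, w = len(unc), len(unc[0])
    out = []
    for i in range(cells):
        r0, r1 = i * h // cells, (i + 1) * h // cells
        for j in range(cells):
            c0, c1 = j * w // cells, (j + 1) * w // cells
            block = [unc[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            out.append(sum(block) / len(block))
    return out


def concat_channels(a: ChannelsFirst, b: ChannelsFirst) -> ChannelsFirst:
    """Stack two channels-first tensors along the channel axis; spatial sizes must match."""
    if (len(a[0]), len(a[0][0])) != (len(b[0]), len(b[0][0])):
        raise ValueError("spatial sizes differ")
    return a + b


def global_avg(x: ChannelsFirst) -> List[float]:
    """HYPOTHETICAL stand-in for the spatial feature extractor: one mean per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def controller(features: Sequence[float], weights: List[List[float]]) -> List[float]:
    """HYPOTHETICAL policy head: one linear score per action, then a stable softmax."""
    for row in weights:
        if len(row) != len(features):
            raise ValueError("weight row length must equal feature length")
    logits = [sum(wt * f for wt, f in zip(row, features)) for row in weights]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def nue_step(perception: Sequence[float], unc_map: Map2D, spatial_map: ChannelsFirst,
             target_img: ChannelsFirst, weights: List[List[float]], cells: int = 2) -> List[float]:
    """Fuse perceptual, uncertainty, and spatial-plus-target features into action probabilities."""
    unc_feat = pool_uncertainty(unc_map, cells)
    spatial_feat = global_avg(concat_channels(spatial_map, target_img))
    fused = list(perception) + unc_feat + spatial_feat
    return controller(fused, weights)
```

With a 1-dimensional perceptual feature, a 2x2 uncertainty map pooled to 2x2 cells, and a 1-channel spatial map stacked with a 3-channel target image, the fused vector has 1 + 4 + 4 = 9 entries, so each row of `weights` needs 9 values. Channel-wise concatenation requires the spatial map and the target image to share the same height and width, which is why `concat_channels` checks it.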