SCALE: Self-Correcting Visual Navigation for Mobile Robots via Anti-Novelty Estimation

Chang Chen, Yuecheng Liu, Yuzheng Zhuang, Sitong Mao, Kaiwen Xue, Shunbo Zhou

TL;DR

SCALE tackles robust real-world visual navigation with offline learning by addressing out-of-distribution (OOD) observations and localization failures. It combines image-goal navigation, learned with offline Implicit Q-Learning (IQL), with a self-supervised localization recovery module that imagines multi-step trajectories through a conditional affordance model and steers toward familiar states using anti-novelty estimated by Random Network Distillation (RND). The approach introduces a temporally informed (GRU-based) prediction that enables aggressive subgoal estimation, and uses MPPI (model predictive path integral control) to optimize trajectories under a constrained cost that penalizes novelty and facilitates relocalization. Experiments in three outdoor urban scenarios show that SCALE with localization recovery significantly outperforms state-of-the-art baselines, reducing the need for human intervention and improving robustness to scenario changes. The work offers a practical path toward robust, GPS-denied navigation for mobile robots using only forward-facing vision and offline data.
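IQL learns from offline data without querying actions absent from the dataset by fitting its value function with expectile regression. The sketch below shows only that standard expectile loss on toy numbers; it is not SCALE's goal-conditioned training code, and the helper `fit_expectile` and the sample Q-values are hypothetical illustrations.

```python
import numpy as np

def expectile_loss(diff: np.ndarray, tau: float = 0.7) -> float:
    """IQL-style asymmetric squared loss: mean(|tau - 1(diff < 0)| * diff**2).

    In IQL, diff = Q(s, a, g) - V(s, g) over dataset actions; tau > 0.5 makes
    V(s, g) approach an upper expectile of Q without querying unseen actions.
    """
    weight = np.where(diff < 0, 1.0 - tau, tau)
    return float(np.mean(weight * diff ** 2))

def fit_expectile(q_values: np.ndarray, tau: float, lr: float = 0.1, steps: int = 2000) -> float:
    """Hypothetical helper: fit a scalar V minimizing expectile_loss(q_values - V)."""
    v = float(np.mean(q_values))
    for _ in range(steps):
        diff = q_values - v
        weight = np.where(diff < 0, 1.0 - tau, tau)
        v -= lr * (-2.0 * np.mean(weight * diff))  # gradient of the loss w.r.t. V
    return v

# Usage: toy Q-values of the dataset actions observed for one (state, goal) pair.
q = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
print(fit_expectile(q, tau=0.5))  # tau = 0.5 recovers the mean
print(fit_expectile(q, tau=0.9))  # larger tau moves V toward the best dataset actions
```

Raising `tau` lets the value estimate approximate a maximum over in-dataset actions only, which is why IQL suits offline navigation data where out-of-distribution actions cannot be evaluated reliably.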

Abstract

Although visual navigation has been extensively studied using deep reinforcement learning, online learning for real-world robots remains challenging. Recent work learns directly from offline datasets to achieve broader generalization in real-world tasks; however, such methods face the out-of-distribution (OOD) issue and can lose localization in a given map when the robot encounters unseen observations. This significantly lowers success rates and can even cause collisions. In this paper, we present SCALE, a self-correcting visual navigation method that autonomously steers the robot away from OOD situations without human intervention. Specifically, we develop an image-goal conditioned offline reinforcement learning method based on implicit Q-learning (IQL). When facing OOD observations, our novel localization recovery method generates potential future trajectories by learning from the navigation affordance and estimates their future novelty via random network distillation (RND). A tailored cost function then searches for the candidates with the least novelty, which lead the robot back to familiar places. We collect offline data and conduct evaluation experiments in three real-world urban scenarios. Experimental results show that SCALE outperforms previous state-of-the-art methods for open-world navigation, with a unique capability of localization recovery that significantly reduces the need for human intervention. Code is available at https://github.com/KubeEdge4Robotics/ScaleNav.
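The RND novelty estimate mentioned in the abstract rests on a simple idea: a fixed, randomly initialized target network maps inputs to features, a predictor is regressed onto those features using familiar data, and the prediction error serves as the novelty score. The sketch below is an illustration under stated assumptions: the class name `RNDNovelty`, the layer sizes, and the synthetic inputs are hypothetical, and a linear least-squares predictor stands in for the trained neural predictor $f_\omega$ that SCALE uses over its latent space.

```python
import numpy as np

class RNDNovelty:
    """Hypothetical Random Network Distillation novelty scorer (illustrative sketch)."""

    def __init__(self, in_dim: int, hidden: int = 32, out_dim: int = 8, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Target network weights are sampled once and never trained.
        self.w1 = rng.normal(size=(in_dim, hidden))
        self.w2 = rng.normal(size=(hidden, out_dim)) / np.sqrt(hidden)
        self.coef = None  # predictor parameters, set by fit()

    def target(self, x: np.ndarray) -> np.ndarray:
        return np.tanh(x @ self.w1) @ self.w2

    @staticmethod
    def _with_bias(x: np.ndarray) -> np.ndarray:
        return np.hstack([x, np.ones((x.shape[0], 1))])

    def fit(self, x: np.ndarray) -> None:
        # Stand-in predictor: linear least squares onto the target features.
        self.coef, *_ = np.linalg.lstsq(self._with_bias(x), self.target(x), rcond=None)

    def novelty(self, x: np.ndarray) -> np.ndarray:
        # Squared prediction error per input: small on familiar data, large elsewhere.
        err = self._with_bias(x) @ self.coef - self.target(x)
        return np.sum(err ** 2, axis=1)

# Usage: fit on "familiar" latents, then compare novelty on seen vs. shifted inputs.
rng = np.random.default_rng(1)
rnd = RNDNovelty(in_dim=4)
rnd.fit(rng.normal(size=(2000, 4)))
seen = float(rnd.novelty(rng.normal(size=(200, 4))).mean())
unseen = float(rnd.novelty(rng.normal(loc=8.0, size=(200, 4))).mean())
print(f"familiar: {seen:.3f}  unfamiliar: {unseen:.3f}")
```

Because the predictor only ever sees familiar inputs, its error grows on inputs unlike the offline data, so minimizing this score ("anti-novelty") pushes candidate trajectories back toward regions where localization is likely to succeed.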

Paper Structure (15 sections, 9 equations, 5 figures, 2 tables, 2 algorithms)

Figures (5)

  • Figure 1: Localization challenges. For offline learning methods, the out-of-distribution issue frequently arises in real-world navigation, causing the robot to lose its localization in the given map, which lowers success rates (a) and can even cause collisions (b). When the scenario changes, for example when novel obstacles absent from the offline dataset appear during deployment, the policy may also become unusable (c). Another case is when the robot is placed at an unknown position in a spacious deployment environment (d). A novel localization recovery module can tackle these four typical cases by predicting future states and assessing their novelty.
  • Figure 2: Overall framework. (a) SCALE first pretrains a self-consistent representation space $z$ with a VAE-style loss, then fine-tunes it with the gradients from $Q(s, a, g)$ in IQL. Next, we train the policy network $\pi$, the temporal affordance model $\psi$ and the novelty predictor $f_\omega$ over the trained representation space. (b) When the robot gets lost, SCALE randomly samples transitions $u$ from the prior $p(u)$ and feeds them to the temporal affordance model to generate multi-step latent trajectories recursively. It then evaluates the candidates in terms of reachability, anti-novelty and aggressiveness. Finally, it selects the optimal trajectory, executes its first step, and repeats until the robot is successfully relocalized.
  • Figure 3: Topological navigation with localization recovery. SCALE combines topological visual navigation with a novel localization recovery module. We first build a topological map (gray circles and lines) from the offline dataset. Next, starting at the yellow circle, we use the localization module to perform active initialization. Then, given a goal image, we search for a route (orange circles) on the topological graph and follow it to the goal (red circle) step by step. The cyan lines denote the actual trajectories. The plot panels show the plans with and without RND for the active initialization and for localization recovery (purple circles) during navigation. Ultimately, the optimal trajectories (red lines) guide the robot to relocalize itself.
  • Figure 4: Quantitative experiments. We evaluate SCALE in three outdoor environments, shown in the satellite images (1st column), where the cyan lines indicate the navigation routes. The 2nd column shows waypoints before the localization failures. When the actual trajectories (cyan lines) distinctly deviate from the topological map built on the offline trajectories (green lines), localization failures arise (3rd column). In this case, our localization recovery module generates latent subgoals and evaluates them with the cost function. The optimal anti-novelty plan (red lines) is executed to correct the robot's trajectory (4th column), eventually navigating the robot to the goal (5th column) without human intervention.
  • Figure 5: Performance demonstration. Only SCALE equipped with localization recovery successfully reaches the designated goal, exhibiting strong robustness to trajectory deviation induced by cumulative driving error (a) and to sharp turns through aggressive state prediction (b). SCALE uniquely succeeds in reaching the goal under scenario changes (c) and with active initialization at an unknown place in a spacious environment (d), which the other three methods cannot handle.
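The recovery loop of Figure 2(b), optimized with MPPI as the TL;DR describes, can be illustrated with a small receding-horizon sketch: sample transition sequences $u$, roll them out recursively through an affordance model, score each latent trajectory, form a cost-weighted average, and execute only the first step. Every component here is a hypothetical stand-in rather than the paper's implementation: the affordance model $\psi$ is replaced by `z + 0.3 * tanh(u)`, the RND score by the squared distance to the nearest familiar latent, and the reachability and aggressiveness terms by a step-size penalty and a displacement bonus with made-up weights.

```python
import numpy as np

def rollout(z0, controls, affordance):
    """Recursively apply the affordance stand-in: z_{t+1} = affordance(z_t, u_t)."""
    traj = [z0]
    for u in controls:
        traj.append(affordance(traj[-1], u))
    return np.stack(traj)  # (H + 1, dim); row 0 is the current latent state

def make_cost(familiar, max_step=0.35, beta=0.5, reach_weight=10.0):
    """Hypothetical cost: anti-novelty + reachability penalty - aggressiveness bonus."""
    def novelty(z):
        # Stand-in for the RND score: squared distance to the nearest familiar latent.
        return float(np.min(np.sum((familiar - z) ** 2, axis=1)))

    def cost(traj):
        anti_novelty = sum(novelty(z) for z in traj[1:])
        steps = np.linalg.norm(np.diff(traj, axis=0), axis=1)
        reach = reach_weight * float(np.sum(np.maximum(steps - max_step, 0.0)))
        aggressive = beta * float(np.linalg.norm(traj[-1] - traj[0]))
        return anti_novelty + reach - aggressive

    return cost, novelty

def mppi_recover(z0, affordance, cost_fn, horizon=5, samples=256, iters=5,
                 sigma=0.5, lam=1.0, seed=0):
    """MPPI over latent transition sequences; returns (first control, planned latents)."""
    rng = np.random.default_rng(seed)
    # Initial sampling mean; the prior p(u) is assumed to be N(0, sigma^2 I).
    mean = np.zeros((horizon, z0.shape[0]))
    for _ in range(iters):
        cand = mean + rng.normal(scale=sigma, size=(samples, horizon, z0.shape[0]))
        costs = np.array([cost_fn(rollout(z0, c, affordance)) for c in cand])
        weights = np.exp(-(costs - costs.min()) / lam)  # softmin over sampled costs
        weights /= weights.sum()
        mean = np.tensordot(weights, cand, axes=1)  # cost-weighted average sequence
    return mean[0], rollout(z0, mean, affordance)

# Usage: the robot is "lost" at z0, far from the latents seen in the offline data.
rng = np.random.default_rng(0)
familiar = rng.normal(scale=0.3, size=(200, 2))
affordance = lambda z, u: z + 0.3 * np.tanh(u)  # hypothetical stand-in for psi
cost, novelty = make_cost(familiar)
z0 = np.array([3.0, 0.0])
u0, plan = mppi_recover(z0, affordance, cost)  # execute u0, then replan
print("first control:", u0, "end novelty:", novelty(plan[-1]), "start:", novelty(z0))
```

Executing only `u0` and replanning from the new observation mirrors the caption's "executes the first step, then repeats" loop; the softmin temperature `lam` trades off committing to the single best sample against averaging over many good ones.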