
RVN-Bench: A Benchmark for Reactive Visual Navigation

Jaewon Lee, Jaeseok Heo, Gunmin Lee, Howoong Jun, Jeongwoo Oh, Songhwai Oh

TL;DR

Experiments show that policies trained on RVN-Bench generalize effectively to unseen environments, demonstrating its value as a standardized benchmark for safe and robust visual navigation.

Abstract

Safe visual navigation is critical for indoor mobile robots operating in cluttered environments. Existing benchmarks, however, often neglect collisions or are designed for outdoor scenarios, making them unsuitable for indoor visual navigation. To address this limitation, we introduce the reactive visual navigation benchmark (RVN-Bench), a collision-aware benchmark for indoor mobile robots. In RVN-Bench, an agent must reach sequential goal positions in previously unseen environments using only visual observations and no prior map, while avoiding collisions. Built on the Habitat 2.0 simulator and leveraging high-fidelity HM3D scenes, RVN-Bench provides large-scale, diverse indoor environments, defines a collision-aware navigation task and evaluation metrics, and offers tools for standardized training and benchmarking. RVN-Bench supports both online and offline learning by offering an environment for online reinforcement learning, a trajectory image dataset generator, and tools for producing negative trajectory image datasets that capture collision events. Experiments show that policies trained on RVN-Bench generalize effectively to unseen environments, demonstrating its value as a standardized benchmark for safe and robust visual navigation. Code and additional materials are available at: https://rvn-bench.github.io/.

Paper Structure

This paper contains 17 sections, 5 equations, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Overview of RVN-Bench. The benchmark is designed for indoor mobile robots and focuses on collision-aware visual navigation, where an agent must reach sequential goal positions using only visual observations while avoiding collisions. Built upon Habitat 2.0 and using HM3D scenes, it provides high-quality visual observations, an RL environment for training and evaluation, and pipelines for collecting trajectory image datasets, including negative (collision-inducing) trajectories.
  • Figure 2: Trajectory image dataset generation process. We generate trajectory image datasets through the following steps. (a) Start and goal positions are randomly sampled in the scene. (b) The shortest path between these positions is computed using the ground-truth occupancy map padded with a margin larger than the agent's radius. (c) The agent follows this path using discrete actions. (d) If the agent reaches the goal position without collision, its positions, yaws, and image observations at each time step are recorded. For negative trajectory image dataset generation, the occupancy map in step (b) is instead padded with a margin smaller than the agent's radius, producing a path that is unsafe for the agent. If the agent collides with an obstacle at $t=t_c$, its positions, yaws, and image observations from $t_i$ to $t_f$ are recorded, where $t_i=\max(0, t_c-k_\text{pre})$ and $t_f=\min(T, t_c+k_\text{post})$; for $t>t_c$, the image observation $I_t$ is set to $I_{t_c}$.
  • Figure 3: Overview of the NoMaD-Neg framework. (a) Two NoMaD-PointGoal models, NoMaD$_e$ and NoMaD$_n$, are trained separately on expert and negative trajectory datasets, respectively. (b) NoMaD$_e$ and NoMaD$_n$ predict expert trajectories $a_{e,i} \in \textbf{a}_E$ and negative trajectories $a_{n,i} \in \textbf{a}_N$, respectively. The expert trajectory with the minimal CoR is then selected, $a_e^* = \arg\min_{a_{e,i} \in \textbf{a}_E} \text{CoR}(a_{e,i}, \textbf{a}_E, \textbf{a}_N)$.
  • Figure 4: Real-world evaluation of NoMaD-PointGoal with different training datasets. (a) The Clearpath Jackal UGV platform [clearpath_jackal, stereolabs_zedx, nvidia_jetson_orin] used in the experiments. (b) Evaluation results in a house environment, showing the executed trajectories of the NoMaD-PointGoal agent trained on the real-world dataset (pink), on the simulation dataset (yellow), and on the combined real-world and simulation datasets (blue).
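The Figure 2 caption implies that expert and negative trajectories differ only in the padding margin applied before path planning in step (b). The Python sketch below illustrates that idea on a toy 2D grid. It assumes a boolean occupancy grid, a margin measured in cells with Euclidean dilation, and a 4-connected breadth-first search as the shortest-path planner. The caption does not name the planner, and the functions `inflate` and `shortest_path` are hypothetical.

```python
import math
from collections import deque
from typing import List, Optional, Tuple

Grid = List[List[bool]]  # True = occupied cell
Cell = Tuple[int, int]   # (row, col)


def inflate(occ: Grid, margin_cells: float) -> Grid:
    """Pad obstacles: mark every cell within `margin_cells` (Euclidean) of an obstacle."""
    h, w = len(occ), len(occ[0])
    r = math.ceil(margin_cells)
    out = [row[:] for row in occ]
    for y in range(h):
        for x in range(w):
            if not occ[y][x]:
                continue
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and dy * dy + dx * dx <= margin_cells ** 2:
                        out[ny][nx] = True
    return out


def shortest_path(occ: Grid, start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Breadth-first search on a 4-connected grid; returns the cells from start to goal, or None."""
    h, w = len(occ), len(occ[0])
    if occ[start[0]][start[1]] or occ[goal[0]][goal[1]]:
        return None
    parent: dict = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            path = []
            while cur is not None:  # walk parent links back to the start
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        y, x = cur
        for nxt in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            ny, nx = nxt
            if 0 <= ny < h and 0 <= nx < w and not occ[ny][nx] and nxt not in parent:
                parent[nxt] = cur
                queue.append(nxt)
    return None


# Toy scene: a wall in column 4 (rows 0-5) with a one-cell gap at row 2.
grid = [[False] * 9 for _ in range(9)]
for row in range(6):
    if row != 2:
        grid[row][4] = True

start, goal = (2, 1), (2, 7)
safe = shortest_path(inflate(grid, 1.5), start, goal)      # margin > agent radius (1 cell)
negative = shortest_path(inflate(grid, 0.5), start, goal)  # margin < agent radius
print(len(safe), len(negative))  # 17 7
```

With an assumed agent radius of one cell, the 1.5-cell margin closes the gap in the wall and forces a detour below it, while the 0.5-cell margin leaves the gap open, so the planned path squeezes through a passage the agent cannot safely traverse.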
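The collision window in the Figure 2 caption can be written out directly. This sketch assumes per-step lists of positions, yaws, and images indexed from 0, with $T$ taken to be the last step index. `extract_negative_clip` and `NegativeClip` are hypothetical names. The function keeps positions and yaws from $t_i$ to $t_f$ inclusive and freezes the image at $I_{t_c}$ for every step after the collision.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple


@dataclass
class NegativeClip:
    positions: List[Tuple[float, float]]
    yaws: List[float]
    images: List[Any]
    t_i: int  # first recorded step
    t_f: int  # last recorded step (inclusive)


def extract_negative_clip(positions: Sequence[Tuple[float, float]],
                          yaws: Sequence[float],
                          images: Sequence[Any],
                          t_c: int, k_pre: int, k_post: int) -> NegativeClip:
    """Cut the window around a collision at step t_c, freezing images after t_c."""
    T = len(positions) - 1  # assumption: T is the last valid step index
    t_i = max(0, t_c - k_pre)
    t_f = min(T, t_c + k_post)
    steps = range(t_i, t_f + 1)
    return NegativeClip(
        positions=[positions[t] for t in steps],
        yaws=[yaws[t] for t in steps],
        # I_t = I_{t_c} for t > t_c: the observation stays at the collision frame
        images=[images[min(t, t_c)] for t in steps],
        t_i=t_i,
        t_f=t_f,
    )


positions = [(0.1 * t, 0.0) for t in range(10)]
yaws = [0.0] * 10
images = [f"img{t}" for t in range(10)]
clip = extract_negative_clip(positions, yaws, images, t_c=2, k_pre=3, k_post=4)
print(clip.t_i, clip.t_f, clip.images)
# 0 6 ['img0', 'img1', 'img2', 'img2', 'img2', 'img2', 'img2']
```

Because $t_c - k_\text{pre}$ is negative here, the window is clipped at step 0; a collision near the end of an episode is likewise clipped at $T$.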
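The selection rule in Figure 3 is an argmin over the sampled expert trajectories. The definition of CoR is not given in this excerpt, so the sketch below takes the scoring function as an argument. `toy_cor` is a hypothetical stand-in, not the paper's CoR: it divides a candidate's mean distance to the expert samples by its mean distance to the negative samples, so lower scores favor candidates that agree with the expert model and stay away from predicted collision-inducing trajectories. Trajectories are assumed to be equal-length lists of (x, y) waypoints, and all function names are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

Traj = List[Tuple[float, float]]  # equal-length list of (x, y) waypoints
Score = Callable[[Traj, Sequence[Traj], Sequence[Traj]], float]


def traj_dist(a: Traj, b: Traj) -> float:
    """Sum of pointwise Euclidean distances between two waypoint sequences."""
    return sum(math.dist(p, q) for p, q in zip(a, b))


def toy_cor(a: Traj, a_E: Sequence[Traj], a_N: Sequence[Traj]) -> float:
    """HYPOTHETICAL stand-in for CoR (not the paper's definition)."""
    d_expert = sum(traj_dist(a, b) for b in a_E) / len(a_E)  # includes a itself (distance 0)
    d_negative = sum(traj_dist(a, b) for b in a_N) / len(a_N)
    return d_expert / (d_negative + 1e-8)


def select_expert(a_E: Sequence[Traj], a_N: Sequence[Traj], cor: Score) -> Traj:
    """Return the expert sample a_e* with the smallest score cor(a_e, a_E, a_N)."""
    return min(a_E, key=lambda a: cor(a, a_E, a_N))


# Expert samples (straight, veer left, veer right) and negatives that veer right.
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
left = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]
right = [(0.0, 0.0), (1.0, -0.5), (2.0, -1.0)]
a_E = [straight, left, right]
a_N = [[(0.0, 0.0), (1.0, -0.6), (2.0, -1.2)],
       [(0.0, 0.0), (1.0, -0.4), (2.0, -0.8)]]
print(select_expert(a_E, a_N, toy_cor) == left)  # True
```

Plugging in the paper's actual CoR only requires passing a different `cor` callable; the sample-then-select structure stays the same.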