Safe-VLN: Collision Avoidance for Vision-and-Language Navigation of Autonomous Robots Operating in Continuous Environments

Lu Yue, Dongliang Zhou, Liang Xie, Feitian Zhang, Ye Yan, Erwei Yin

TL;DR

Safe-VLN addresses collisions in vision-and-language navigation in continuous environments (VLN-CE). It classifies collisions into waypoint, navigation, and dynamic types and introduces two modules: an occupancy-mask-guided waypoint predictor and a re-selection navigator. The occupancy mask uses a simulated 2D LiDAR occupancy map to suppress candidate waypoints that fall in obstacle regions, while the re-selection navigator maintains a navigable graph and can re-select an alternative subgoal when a collision occurs. On the R2R-CE benchmark, Safe-VLN improves success rate (SR) and reduces collision rates relative to state-of-the-art VLN-CE baselines, and ablations support the effectiveness of both the occupancy mask and the re-selection strategy. The work moves VLN-CE closer to practical deployment by improving collision avoidance and robustness to dynamic obstacles, with implications for real-world autonomous robots.

Abstract

The task of vision-and-language navigation in continuous environments (VLN-CE) aims at training an autonomous agent to perform low-level actions to navigate through 3D continuous surroundings using visual observations and language instructions. The significant potential of VLN-CE for mobile robots has been demonstrated across a large number of studies. However, most existing works in VLN-CE focus primarily on transferring standard discrete vision-and-language navigation (VLN) methods to continuous environments, overlooking the problem of collisions. This oversight often results in the agent deviating from the planned path or, in severe instances, becoming trapped in obstacle areas and failing the navigation task. To address these issues, this paper investigates various collision scenarios within VLN-CE and proposes a classification method to identify the underlying causes of collisions. Furthermore, a new VLN-CE algorithm, named Safe-VLN, is proposed to strengthen collision avoidance. It includes two key components: a waypoint predictor and a navigator. The waypoint predictor leverages a simulated 2D LiDAR occupancy mask to prevent predicted waypoints from being placed in obstacle areas. The navigator, in turn, employs a "re-selection after collision" strategy to prevent the robot agent from being caught in a cycle of repeated collisions. The proposed Safe-VLN is evaluated on the R2R-CE benchmark, and the results demonstrate improved navigation performance and a statistically significant reduction in collisions.
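
The occupancy-mask idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the polar grid layout (one row per heading, one column per distance bin), the function names, and the `margin` parameter are all assumptions made for illustration. The sketch zeroes out heatmap cells that lie at or beyond the simulated LiDAR hit along each heading, then picks the best remaining cell as the waypoint.

```python
from typing import List, Tuple


def occupancy_mask(lidar_ranges: List[float], num_dist_bins: int,
                   bin_size: float, margin: float = 0.0) -> List[List[int]]:
    """Build a polar mask: 1 = free cell, 0 = cell blocked by an obstacle.

    lidar_ranges[a] is the simulated LiDAR range along heading a.
    `margin` is a hypothetical safety distance, not a parameter from the paper.
    """
    mask = []
    for hit_range in lidar_ranges:
        free_limit = hit_range - margin
        # A cell is free only if its outer edge is closer than the obstacle.
        row = [1 if (d + 1) * bin_size <= free_limit else 0
               for d in range(num_dist_bins)]
        mask.append(row)
    return mask


def apply_mask(heatmap: List[List[float]],
               mask: List[List[int]]) -> List[List[float]]:
    """Suppress heatmap scores in blocked cells by element-wise multiplication."""
    return [[h * m for h, m in zip(h_row, m_row)]
            for h_row, m_row in zip(heatmap, mask)]


def top_waypoint(heatmap: List[List[float]]) -> Tuple[int, int]:
    """Return (heading index, distance index) of the highest-scoring cell."""
    best = max((value, a, d)
               for a, row in enumerate(heatmap)
               for d, value in enumerate(row))
    return best[1], best[2]


# Two headings, three distance bins of 0.25 m each (toy values).
heatmap = [[0.1, 0.2, 0.3],
           [0.2, 0.9, 0.4]]
mask = occupancy_mask([1.0, 0.3], num_dist_bins=3, bin_size=0.25)
masked = apply_mask(heatmap, mask)
```

In this toy example the unmasked heatmap peaks at heading 1, distance bin 1, which lies behind an obstacle detected at 0.3 m. After masking, the selected waypoint moves to a free cell on heading 0.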

Paper Structure (12 sections, 6 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Collision scenarios in VLN-CE classified in this letter. The first type of collision occurs when the predicted waypoint lies in an obstacle zone; the second type occurs when the agent encounters obstacles while navigating from the current waypoint to the next chosen waypoint; and the third type is caused by dynamic obstacles that may appear at any time during navigation.
  • Figure 2: Overview of the proposed Safe-VLN. Safe-VLN uses a waypoint predictor that generates candidate waypoints by combining the predicted heatmap with occupancy masks, and adds a re-selection navigator to the navigation planner.
  • Figure 3: Illustration of occupancy masks designed in the waypoint predictor.
  • Figure 4: Experimental results for SR, SPL, N-C, and W-C with varying parameter $\delta$.
  • Figure 5: Examples of the re-selection navigator. When the chosen waypoint is non-navigable, the agent selects another suitable waypoint.
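
The "re-selection after collision" behaviour illustrated in Figure 5 can be sketched in a few lines of Python. This is only an illustrative approximation under stated assumptions, not the navigator from the paper: the `scores` dictionary, the `try_move` callback (which stands in for executing low-level actions and reporting whether the waypoint was reached), and the `max_attempts` limit are all hypothetical.

```python
from typing import Callable, Dict, Optional, Set


def reselect_navigate(scores: Dict[str, float],
                      try_move: Callable[[str], bool],
                      max_attempts: int = 3) -> Optional[str]:
    """Try candidate waypoints in order of score, skipping ones that collided.

    scores:       hypothetical navigator score for each candidate waypoint.
    try_move:     hypothetical callback; returns True if the waypoint is reached,
                  False if the move ended in a collision.
    max_attempts: hypothetical cap on re-selections per navigation step.
    Returns the waypoint that was reached, or None if every attempt failed.
    """
    blocked: Set[str] = set()
    for _ in range(max_attempts):
        # Only consider waypoints that have not already caused a collision.
        candidates = {w: s for w, s in scores.items() if w not in blocked}
        if not candidates:
            return None
        choice = max(candidates, key=candidates.get)
        if try_move(choice):
            return choice
        # Mark the waypoint as non-navigable and re-select on the next pass.
        blocked.add(choice)
    return None


# Toy example: the top-scoring waypoint "a" is blocked by an obstacle.
scores = {"a": 0.7, "b": 0.2, "c": 0.1}
reached = reselect_navigate(scores, try_move=lambda w: w != "a")
```

Here the agent first attempts the highest-scoring waypoint, records it as non-navigable after the collision, and then falls back to the next-best candidate instead of retrying the same blocked target.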