NavG: Risk-Aware Navigation in Crowded Environments Based on Reinforcement Learning with Guidance Points

Qianyi Zhang, Wentao Luo, Boyi Liu, Ziyang Zhang, Yaoyuan Wang, Jingtai Liu

TL;DR

NavG addresses perceptual errors in robot navigation by introducing guidance points as directional cues within an RL framework. It couples three components: a principled identification of guidance points, a perception-to-planning mapping that fuses sparse laser data and human detections, and an SAC-based navigation policy that optimizes progress toward a goal while maintaining safety. The state is $\mathcal{S}_t=[\mathcal{S}_t^o,\mathcal{S}_t^e]$, the action is $\mathbf{u}_{t+1}=(v_{t+1},\phi_{t+1})$, and the reward is $r_t = w_1 v_{\parallel} - w_2 |\phi_t| + w_3 \cdot \text{(safety term)}$, with $v_{\parallel}=\mathbf{v}\cdot\hat{\mathbf{p}}_{goal}$. The approach uses a robot-centered polar representation, LSTM-based pedestrian aggregation, and imitation learning to stabilize training. It reports the highest success rate and near-optimal travel times in simulation, and is validated in real-world corridors and lobbies. The results demonstrate robust operation in crowded environments despite detection errors, offering a practical pathway to safer and more efficient autonomous navigation in human-rich settings.
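
As a rough illustration of the reward structure above, the following sketch evaluates $r_t$ for a single step. The weight values, the function name `step_reward`, and the treatment of the safety term as a precomputed scalar are assumptions made for this example; the summary does not specify them.

```python
import numpy as np

def step_reward(v, goal_dir, phi, safety_term, w1=1.0, w2=0.1, w3=1.0):
    """Evaluate r_t = w1 * v_par - w2 * |phi_t| + w3 * safety_term.

    v           : robot velocity vector (2D)
    goal_dir    : vector from the robot toward the goal (need not be unit length)
    phi         : steering/angular action at this step
    safety_term : scalar safety contribution, assumed to be computed elsewhere
    w1, w2, w3  : hypothetical weights, not taken from the paper
    """
    v = np.asarray(v, dtype=float)
    goal_dir = np.asarray(goal_dir, dtype=float)
    # Unit vector toward the goal (p_hat_goal in the formula).
    p_hat = goal_dir / np.linalg.norm(goal_dir)
    # Velocity component along the goal direction (v_parallel).
    v_par = float(np.dot(v, p_hat))
    return w1 * v_par - w2 * abs(phi) + w3 * safety_term

# Moving straight at the goal with a small turn and no safety penalty.
r = step_reward(v=[1.0, 0.0], goal_dir=[2.0, 0.0], phi=0.5, safety_term=0.0)
```

The first term rewards velocity projected onto the goal direction, the second discourages large turning actions, and the third folds in whatever safety signal the environment provides.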

Abstract

Motion planning in navigation systems is highly susceptible to upstream perceptual errors, particularly in human detection and tracking. To mitigate this issue, the concept of guidance points--a novel directional cue within a reinforcement learning-based framework--is introduced. A structured method for identifying guidance points is developed, consisting of obstacle boundary extraction, potential guidance point detection, and redundancy elimination. To integrate guidance points into the navigation pipeline, a perception-to-planning mapping strategy is proposed, unifying guidance points with other perceptual inputs and enabling the RL agent to effectively leverage the complementary relationships among raw laser data, human detection and tracking, and guidance points. Qualitative and quantitative simulations demonstrate that the proposed approach achieves the highest success rate and near-optimal travel times, greatly improving both safety and efficiency. Furthermore, real-world experiments in dynamic corridors and lobbies validate the robot's ability to confidently navigate around obstacles and robustly avoid pedestrians.

Paper Structure

This paper contains 11 sections, 11 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: Issues in human detection. (a) The human is either not detected or detected with a velocity estimation error. (b) A non-human object is mistakenly classified as a human. See https://navg-dev.github.io for demonstrations of the proposed NavG.
  • Figure 2: Framework of the proposed NavG. Raw sensor data is processed to extract sparse laser data, human states, and guidance points, which indicate potential directions for the robot. These elements are unified in polar coordinates centered on the robot. Along with the robot’s goal and historical states, all inputs are fed into the neural network to generate actions.
  • Figure 3: Illustration of guidance point identification. (a) Given an obstacle map, an erosion operation and depth-first search are applied to obtain (b) the eroded map and (c) boundary points. (d-e) Guidance points are the midpoints of boundary point pairs from different groups with the smallest distance.
  • Figure 4: Illustration of eliminating the misleading guidance point $g_{ik}$, which may dangerously and redundantly guide the robot too close to $\mathcal{B}_j$. Removing it ensures efficient guidance to $g_{ij}$ or $g_{jk}$.
  • Figure 5: Illustration of projecting various data types of different lengths onto polar coordinates centered on the robot. The values are either positively or negatively correlated with their distances to the robot, ensuring consistency in how data magnitudes influence safety. Figures (a, b, c) share the relation $d_i > d_j > d_k > d_l$, indicating that (a) closer guidance points contribute more to safety, and (b-c) distant obstacles or pedestrians have a lower impact on collision risk for the current action. Their different combinations yield distinct benefits, which will be further analyzed in the simulation section.
  • ...and 5 more figures
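
The captions for Figures 3 and 5 describe two steps that can be sketched in a few lines: taking a guidance point as the midpoint of the closest pair of boundary points from two different groups, and projecting points onto robot-centered polar bins. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the bin count, the range `d_max`, and the encoding `1 - d / d_max` are hypothetical, and the robot heading is assumed to be aligned with the x-axis.

```python
import numpy as np

def closest_pair_midpoint(group_a, group_b):
    """Midpoint of the closest pair of boundary points from two groups."""
    a = np.asarray(group_a, dtype=float)  # shape (n, 2)
    b = np.asarray(group_b, dtype=float)  # shape (m, 2)
    # Pairwise Euclidean distances between every point in a and every point in b.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmin(d), d.shape)
    return (a[i] + b[j]) / 2.0, float(d[i, j])

def to_polar_bins(points, robot_xy, n_bins=8, d_max=5.0):
    """Project 2D points onto angular bins centered on the robot.

    Each bin stores the largest value of 1 - d / d_max among the points that
    fall in it, so closer points yield larger values (an assumed encoding).
    """
    rel = np.asarray(points, dtype=float).reshape(-1, 2) - np.asarray(robot_xy, dtype=float)
    dist = np.linalg.norm(rel, axis=1)
    ang = np.arctan2(rel[:, 1], rel[:, 0])  # angle in [-pi, pi]
    idx = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    val = np.clip(1.0 - dist / d_max, 0.0, 1.0)
    bins = np.zeros(n_bins)
    np.maximum.at(bins, idx, val)  # keep the strongest entry per bin
    return bins

# Two hypothetical obstacle boundary groups and a robot at the origin.
group_i = [[0.0, 0.0], [0.0, 1.0]]
group_j = [[2.0, 0.0], [3.0, 1.0]]
g_ij, gap = closest_pair_midpoint(group_i, group_j)
bins = to_polar_bins([g_ij], robot_xy=(0.0, 0.0))
```

Because every input type ends up in the same fixed-length angular layout, inputs of different lengths (laser returns, pedestrians, guidance points) can be stacked before being passed to the network, which is the purpose of the polar projection described in the Figure 5 caption.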