NavG: Risk-Aware Navigation in Crowded Environments Based on Reinforcement Learning with Guidance Points
Qianyi Zhang, Wentao Luo, Boyi Liu, Ziyang Zhang, Yaoyuan Wang, Jingtai Liu
TL;DR
NavG addresses perceptual errors in robot navigation by introducing guidance points as directional cues within a reinforcement learning (RL) framework. It couples three components: a structured method for identifying guidance points, a perception-to-planning mapping that fuses sparse laser data and human detections, and a Soft Actor-Critic (SAC) navigation policy that optimizes progress toward the goal while maintaining safety. The state is $\mathcal{S}_t=[\mathcal{S}_t^o,\mathcal{S}_t^e]$, the action is $\mathbf{u}_{t+1}=(v_{t+1},\phi_{t+1})$, and the reward is $r_t = w_1 v_{\parallel} - w_2 |\phi_t| + w_3 \cdot \text{(safety term)}$, with $v_{\parallel}=\mathbf{v}\cdot\hat{\mathbf{p}}_{goal}$. The approach uses a robot-centered polar representation, LSTM-based pedestrian aggregation, and imitation learning to stabilize training. It reports the highest success rate and near-optimal travel times in simulation, and real-world experiments in corridors and lobbies show robust operation in crowded environments despite detection errors, offering a practical pathway to safer and more efficient autonomous navigation in human-rich settings.
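The reward formula above can be illustrated with a minimal Python sketch. It only evaluates the three terms as written; the function name, the 2D tuple representation, and the default weights are assumptions made for illustration, and the safety term is passed in as a number because its definition is not given here.

```python
import math


def reward_sketch(velocity, goal_direction, phi, safety_term,
                  w1=1.0, w2=0.1, w3=1.0):
    """Evaluate r_t = w1 * v_par - w2 * |phi| + w3 * safety_term.

    velocity and goal_direction are (x, y) tuples in the same frame.
    The weights are placeholder values, not the paper's settings.
    """
    gx, gy = goal_direction
    norm = math.hypot(gx, gy)
    if norm == 0.0:
        # No meaningful goal direction, so no progress term.
        v_par = 0.0
    else:
        # Project the velocity onto the unit vector toward the goal.
        v_par = (velocity[0] * gx + velocity[1] * gy) / norm
    return w1 * v_par - w2 * abs(phi) + w3 * safety_term


# Moving straight at the goal at 1 m/s with a small turn and a safety penalty.
r = reward_sketch((1.0, 0.0), (2.0, 0.0), phi=0.5, safety_term=-1.0)
```

Normalizing `goal_direction` inside the function keeps the progress term in velocity units regardless of how far away the goal is.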
Abstract
Motion planning in navigation systems is highly susceptible to upstream perceptual errors, particularly in human detection and tracking. To mitigate this issue, the concept of guidance points, a novel directional cue within a reinforcement learning (RL) framework, is introduced. A structured method for identifying guidance points is developed, consisting of obstacle boundary extraction, potential guidance point detection, and redundancy elimination. To integrate guidance points into the navigation pipeline, a perception-to-planning mapping strategy is proposed that unifies guidance points with other perceptual inputs and enables the RL agent to leverage the complementary relationships among raw laser data, human detection and tracking, and guidance points. Qualitative and quantitative simulations demonstrate that the proposed approach achieves the highest success rate and near-optimal travel times, improving both safety and efficiency. Furthermore, real-world experiments in dynamic corridors and lobbies validate the robot's ability to confidently navigate around obstacles and robustly avoid pedestrians.
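To make the three-stage structure of guidance point identification concrete, the following Python sketch walks a toy laser scan through one possible version of each stage. It is not the paper's algorithm: the range-jump test for boundaries, the angular offset used to place candidates, the distance-based pruning, and all names and thresholds are assumptions chosen only to show how such a pipeline could fit together. It also assumes beams are ordered by increasing angle.

```python
import math


def sketch_guidance_points(ranges, angle_min, angle_step,
                           jump=1.0, clearance=0.5, min_sep=0.5):
    """Return candidate guidance points (x, y) in the robot frame."""
    # Stage 1 (assumed): a large range jump between adjacent beams
    # marks an obstacle boundary; keep the nearer beam as the edge.
    edges = []
    for i in range(len(ranges) - 1):
        if abs(ranges[i + 1] - ranges[i]) > jump:
            if ranges[i] < ranges[i + 1]:
                edges.append((i, +1))      # free gap lies at larger angles
            else:
                edges.append((i + 1, -1))  # free gap lies at smaller angles

    # Stage 2 (assumed): place a candidate beside each edge, rotated
    # toward the gap by the angle that subtends `clearance` at range r.
    candidates = []
    for k, side in edges:
        r = ranges[k]
        theta = angle_min + k * angle_step
        a = theta + side * math.atan2(clearance, r)
        candidates.append((r * math.cos(a), r * math.sin(a)))

    # Stage 3 (assumed): drop any candidate closer than min_sep
    # to one that has already been kept.
    kept = []
    for p in candidates:
        if all(math.dist(p, q) >= min_sep for q in kept):
            kept.append(p)
    return kept


# Toy scan: two near walls with a narrow opening between them.
scan = [1.0, 1.0, 5.0, 5.0, 1.0, 1.0]
points = sketch_guidance_points(scan, angle_min=-0.5, angle_step=0.2)
```

In this toy scan both edges of the opening produce a candidate, and the pruning stage merges them because they fall within `min_sep` of each other.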
