Hierarchical Reinforcement Learning for Safe Mapless Navigation with Congestion Estimation
Jianqi Gao, Xizheng Pang, Qi Liu, Yanjie Li
TL;DR
This work tackles mapless indoor navigation in the presence of local minima with a hierarchical reinforcement learning framework. The high-level policy generates congestion-aware sub-goals to steer navigation, while a safe low-level policy converts sub-goals into real-time motion commands, aided by an LOMap-based obstacle encoding. Key contributions include a congestion-based sub-goal update, an obstacle encoding strategy for motion planning, and separate training of the high- and low-level policies with safety guarantees via CPO. Extensive simulations in office, home, and restaurant settings, plus real-world validation on a TurtleBot3, demonstrate strong generalization and practical viability. The approach performs robustly in static and dynamic environments and offers a scalable blueprint for safe, mapless navigation in unstructured indoor spaces.
Abstract
Reinforcement learning-based mapless navigation holds significant potential, but it struggles in indoor environments that contain local-minimum areas. This paper introduces a safe mapless navigation framework that uses hierarchical reinforcement learning (HRL) to improve navigation through such areas. The high-level policy generates a sub-goal to direct the navigation process. Notably, we develop a sub-goal update mechanism that accounts for environment congestion, which helps the robot avoid becoming trapped in local-minimum areas. The low-level motion planning policy, trained with safe reinforcement learning, outputs real-time control commands based on the acquired sub-goal. To enhance the robot's environmental perception, we also introduce a new obstacle encoding method that evaluates the impact of obstacles on the robot's motion planning. To validate the framework, we conduct simulations in office, home, and restaurant environments. The results show that it performs well in both static and dynamic scenarios. Finally, we deploy the framework on a TurtleBot3 robot for physical validation experiments, which demonstrate its strong generalization capability.
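The two-level structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the congestion measure (fraction of range readings closer than a threshold), the function names, and every numeric value are hypothetical, and a simple proportional heading controller stands in for the learned safe low-level policy.

```python
import math

def congestion(ranges, near=0.5):
    """Hypothetical congestion score: fraction of range readings closer than `near` metres."""
    if not ranges:
        return 0.0
    return sum(r < near for r in ranges) / len(ranges)

def needs_new_subgoal(pose, subgoal, ranges, reach_tol=0.2, max_congestion=0.6):
    """Request a new sub-goal when the current one is reached or the area looks congested.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    dist = math.hypot(subgoal[0] - pose[0], subgoal[1] - pose[1])
    return dist < reach_tol or congestion(ranges) > max_congestion

def low_level_command(pose, subgoal, v_max=0.22, k_turn=1.5):
    """Stand-in for the learned low-level policy: steer toward the sub-goal.

    Returns (linear velocity, angular velocity). v_max and k_turn are assumed values.
    """
    x, y, yaw = pose
    heading = math.atan2(subgoal[1] - y, subgoal[0] - x)
    # Wrap the heading error into [-pi, pi].
    err = math.atan2(math.sin(heading - yaw), math.cos(heading - yaw))
    # Slow down when the sub-goal is not straight ahead; never drive backwards.
    v = v_max * max(0.0, math.cos(err))
    w = k_turn * err
    return v, w
```

In a control loop, `needs_new_subgoal` would be checked at every step; when it returns true, the high-level policy would be queried for a fresh sub-goal before `low_level_command` is called again.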
