Online Hierarchical Policy Learning using Physics Priors for Robot Navigation in Unknown Environments

Wei Han Chen, Yuchen Liu, Alexiy Buynitsky, Ahmed H. Qureshi

TL;DR

This work addresses navigation in large, unknown indoor environments by integrating physics-informed neural time fields within a hierarchical framework. It introduces Modular-NTFields (mNTFields), which combines an online sparse high-level navigation graph with localized neural Eikonal PDE solvers (low-level subnetworks) to efficiently compute cost-to-go maps while mitigating spectral bias and forgetting. The approach employs online room segmentation, modular subnetworks, adaptive sampling, and a TD-based training objective to enable fast, collision-free planning that scales to complex spaces, demonstrated through simulated and real-world robot experiments. The results show faster mapping, higher planning success, and robust real-world deployment, highlighting the method's potential for online exploration, mapping, and navigation in unknown environments.

Abstract

Robot navigation in large, complex, and unknown indoor environments is a challenging problem. The existing approaches, such as traditional sampling-based methods, struggle with resolution control and scalability, while imitation learning-based methods require a large amount of demonstration data. Active Neural Time Fields (ANTFields) have recently emerged as a promising solution by using local observations to learn cost-to-go functions without relying on demonstrations. Despite their potential, these methods are hampered by challenges such as spectral bias and catastrophic forgetting, which diminish their effectiveness in complex scenarios. To address these issues, our approach decomposes the planning problem into a hierarchical structure. At the high level, a sparse graph captures the environment's global connectivity, while at the low level, a planner based on neural fields navigates local obstacles by solving the Eikonal PDE. This physics-informed strategy overcomes common pitfalls like spectral bias and neural field fitting difficulties, resulting in a smooth and precise representation of the cost landscape. We validate our framework in large-scale environments, demonstrating its enhanced adaptability and precision compared to previous methods, and highlighting its potential for online exploration, mapping, and real-world navigation.
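
For readers unfamiliar with the Eikonal PDE mentioned above, its standard form can be written as follows. The symbols are assumed notation for illustration, not necessarily the paper's: $T(x_s, x)$ is the travel time from a start $x_s$ to a point $x$, and $S(x)$ is a speed function that is small near obstacles and larger in free space.

```latex
\left\| \nabla_{x} T(x_s, x) \right\| = \frac{1}{S(x)}, \qquad T(x_s, x_s) = 0
```

Under this formulation, a field $T$ that satisfies the equation acts as a cost-to-go map, and a path can be recovered by following the negative gradient of $T$ from the goal back toward the start.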

Paper Structure

This paper contains 17 sections, 6 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: We propose mNTFields, a modular neural learning framework for scalable motion planning. Our pipeline constructs a navigation graph online during the exploration phase, which can then be leveraged for long-horizon path planning. The exploration phase begins with processing a local depth observation to build a global occupancy map. Then, room segmentation is performed to create new nodes in the navigation graph, where each node corresponds to a modular subnetwork. These subnetworks are trained using the normalized observation data. Finally, a path is planned towards the next best viewpoint to facilitate further exploration. During the path planning phase, a graph search is performed on the navigation graph. The corresponding subnetworks are queried to generate path segments, which are then concatenated to construct a full long-horizon path.
  • Figure 2: In the online exploration phase, new rooms are discovered through room segmentation. Their entry points are identified and added to the graph, shown as the nodes in the figure. Entry points in the same room are also interconnected. To showcase the modular nature of our method, the travel-time fields (cyan contour lines) are generated by separate networks. During exploration, this graph and the corresponding subnetworks can be used to plan long-horizon tasks more robustly. The red dot shows the robot's current location, and the red lines show the predicted trajectory to the next waypoint.
  • Figure 3: Depiction of two Gibson environments. The paths generated by all methods are shown between the given start and goal. Our method planned a smooth, collision-free path in around 0.06 seconds, showcasing its suitability for deployment in complex indoor environments. The SMP methods RRTConnect (Green) and Lazy-PRM (Cyan) took around 3 seconds to find a path, and their paths have sharper turns. FMM (Purple) took around 0.8 seconds but returned a longer path due to its limited discretization resolution. MPOT (Red) failed to find a path in (b) because the correct path requires multiple turns.
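
The path planning phase described in the Figure 1 caption (graph search on the navigation graph, then per-segment queries that are concatenated) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example graph, and the edge weights are hypothetical, Dijkstra is used as one possible graph search, and a straight-line interpolator stands in for querying a trained subnetwork.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Segment = List[Point]


def shortest_node_sequence(
    graph: Dict[str, Dict[str, float]], start: str, goal: str
) -> List[str]:
    """Dijkstra over a sparse navigation graph; edge weights are travel costs."""
    best = {start: 0.0}
    parent: Dict[str, str] = {}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                parent[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if goal not in best:
        raise ValueError("goal is not reachable from start")
    # Walk parent pointers back from the goal to recover the node sequence.
    nodes = [goal]
    while nodes[-1] != start:
        nodes.append(parent[nodes[-1]])
    return nodes[::-1]


def straight_line_segment(a: Point, b: Point, steps: int = 4) -> Segment:
    """Hypothetical stand-in for a subnetwork query: linear interpolation."""
    return [
        (a[0] + (b[0] - a[0]) * i / steps, a[1] + (b[1] - a[1]) * i / steps)
        for i in range(steps + 1)
    ]


def plan_path(
    graph: Dict[str, Dict[str, float]],
    positions: Dict[str, Point],
    start: str,
    goal: str,
    local_planner: Callable[[Point, Point], Segment] = straight_line_segment,
) -> Segment:
    """Search the graph, plan each edge locally, and concatenate the segments."""
    nodes = shortest_node_sequence(graph, start, goal)
    path = [positions[nodes[0]]]
    for u, v in zip(nodes, nodes[1:]):
        # Skip the first point of each segment; it duplicates the previous end.
        path.extend(local_planner(positions[u], positions[v])[1:])
    return path


# Hypothetical entry points (graph nodes) and travel costs between them.
EXAMPLE_POSITIONS = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (4.0, 3.0), "D": (0.0, 3.0)}
EXAMPLE_GRAPH = {
    "A": {"B": 4.0, "D": 3.0},
    "B": {"A": 4.0, "C": 3.0},
    "C": {"B": 3.0, "D": 4.5},
    "D": {"A": 3.0, "C": 4.5},
}
```

Calling `plan_path(EXAMPLE_GRAPH, EXAMPLE_POSITIONS, "A", "C")` routes through node `B` in this example, because the assumed cost via `B` (7.0) is lower than via `D` (7.5). Passing a different `local_planner` is where a learned local planner would plug in.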