Deep Reinforcement Learning Based Navigation with Macro Actions and Topological Maps
Simon Hakenes, Tobias Glasmachers
TL;DR
This work tackles navigation in large, visually complex environments with sparse rewards by introducing object-centric topological maps and macro actions that guide a simple Deep Q-Network (DQN). The agent builds a growing graph $G_t=(V_t,E_t)$ of object nodes, each containing multi-view patches and 3D coordinates. It then selects macro actions that navigate to target nodes by following A* paths with a low-level controller. A modified DQN evaluates actions across a dynamically expanding set of nodes: it processes per-node visual patches with shared CNN branches and merges the result with a progress vector $x_t \in \{0,1\}^{N_T}$ via an outer product to produce $Q(f_t,x_t)$. Experiments in 100 photorealistic Habitat-Matterport 3D (HM3D) scenes show that the approach outperforms random baselines under both immediate and terminal rewards. This demonstrates that topological structure and macro-action abstraction enable sample-efficient learning, and it motivates integrating more realistic perception, such as SLAM.
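The TL;DR does not spell out how the outer product yields $Q(f_t,x_t)$. The sketch below is one plausible reading, not the authors' architecture. It assumes that each node's patches have already been reduced to a feature vector $f_i$ by the shared CNN branch (omitted here) and that a single weight vector, shared by all nodes, maps the flattened outer product $f_i x_t^\top$ to one Q-value per node, that is, per macro action. The names `flat_outer`, `q_values` and `greedy_action` are hypothetical.

```python
from typing import List, Sequence


def flat_outer(f: Sequence[float], x: Sequence[int]) -> List[float]:
    """Row-major flattening of the outer product f x^T (len(f) * len(x) entries)."""
    return [fa * xb for fa in f for xb in x]


def q_values(node_features: Sequence[Sequence[float]],
             progress: Sequence[int],
             weights: Sequence[float],
             bias: float = 0.0) -> List[float]:
    """Return one Q-value per node (one per 'go to node i' macro action).

    Assumption: node_features[i] stands in for the shared CNN branch's output
    on node i's patches. The same weights are reused for every node, so the
    action set can grow as new nodes are added to the map.
    """
    if not node_features:
        return []
    expected = len(node_features[0]) * len(progress)
    if len(weights) != expected:
        raise ValueError(f"expected {expected} weights, got {len(weights)}")
    qs = []
    for f in node_features:
        z = flat_outer(f, progress)  # interaction of node features and task progress
        qs.append(sum(w * zi for w, zi in zip(weights, z)) + bias)
    return qs


def greedy_action(qs: Sequence[float]) -> int:
    """Index of the node with the highest Q-value (greedy policy)."""
    return max(range(len(qs)), key=qs.__getitem__)


if __name__ == "__main__":
    features = [[1.0, 0.0], [0.5, 2.0], [0.0, 1.0]]  # 3 nodes, feature size d = 2
    progress = [1, 0, 1]                              # hypothetical x_t with N_T = 3
    weights = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]          # d * N_T shared weights
    qs = q_values(features, progress, weights)
    print(qs, greedy_action(qs))
```

A linear layer applied to the flattened outer product is equivalent to a bilinear form $f_i^\top W x_t$ with $W$ reshaped from `weights`. Because the weights do not depend on the node index, appending a node to $V_t$ simply adds one more Q-value without changing the network.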
Abstract
This paper addresses the challenge of navigation in large, visually complex environments with sparse rewards. We propose a method that uses object-oriented macro actions grounded in a topological map, allowing a simple Deep Q-Network (DQN) to learn effective navigation policies. The agent builds a map by detecting objects from RGB-D input and selects discrete macro actions that correspond to navigating to these objects. This abstraction drastically reduces the complexity of the underlying reinforcement learning problem and enables generalization to unseen environments. We evaluate our approach in a photorealistic 3D simulation and show that it significantly outperforms a random baseline under both immediate and terminal reward conditions. Our results demonstrate that topological structure and macro-level abstraction can enable sample-efficient learning even from pixel data.
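A macro action of the form "navigate to object node k" can be sketched as graph search over the topological map. The following is a minimal illustration under stated assumptions, not the authors' implementation. Nodes carry 3D coordinates, and an edge means two nodes are directly traversable. Both the edge cost and the A* heuristic are assumed to be Euclidean distance, which keeps the heuristic consistent. Object detection, multi-view patches and the low-level controller are omitted. The names `TopologicalMap`, `astar` and `macro_action` are hypothetical.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Point = Tuple[float, float, float]


@dataclass
class TopologicalMap:
    """Hypothetical minimal G_t = (V_t, E_t): object nodes with 3D positions."""
    positions: Dict[int, Point] = field(default_factory=dict)
    edges: Dict[int, Set[int]] = field(default_factory=dict)

    def add_node(self, node: int, pos: Point) -> None:
        self.positions[node] = pos
        self.edges.setdefault(node, set())

    def add_edge(self, a: int, b: int) -> None:
        # Assumption: undirected edge between two directly traversable nodes.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def dist(self, a: int, b: int) -> float:
        return math.dist(self.positions[a], self.positions[b])


def astar(g: TopologicalMap, start: int, goal: int) -> Optional[List[int]]:
    """A* over G_t; edge cost and heuristic are both Euclidean distance."""
    open_heap = [(g.dist(start, goal), 0.0, start)]  # (f = cost + h, cost, node)
    came_from: Dict[int, int] = {}
    best_cost = {start: 0.0}
    while open_heap:
        _, cost, node = heapq.heappop(open_heap)
        if node == goal:
            path = [node]
            while node in came_from:  # walk back to the start node
                node = came_from[node]
                path.append(node)
            return path[::-1]
        if cost > best_cost[node]:
            continue  # stale heap entry superseded by a cheaper one
        for nb in g.edges[node]:
            new_cost = cost + g.dist(node, nb)
            if new_cost < best_cost.get(nb, math.inf):
                best_cost[nb] = new_cost
                came_from[nb] = node
                heapq.heappush(open_heap, (new_cost + g.dist(nb, goal), new_cost, nb))
    return None  # goal not reachable in the current map


def macro_action(g: TopologicalMap, current: int, target: int) -> List[Point]:
    """'Go to node target': waypoints a low-level controller would follow."""
    path = astar(g, current, target)
    return [] if path is None else [g.positions[n] for n in path[1:]]
```

For example, with nodes at (0,0,0), (1,0,0), (1,1,0) and (5,5,0), and edges 0-1, 1-2, 0-3 and 3-2, `astar(g, 0, 2)` takes the short route through node 1. From the DQN's point of view, choosing a node is a single discrete action, however many low-level steps the controller later needs.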
