Deep Reinforcement Learning Based Navigation with Macro Actions and Topological Maps

Simon Hakenes, Tobias Glasmachers

TL;DR

This work tackles navigation in large, visually complex environments with sparse rewards by introducing object-centric topological maps and macro actions that guide a simple Deep Q-Network (DQN). The agent builds a growing graph $G_t=(V_t,E_t)$ of object nodes, each containing multi-view patches and 3D coordinates, and selects macro actions that navigate to target nodes using A* paths with a low-level controller. A modified DQN evaluates actions across a dynamically expanding set of nodes by processing per-node visual patches with shared CNN branches and merging with a progress vector $x_t \in \{0,1\}^{N_T}$ via an outer product to produce $Q(f_t,x_t)$. Experiments in 100 photorealistic Habitat Matterport3D scenes show the approach outperforms random baselines under both immediate and terminal rewards, demonstrating sample-efficient learning enabled by topological structure and macro-action abstraction, with implications for more realistic perception integration such as SLAM.
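The outer-product merge described above can be illustrated with a minimal NumPy sketch. It is not the authors' network: the shared CNN branch is replaced by a single linear layer with ReLU, the views of a node are mean-pooled, and the head is one linear map. All names (`shared_encoder`, `q_values`, `W_enc`, `w_out`) and all sizes are assumptions made for illustration. The sketch shows only the two structural ideas: the same weights score every node, so the action set can grow with the map, and the per-node feature is combined with the progress vector $x_t$ through an outer product.

```python
import numpy as np

def shared_encoder(patches, W):
    """Stand-in for the shared CNN branch (assumption): flatten each view,
    then apply one linear layer followed by ReLU."""
    flat = patches.reshape(patches.shape[0], -1)   # (views, pixels)
    return np.maximum(flat @ W, 0.0)               # (views, feat_dim)

def q_values(node_patches, progress, W_enc, w_out):
    """Return one Q-value per node, i.e. one per 'navigate to this node' macro action."""
    qs = []
    for patches in node_patches:
        # Pool over views (assumption: mean pooling) to get one feature per node.
        f = shared_encoder(patches, W_enc).mean(axis=0)   # (feat_dim,)
        # Outer product of node feature and progress vector, flattened.
        merged = np.outer(f, progress).ravel()            # (feat_dim * N_T,)
        qs.append(float(merged @ w_out))                  # hypothetical linear head
    return np.array(qs)

rng = np.random.default_rng(0)
N_T, feat_dim, views, side = 3, 8, 4, 16                  # illustrative sizes
W_enc = rng.normal(scale=0.1, size=(side * side * 3, feat_dim))
w_out = rng.normal(scale=0.1, size=feat_dim * N_T)

progress = np.array([1.0, 0.0, 0.0])                      # binary progress vector x_t
nodes = [rng.random((views, side, side, 3)) for _ in range(5)]

q = q_values(nodes, progress, W_enc, w_out)
best_node = int(np.argmax(q))                             # greedy macro action
```

Because the weights are shared across nodes, appending a node to `nodes` adds one more Q-value without changing the existing ones or the parameter count.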

Abstract

This paper addresses the challenge of navigation in large, visually complex environments with sparse rewards. We propose a method that uses object-oriented macro actions grounded in a topological map, allowing a simple Deep Q-Network (DQN) to learn effective navigation policies. The agent builds a map by detecting objects from RGBD input and selecting discrete macro actions that correspond to navigating to these objects. This abstraction drastically reduces the complexity of the underlying reinforcement learning problem and enables generalization to unseen environments. We evaluate our approach in a photorealistic 3D simulation and show that it significantly outperforms a random baseline under both immediate and terminal reward conditions. Our results demonstrate that topological structure and macro-level abstraction can enable sample-efficient learning even from pixel data.
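To make the idea of a macro action concrete, the following sketch plans an A* path over a small hand-built topological map. It is an illustration under stated assumptions, not the paper's implementation: the node names, coordinates, and edges are invented, edge costs and the heuristic are taken to be Euclidean distances between node coordinates, and the low-level controller that would follow the path is omitted.

```python
import heapq
import math

def a_star(coords, edges, start, goal):
    """A* on a graph with Euclidean edge costs and a Euclidean heuristic.
    Returns the list of nodes from start to goal, or None if unreachable."""
    def dist(a, b):
        return math.dist(coords[a], coords[b])

    # Heap entries: (estimated total cost, cost so far, node, path so far).
    frontier = [(dist(start, goal), 0.0, start, [start])]
    best_cost = {start: 0.0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if cost > best_cost.get(node, math.inf):
            continue  # stale entry; a cheaper route to this node was found
        for nxt in edges.get(node, ()):
            new_cost = cost + dist(node, nxt)
            if new_cost < best_cost.get(nxt, math.inf):
                best_cost[nxt] = new_cost
                heapq.heappush(
                    frontier,
                    (new_cost + dist(nxt, goal), new_cost, nxt, path + [nxt]),
                )
    return None

# Hypothetical map: object nodes ("chair", "lamp", "sofa") and waypoints ("w1", "w2").
coords = {
    "start": (0.0, 0.0, 0.0),
    "w1": (2.0, 0.0, 0.0),
    "w2": (0.0, 0.0, 3.0),
    "chair": (4.0, 0.0, 0.0),
    "lamp": (2.0, 0.0, 3.0),
    "sofa": (9.0, 0.0, 9.0),   # detected but not yet connected
}
links = [("start", "w1"), ("w1", "chair"), ("start", "w2"),
         ("w2", "lamp"), ("chair", "lamp")]
edges = {}
for a, b in links:             # undirected navigable connections
    edges.setdefault(a, []).append(b)
    edges.setdefault(b, []).append(a)

# The macro action "go to lamp" expands into a node path for a low-level controller.
path = a_star(coords, edges, "start", "lamp")
print(path)  # ['start', 'w2', 'lamp']
```

From the learner's point of view, the whole path counts as a single discrete action, which is what shortens the effective decision horizon.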

Paper Structure

This paper contains 12 sections, 2 equations, 6 figures.

Figures (6)

  • Figure 1: Example of a topological map built by the agent. Object nodes are visualized using one representative image per object, while waypoints are shown as blue dots. Edges indicate navigable connections. See Section \ref{sec:map} for a more detailed description.
  • Figure 2: Overview of the system.
  • Figure 3: Screenshots of the environments with different target objects. In (a)–(c), the targets are color-coded cylinders, while in (d) the target is a rocking horse, in (e) a basketball, and in (f) a suitcase. Each environment consists of multiple rooms.
  • Figure 4: A set of input patches of an object. Note how in some perspectives the object is occluded by the yellow target cylinders.
  • Figure 5: Network architecture, where $\otimes$ denotes an outer product. The input image represents a sample RGB observation fed into the network. The convolutional layers share the same weights.
  • ...and 1 more figure