Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down Maps

Linfeng Zhao, Lawson L. S. Wong

TL;DR

The paper tackles zero-shot navigation in unseen mazes by supplying abstract 2-D maps $m \in \mathbb{R}^{N \times N}$ and goal grids $g \in \mathbb{R}^{2 \times N \times N}$. It introduces MMN, a model-based framework where a task-conditioned hypermodel $h_\psi$ outputs transition weights $\phi$ for a latent dynamics model $f_\phi$ conditioned on context $c=(m,g)$, enabling planning via MuZero-style MCTS without explicit localization on the map. MMN is trained with an auxiliary model loss and $n$-step hindsight experience replay to cope with sparse rewards, and is evaluated against a model-free baseline (MAH) and a DQN variant, showing superior long-horizon navigation and robustness to map perturbations in DeepMind Lab. The results demonstrate that end-to-end planning with map-conditioned dynamics generalizes to novel layouts and can leverage a hierarchical subgoal strategy to achieve global objectives, highlighting the practical potential of abstract-map-guided navigation without environment-specific training. The work suggests promising directions for subgoal generation and visual-domain extensions in robust, transferable navigation systems.

Abstract

Learning navigation capabilities in different environments has long been one of the major challenges in decision-making. In this work, we focus on zero-shot navigation ability using given abstract $2$-D top-down maps. Like a human navigating by reading a paper map, the agent reads the map as an image when navigating in a novel layout, after learning to navigate on a set of training maps. We propose a model-based reinforcement learning approach for this multi-task learning problem that jointly learns a hypermodel, which takes top-down maps as input and predicts the weights of the transition network. We use the DeepMind Lab environment and customize layouts using generated maps. Our method adapts better to novel environments in the zero-shot setting and is more robust to noise.
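
The hypermodel idea in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: the linear hypermodel, the single tanh transition layer, the tiny sizes `N`, `STATE_DIM`, and `NUM_ACTIONS`, and all function names are assumptions made for illustration. It only shows the data flow, in which a context $c=(m,g)$ is mapped to weights $\phi$ that then parameterize the transition $f(s,a;\phi)$.

```python
import math
import random

# All sizes below are hypothetical and kept tiny for illustration.
N = 3            # side length of the abstract map
STATE_DIM = 4    # size of the latent state
NUM_ACTIONS = 3  # size of the one-hot action vector


def make_hypermodel(seed=0):
    """Return psi: a random linear map from the flattened context to phi."""
    rng = random.Random(seed)
    context_dim = 3 * N * N  # map (N*N values) plus two goal grids (2*N*N values)
    phi_dim = STATE_DIM * (STATE_DIM + NUM_ACTIONS)
    return [[rng.gauss(0.0, 0.1) for _ in range(context_dim)]
            for _ in range(phi_dim)]


def hypermodel(psi, m, g):
    """h_psi(m, g): flatten the context c = (m, g) and emit transition weights phi."""
    context = [x for row in m for x in row]
    context += [x for grid in g for row in grid for x in row]
    flat = [sum(w * c for w, c in zip(row, context)) for row in psi]
    width = STATE_DIM + NUM_ACTIONS
    # Reshape the flat output into a STATE_DIM x (STATE_DIM + NUM_ACTIONS) matrix.
    return [flat[i * width:(i + 1) * width] for i in range(STATE_DIM)]


def transition(phi, s, a):
    """f(s, a; phi): one tanh layer applied to [state, one-hot action]."""
    one_hot = [1.0 if i == a else 0.0 for i in range(NUM_ACTIONS)]
    x = list(s) + one_hot
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in phi]
```

Calling `hypermodel(psi, m1, g1)` and `hypermodel(psi, m2, g2)` with the same `psi` yields two different weight sets, so the same latent state and action can lead to different predicted next states on different maps, while only `psi` is shared across tasks.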

Paper Structure

This paper contains 23 sections, 4 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: We develop an agent that can perform zero-shot navigation on unseen maps $\mathcal{T}$ (in DeepMind Lab, blue box), without needing to first explore the new $3$-D environment. Instead, the agent is given the top-down view as additional guidance: an abstract $2$-D occupancy map, and a goal and start position (bottom-left black dot and top-right gray dot). The map provides a rough solution, but the path cannot be directly followed due to the continuous nature of the agent's environment, as well as unknown map scale, inaccuracies in the map, and noisy localization.
  • Figure 2: Applying the hypermodel $h_\psi$ to maps $m_1$ and $m_2$ outputs two sets of transition network weights $\phi_1=h_\psi(m_1,g_1)$ and $\phi_2=h_\psi(m_2,g_2)$. Each transition network uses its weights $\phi_i$ to predict the next state $f(s,a; \phi_i)=s'$, illustrated at the bottom. Since the maps may share local patterns at some scales (illustrated by the cropped $3\times3$ patches in light blue), these patterns can be captured by the hypermodel $h_\psi$.
  • Figure 3: The planning/learning process. Yellow boxes indicate predictions; grey boxes come from actual interactions. (Left) Inference: search with learned model. Applying MCTS with hypermodel to search for policy and value, and act with a sampled action. (Right) Training: building learning targets. Computing targets and backpropagating from loss. The dark blue line indicates $n$-step relabelling. We only illustrate backpropagation for one reward node for simplicity. The solid red line shows the gradient flow from auxiliary model loss to the meta-network's weight $\psi$. The dashed red line is the gradient from task loss.
  • Figure 4: (Left) Zero-shot evaluation performance on $13 \times 13$ maps. Local navigation with different distances between start and goal, from 1 to 15. (Right) Performance of our method on larger maps.
  • Figure 5: Trajectories from hierarchical navigation in zero-shot on $13 \times 13$ maps. The top row is for MMN and bottom row is for MAH. Since there is a fixed scaling factor from maps to environments, we can compute the corresponding location on the abstract map and visualize trajectories, although this information is not known to the agent. The top-right corner is the start, and the bottom-left is the goal. Darker cells indicate provided subgoals from the landmark oracle. For the first 4 tasks (columns), MMN successfully reached the goals, while MAH failed. Both methods failed in the last task (right-most column).
  • ...and 2 more figures
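
The $n$-step relabelling mentioned in the Figure 3 caption can be sketched as follows. This is one plausible reading of hindsight relabelling, not the paper's exact procedure: the function name `relabel_n_step`, the tuple layout, and the 0/1 reward are assumptions made for illustration. The idea shown is that an $n$-step window of a trajectory is reinterpreted as if the position reached at its end had been the goal, so a sparse-reward trajectory still yields a positive learning signal.

```python
def relabel_n_step(trajectory, start, n):
    """Relabel an n-step window so its final achieved position becomes the goal.

    trajectory: list of (position, action, next_position) tuples.
    Returns a list of (position, action, reward, next_position, goal) tuples.
    """
    window = trajectory[start:start + n]
    if not window:
        return []
    # The position actually reached at the end of the window is the new goal.
    new_goal = window[-1][2]
    relabelled = []
    for position, action, next_position in window:
        # Hypothetical sparse reward: 1 when the relabelled goal is reached.
        reward = 1.0 if next_position == new_goal else 0.0
        relabelled.append((position, action, reward, next_position, new_goal))
    return relabelled
```

In this sketch the relabelled goal would also need to be written back into the goal grids $g$ before the transitions are used for training; that step is omitted here because it depends on how $g$ is encoded.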