Memory Proxy Maps for Visual Navigation
Faith Johnson, Bryan Bo Cao, Ashwin Ashok, Shubham Jain, Kristin Dana
TL;DR
The paper addresses visual navigation in unseen environments without odometry, graphs, or reinforcement learning. It proposes a three-tier feudal architecture built around a self-supervised Memory Proxy Map (MPM), which stands in for an explicit memory of what the agent has seen. The hierarchy consists of a high-level manager that maintains the MPM, a mid-level waypoint generator (WayNet) trained on human point-click demonstrations, and a low-level worker that maps depth observations and WayNet waypoints to discrete actions. The approach achieves state-of-the-art performance on image-goal navigation in Gibson Habitat environments while using significantly less data and no simulators, odometry, or graph-based planning. The work shows that memory-based hierarchical navigation is viable in unseen environments and points toward efficient real-world deployment that is amenable to continual learning.
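To make the interface of the low-level worker concrete, the sketch below maps a depth image and a WayNet waypoint (a pixel coordinate) to one of four discrete actions. It is a hand-written rule-based stand-in, not the learned classifier the paper describes; the action names, the metric-depth input, the `obstacle_dist` threshold, and the `center_frac` steering band are all illustrative assumptions.

```python
import numpy as np

# Assumed discrete action set (names are illustrative, not taken from the paper).
STOP, FORWARD, LEFT, RIGHT = "stop", "move_forward", "turn_left", "turn_right"


def worker_action(depth, waypoint_xy, goal_reached=False,
                  obstacle_dist=0.5, center_frac=0.2):
    """Rule-based stand-in for a learned low-level action classifier.

    depth: (H, W) array of depth values, assumed to be in meters.
    waypoint_xy: (x, y) pixel coordinates of the waypoint in the image.
    obstacle_dist: hypothetical minimum free distance straight ahead.
    center_frac: hypothetical width of the "straight ahead" band.
    """
    if goal_reached:
        return STOP
    h, w = depth.shape
    x, _ = waypoint_xy
    # Horizontal offset of the waypoint from the image center, in [-1, 1].
    offset = (x - w / 2) / (w / 2)
    if offset < -center_frac:
        return LEFT
    if offset > center_frac:
        return RIGHT
    # Waypoint is roughly ahead: check a central column band for obstacles.
    half = int(w * center_frac / 2)
    center = depth[:, w // 2 - half: w // 2 + half + 1]
    if center.min() < obstacle_dist:
        # Turn toward the half of the image with more free space on average.
        left_free = depth[:, : w // 2].mean()
        right_free = depth[:, w // 2:].mean()
        return LEFT if left_free >= right_free else RIGHT
    return FORWARD
```

With an unobstructed depth image, a waypoint at the image center yields `move_forward`, and a waypoint near either edge yields a turn toward it. A learned classifier replaces these hand-tuned thresholds with decision boundaries fit to demonstration data.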
Abstract
Humans navigate previously unseen environments using vision, without detailed environment maps. Inspired by this, we introduce a novel no-RL, no-graph, no-odometry approach to visual navigation that uses feudal learning to build a three-tiered agent. Key to our approach is a memory proxy map (MPM), an intermediate representation of the environment that the high-level manager agent learns in a self-supervised manner and that serves as a simplified memory, approximating what the agent has seen. We demonstrate that recording observations in this learned latent space is an effective and efficient memory proxy that can remove the need for graphs and odometry in visual navigation tasks. For the mid-level manager agent, we develop a waypoint network (WayNet) that outputs intermediate subgoals, or waypoints, imitating human waypoint selection during local navigation. For the low-level worker agent, we learn a classifier over a discrete action space that avoids local obstacles and moves the agent toward the WayNet waypoint. The resulting feudal navigation network uses no RL, no graph, no odometry, and no metric map, while achieving SOTA results on the image-goal navigation task.
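The idea of "recording observations in a learned latent space" can be sketched as a fixed-size grid laid over a 2-D latent space: each observation's latent code marks a Gaussian blob on the grid, and the manager can later ask how familiar a latent location is, with no poses or graph edges involved. This is a minimal sketch under stated assumptions: the encoder that produces the 2-D latent `z` is out of scope, and the class name `MemoryProxyMap`, the latent bounds `lo`/`hi`, the grid `size`, and the blob width `sigma` are hypothetical choices, not the paper's implementation.

```python
import numpy as np


class MemoryProxyMap:
    """Hypothetical grid over a bounded 2-D latent space.

    Each recorded latent vector stamps a Gaussian blob; cell values stay in
    [0, 1] and indicate how close a latent location is to something seen.
    """

    def __init__(self, size=32, lo=-1.0, hi=1.0, sigma=1.0):
        self.size, self.lo, self.hi, self.sigma = size, lo, hi, sigma
        self.grid = np.zeros((size, size))
        # Precomputed row/column index arrays for stamping blobs.
        self._rows, self._cols = np.mgrid[0:size, 0:size]

    def to_cell(self, z):
        """Map a 2-D latent vector to a (row, col) cell, clipping to bounds."""
        z = np.clip(np.asarray(z, dtype=float), self.lo, self.hi)
        frac = (z - self.lo) / (self.hi - self.lo)
        idx = np.minimum((frac * self.size).astype(int), self.size - 1)
        return int(idx[1]), int(idx[0])  # row from z[1], column from z[0]

    def record(self, z):
        """Mark the cell for latent z (and its neighborhood) as seen."""
        r, c = self.to_cell(z)
        dist2 = (self._rows - r) ** 2 + (self._cols - c) ** 2
        blob = np.exp(-dist2 / (2 * self.sigma ** 2))
        # Element-wise max keeps the map bounded in [0, 1].
        self.grid = np.maximum(self.grid, blob)

    def familiarity(self, z):
        """Return how strongly latent z's cell has been recorded."""
        r, c = self.to_cell(z)
        return float(self.grid[r, c])
```

After `m = MemoryProxyMap(); m.record([0.0, 0.0])`, `m.familiarity([0.0, 0.0])` is 1.0, while a distant latent such as `[0.9, -0.9]` scores near zero. One plausible use, again an assumption, is for the manager to prefer directions whose predicted latent codes are unfamiliar, which encourages exploration without metric odometry.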
