Planning from Imagination: Episodic Simulation and Episodic Memory for Vision-and-Language Navigation
Yiyuan Pan, Yunzhe Xu, Zhe Liu, Hesheng Wang
TL;DR
The paper addresses Vision-and-Language Navigation (VLN) in unseen environments by introducing SALI, a navigation agent that combines episodic memory with episodic simulation through a reality-imagination hybrid memory. SALI maintains a topological memory map that stores both real observations and imagined content, and uses a recurrent imagination tree to generate high-fidelity future views; a multimodal transformer fuses these inputs to inform actions. Key contributions include the Real-Imaginary Hybrid Memory with dynamic action planning, the Recurrent Imagination Tree for scalable future prediction, and pre-training and cross-correction strategies that yield state-of-the-art results on the R2R and REVERIE benchmarks. The approach improves navigation robustness and efficiency in complex, unseen environments, demonstrating the practical value of integrating imagined content with long-term memory for embodied AI.
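As a rough illustration of what a topological map holding both real and imagined content might look like, the sketch below keeps one graph whose nodes are flagged as observed or imagined. It is not SALI's implementation: the names `Node`, `HybridMemory`, and `add`, the plain-list features, and the rule that a real observation replaces an imagined one (but never the reverse) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    node_id: str
    feature: List[float]   # stand-in for a visual embedding (assumption)
    imagined: bool         # True if produced by imagination, False if observed


@dataclass
class HybridMemory:
    """Hypothetical topological map mixing real and imagined nodes."""
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: Dict[str, Set[str]] = field(default_factory=dict)

    def add(self, node_id: str, feature: List[float], imagined: bool,
            neighbor: Optional[str] = None) -> None:
        existing = self.nodes.get(node_id)
        # Assumed policy: imagined content never overwrites a real observation.
        keep_existing = (existing is not None
                         and not existing.imagined and imagined)
        if not keep_existing:
            self.nodes[node_id] = Node(node_id, list(feature), imagined)
        self.edges.setdefault(node_id, set())
        if neighbor is not None:
            if neighbor not in self.nodes:
                raise KeyError(f"unknown neighbor: {neighbor}")
            # Undirected edge between the two viewpoints.
            self.edges[node_id].add(neighbor)
            self.edges[neighbor].add(node_id)


memory = HybridMemory()
memory.add("a", [1.0], imagined=False)                # visited viewpoint
memory.add("b", [0.5], imagined=True, neighbor="a")   # imagined next view
memory.add("b", [0.9], imagined=False)                # agent reaches "b"
```

In this sketch, reaching viewpoint "b" replaces its imagined feature with the observed one while the edge to "a" is preserved.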
Abstract
Humans navigate unfamiliar environments using episodic simulation and episodic memory, which facilitate a deeper understanding of the complex relationships between environments and objects. Developing an imaginative memory system inspired by human mechanisms can enhance the navigation performance of embodied agents in unseen environments. However, existing Vision-and-Language Navigation (VLN) agents lack a memory mechanism of this kind. To address this, we propose a novel architecture that equips agents with a reality-imagination hybrid memory system. This system enables agents to maintain and expand their memory through both imaginative mechanisms and navigation actions. Additionally, we design tailored pre-training tasks to develop the agent's imaginative capabilities. Our agent can imagine high-fidelity RGB images for future scenes, achieving state-of-the-art results in Success weighted by Path Length (SPL).
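For readers unfamiliar with the metric, SPL is commonly defined as the average over episodes of S_i * l_i / max(p_i, l_i), where S_i is a binary success indicator, l_i is the shortest-path length from start to goal, and p_i is the length of the path the agent actually took. The sketch below computes it under that standard definition; the function name `spl` and its argument names are illustrative and not taken from the paper.

```python
from typing import Sequence


def spl(successes: Sequence[bool],
        shortest_lengths: Sequence[float],
        path_lengths: Sequence[float]) -> float:
    """Success weighted by Path Length, averaged over episodes."""
    n = len(successes)
    if n == 0 or n != len(shortest_lengths) or n != len(path_lengths):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths,
                                        path_lengths):
        denom = max(taken, shortest)
        # Failed episodes contribute 0; guard against zero-length paths.
        if success and denom > 0:
            total += shortest / denom
    return total / n


# Three episodes: optimal success, success with a 2x detour, failure.
score = spl([True, True, False], [10.0, 10.0, 5.0], [10.0, 20.0, 5.0])
print(score)  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```

Because the ratio is capped at 1 and failures score 0, SPL rewards agents that both reach the goal and do so along near-shortest paths.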
