Rapid Task-Solving in Novel Environments
Sam Ritter, Ryan Faulkner, Laurent Sartran, Adam Santoro, Matt Botvinick, David Raposo
TL;DR
Rapid Task-Solving in Novel Environments (RTS) tackles how agents can instantly operate in unfamiliar settings by learning to explore, remember, and plan within a single episode. The authors introduce Episodic Planning Networks (EPNs), which use iterative self-attention over episodic memories to produce a value-iteration–like planning process that can adapt to new environments. They validate RTS on two domains—Memory&Planning Game and One-Shot StreetLearn—showing that EPNs surpass prior meta-RL baselines, generalize to larger maps and unseen cities, and improve with additional planning iterations. The work demonstrates a scalable path toward deploying AI that can rapidly reason and act in novel environments, with implications for real-world robotics and autonomous navigation.
Abstract
We propose the challenge of rapid task-solving in novel environments (RTS), wherein an agent must solve a series of tasks as rapidly as possible in an unfamiliar environment. An effective RTS agent must balance between exploring the unfamiliar environment and solving its current task, all while building a model of the new environment over which it can plan when faced with later tasks. While modern deep RL agents exhibit some of these abilities in isolation, none are suitable for the full RTS challenge. To enable progress toward RTS, we introduce two challenge domains: (1) a minimal RTS challenge called the Memory&Planning Game and (2) One-Shot StreetLearn Navigation, which introduces scale and complexity from real-world data. We demonstrate that state-of-the-art deep RL agents fail at RTS in both domains, and that this failure is due to an inability to plan over gathered knowledge. We develop Episodic Planning Networks (EPNs) and show that deep-RL agents with EPNs excel at RTS, outperforming the nearest baseline by factors of 2-3 and learning to navigate held-out StreetLearn maps within a single episode. We show that EPNs learn to execute a value iteration-like planning algorithm and that they generalize to situations beyond their training experience. algorithm and that they generalize to situations beyond their training experience.
