The StreetLearn Environment and Dataset
Piotr Mirowski, Andras Banki-Horvath, Keith Anderson, Denis Teplyashin, Karl Moritz Hermann, Mateusz Malinowski, Matthew Koichi Grimes, Karen Simonyan, Koray Kavukcuoglu, Andrew Zisserman, Raia Hadsell
TL;DR
StreetLearn is an interactive navigation environment built on real-world Google Street View imagery, enabling end-to-end, goal-driven visual navigation over city-scale graphs. It defines a courier-style task in which goals are given as absolute coordinates, introduces a curriculum of gradually harder goals, and provides an open-source codebase and dataset. Baseline agents (CityNav and MultiCityNav) trained with IMPALA perform strongly in New York City and find Pittsburgh more challenging; experiments on generalization to held-out regions and on cross-city transfer show both the potential and the limitations of the approach. The work offers a benchmark for grounded, long-range navigation in diverse, photorealistic urban settings, and a setting for studying how perception, planning, and memory interact under real-world connectivity constraints.
Abstract
Navigation is a rich and well-grounded problem domain that drives progress in many different areas of research: perception, planning, memory, exploration, and optimisation in particular. Historically these challenges have been considered separately, and solutions have been built that rely on stationary datasets, for example recorded trajectories through an environment. These datasets cannot be used for decision-making and reinforcement learning, however, and in general the perspective of navigation as an interactive learning task, where the actions and behaviours of a learning agent are learned simultaneously with the perception and planning, is relatively unsupported. Thus, existing navigation benchmarks generally rely on static datasets (Geiger et al., 2013; Kendall et al., 2015) or simulators (Beattie et al., 2016; Shah et al., 2018). To support and validate research in end-to-end navigation, we present StreetLearn: an interactive, first-person, partially-observed visual environment that uses Google Street View for its photographic content and broad coverage, and we give performance baselines for a challenging goal-driven navigation task. The environment code, baseline agent code, and the dataset are available at http://streetlearn.cc
