Spatial Retrieval Augmented Autonomous Driving

Xiaosong Jia; Chenhe Zhang; Yule Jiang; Songbur Wong; Zhiyuan Zhang; Chen Chen; Shaofeng Zhang; Xuanhe Zhou; Xue Yang; Junchi Yan; Yu-Gang Jiang

TL;DR

The paper introduces spatial retrieval to augment autonomous driving with offline geographic imagery, addressing perception horizon limits and occlusion. It presents nuScenes-Geography, a dataset extension using Google Maps data, and a plug-and-play Spatial Retrieval Adapter with a Reliability Estimation gate to fuse geography into BEV-based tasks. Across object detection, online mapping, occupancy, planning, and generative world modeling, geographic priors improve performance and temporal consistency, especially in challenging conditions, while remaining robust to incomplete retrieval. The work provides open-source data, pipelines, and baselines to promote retrieval-augmented autonomous driving research.

Abstract

Existing autonomous driving systems rely on onboard sensors (cameras, LiDAR, IMU, etc.) for environmental perception. However, this paradigm is limited by the drive-time perception horizon and often fails under a limited field of view, occlusion, or extreme conditions such as darkness and rain. In contrast, human drivers can recall road structure even under poor visibility. To endow models with this "recall" ability, we propose the spatial retrieval paradigm, which introduces offline retrieved geographic images as an additional input. These images are easy to obtain from offline caches (e.g., Google Maps or stored autonomous driving datasets) and require no additional sensors, making the approach a plug-and-play extension for existing AD tasks. For experiments, we first extend the nuScenes dataset with geographic images retrieved via Google Maps APIs and align the new data with ego-vehicle trajectories. We then establish baselines across five core autonomous driving tasks: object detection, online mapping, occupancy prediction, end-to-end planning, and generative world modeling. Extensive experiments show that the extended modality can enhance the performance of certain tasks. We will open-source the dataset curation code, data, and benchmarks for further study of this new autonomous driving paradigm.
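
To make the idea of retrieving from an offline cache and fusing the result with onboard features concrete, the following is a minimal sketch and not the authors' implementation. The function names `retrieve` and `gated_fuse`, the 25 m search radius, the flat feature vectors, and the scalar `reliability` score (which a real system would presumably predict rather than receive as an argument) are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical offline cache: (x, y) position in metres -> geographic feature vector.
GeoCache = Dict[Tuple[float, float], List[float]]


def retrieve(cache: GeoCache, ego_xy: Tuple[float, float],
             max_dist: float = 25.0) -> Optional[List[float]]:
    """Return the cached feature nearest to the ego position.

    Returns None when nothing lies within max_dist (incomplete retrieval).
    """
    best, best_d = None, max_dist
    for pos, feat in cache.items():
        d = math.dist(pos, ego_xy)
        if d <= best_d:
            best, best_d = feat, d
    return best


def gated_fuse(bev: List[float], geo: Optional[List[float]],
               reliability: float) -> List[float]:
    """Add the geographic feature to the onboard feature, scaled by a gate.

    The gate is the reliability score clamped to [0, 1]. If retrieval
    failed, the onboard feature is returned unchanged.
    """
    if geo is None:
        return list(bev)
    g = min(1.0, max(0.0, reliability))
    return [b + g * x for b, x in zip(bev, geo)]


# Usage: one cached location near the ego vehicle, one far away.
cache = {(0.0, 0.0): [1.0, 2.0], (100.0, 0.0): [5.0, 5.0]}
near = retrieve(cache, (3.0, 4.0))    # 5 m from the first entry
far = retrieve(cache, (50.0, 0.0))    # 50 m from both entries
fused = gated_fuse([1.0, 1.0], near, reliability=0.5)
fallback = gated_fuse([1.0, 1.0], far, reliability=0.9)
```

In this sketch a failed lookup leaves the onboard features untouched, which is one simple way to stay robust when retrieval is incomplete.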
