PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors

Tianyuan Yuan, Yucheng Mao, Jiawei Yang, Yicheng Liu, Yue Wang, Hang Zhao

TL;DR

PreSight addresses perception under occlusion and in unfamiliar urban environments by constructing static city-scale priors from past traversals using NeRF. It partitions cities into tiles of roughly $1~\mathrm{km}^2$ and optimizes multiple sub-fields per tile, distilling semantic information via vision foundation models. It then extracts structured priors with a ray-marching process and integrates them into BEV/3D features for online perception models. The approach yields consistent improvements on nuScenes for HD-map construction and 3D occupancy prediction across diverse models, with low additional overhead and reduced memory relative to LiDAR-based priors. These results demonstrate a practical, scalable method to enhance autonomous driving perception by leveraging unsupervised, city-scale priors derived from crowdsourced traversals.
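
To make the tiling step concrete, here is a minimal sketch of how past-traversal poses might be assigned to tiles. It assumes a uniform square grid with a 1 km side in a flat, metric world frame; the summary above does not describe the paper's actual partitioning scheme (tile overlap, sub-field layout), so the names `TILE_SIZE_M`, `tile_index`, and `group_poses_by_tile` are hypothetical.

```python
import math
from collections import defaultdict

# Assumption: square tiles with a 1 km side, i.e. roughly 1 km^2 each.
TILE_SIZE_M = 1000.0


def tile_index(x, y, tile_size=TILE_SIZE_M):
    """Return the integer (column, row) of the tile containing (x, y), in metres."""
    # math.floor keeps negative coordinates in the correct tile (e.g. -0.5 -> -1).
    return (math.floor(x / tile_size), math.floor(y / tile_size))


def group_poses_by_tile(poses, tile_size=TILE_SIZE_M):
    """Group ego positions (x, y) by the tile they fall into."""
    tiles = defaultdict(list)
    for x, y in poses:
        tiles[tile_index(x, y, tile_size)].append((x, y))
    return dict(tiles)


# Hypothetical ego positions from several traversals, in metres.
poses = [(12.0, 30.0), (999.9, 5.0), (1000.0, 5.0), (-0.5, 2500.0)]
tiles = group_poses_by_tile(poses)
```

Under this assumption, each key of `tiles` identifies one tile whose observations would be used to optimize that tile's fields independently.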

Abstract

Autonomous vehicles rely extensively on perception systems to navigate and interpret their surroundings. Despite significant advancements in these systems recently, challenges persist under conditions like occlusion, extreme lighting, or in unfamiliar urban areas. Unlike these systems, humans do not solely depend on immediate observations to perceive the environment. In navigating new cities, humans gradually develop a preliminary mental map to supplement real-time perception during subsequent visits. Inspired by this human approach, we introduce a novel framework, PreSight, that leverages past traversals to construct static prior memories, enhancing online perception in later navigations. Our method involves optimizing a city-scale neural radiance field with data from previous journeys to generate neural priors. These priors, rich in semantic and geometric details, are derived without manual annotations and can seamlessly augment various state-of-the-art perception models, improving their efficacy with minimal additional computational cost. Experimental results on the nuScenes dataset demonstrate the framework's high compatibility with diverse online perception models. Specifically, it shows remarkable improvements in HD-map construction and occupancy prediction tasks, highlighting its potential as a new perception framework for autonomous driving systems. Our code will be released at https://github.com/yuantianyuan01/PreSight.

Paper Structure (24 sections, 14 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: Pipeline of PreSight. Leveraging historical traversal data, PreSight enhances online perception through NeRF, compressing extensive observations into implicit fields. A ray-marching algorithm extracts structured city-level priors, facilitating improved perception in subsequent visits.
  • Figure 2: A detailed pipeline of PreSight. City-scale NeRFs are first optimized from observations collected during past traversals. Structured priors are then extracted from the NeRFs to enhance online perception models.
  • Figure 3: Overview of PreSight's scene representation.
  • Figure 4: The integration module. Point-based priors are voxelized and then encoded into BEV or 3D features using convolutional layers. These features are fused with online features from perception models, enhancing the overall features for decoding.
  • Figure 5: Visualization of StreamMapNet predictions with and without priors. The model with priors accurately predicts the road structure, while the baseline model without priors fails.
  • ...and 1 more figure
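
The voxelization step named in the Figure 4 caption can be illustrated with a short sketch. It covers only the scattering of point-based priors into a BEV grid, storing a point count per cell. The convolutional encoding and the fusion with online features are not specified in this summary and are left out. The function name `voxelize_to_bev`, the grid ranges, and the cell size are assumptions for illustration, not the authors' implementation.

```python
import math


def voxelize_to_bev(points, x_range, y_range, cell_size):
    """Count prior points per BEV cell.

    points: iterable of (x, y, z) in a local metric frame.
    Returns grid[row][col], with rows along y and columns along x.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    n_cols = math.ceil((x_max - x_min) / cell_size)
    n_rows = math.ceil((y_max - y_min) / cell_size)
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y, _z in points:
        # Drop points outside the half-open range [min, max).
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue
        # Clamp to guard against floating-point rounding at the upper edge.
        col = min(int((x - x_min) / cell_size), n_cols - 1)
        row = min(int((y - y_min) / cell_size), n_rows - 1)
        grid[row][col] += 1  # height z is collapsed in a BEV grid
    return grid


# Hypothetical prior points; the last two fall outside the grid.
prior_points = [
    (-2.0, -2.0, 0.0),
    (1.5, 0.2, 1.0),
    (1.9, 0.9, 0.3),
    (2.0, 0.0, 0.0),
    (5.0, 5.0, 0.0),
]
bev = voxelize_to_bev(prior_points, x_range=(-2.0, 2.0), y_range=(-2.0, 2.0), cell_size=1.0)
```

A real pipeline would likely store semantic or geometric features per cell rather than counts, and would keep the height axis when 3D features are needed; the count grid here only shows the indexing.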