MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report
Zhongyu Yang, Mai Liu, Jinluo Xie, Yueming Zhang, Chen Shen, Wei Shao, Jichao Jiao, Tengfei Xing, Runbo Hu, Pengfei Xu
TL;DR
MapVision tackles mapless driving by fusing multi-view camera inputs with SD-map priors in a BEV-based framework. It introduces a map encoder pre-training task, an LDTR-based area/topology head, and enhanced traffic-element detection via YOLOX, which together improve lane, area, and traffic topology reasoning. Ablations show substantial gains from SD-map encoding, larger backbones, and pre-training, with further improvements from LDTR, P2P IoU losses, and auxiliary heads; ensemble strategies yield OLUS = 0.58 on OpenLane-V2. The work demonstrates that SD maps provide valuable long-range geometric priors that complement perception from multi-view images, reducing reliance on HD maps and enabling scalable, mapless autonomous driving in dynamic environments.
Abstract
Autonomous driving without high-definition (HD) maps demands a higher level of active scene understanding. In this competition, the organizers provided multi-view camera images and standard-definition (SD) maps to explore the limits of scene reasoning. Most existing algorithms construct bird's-eye-view (BEV) features from the multi-view images and use multi-task heads to predict road centerlines, boundary lines, pedestrian crossings, and other areas. However, these algorithms perform poorly at the far end of roads and struggle when the primary subject in the image is occluded. To address this, we use SD maps as an additional input alongside the multi-view images. We pre-train the map encoder to strengthen the network's geometric encoding and use YOLOX to improve traffic-element detection precision. For area detection, we introduce LDTR and auxiliary tasks to achieve higher precision. Our final OLUS score is 0.58.
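To make the idea of combining SD-map priors with camera-derived BEV features concrete, the following is a minimal sketch and not the method used in this report. It swaps the learned map encoder for a simple rasterization: SD-map polylines are drawn onto a binary BEV mask and concatenated to the BEV feature tensor as one extra channel. The function names, the BEV range, the 0.5 m resolution, and the axis convention (x forward mapped to rows, y lateral mapped to columns) are all assumptions made for illustration.

```python
import numpy as np


def rasterize_sd_map(polylines, bev_range=(-50.0, 50.0, -25.0, 25.0), resolution=0.5):
    """Draw SD-map polylines (ego-frame metres) onto a binary BEV mask.

    Hypothetical helper: bev_range is (x_min, x_max, y_min, y_max), and
    rows index x while columns index y. None of this comes from the report.
    """
    x_min, x_max, y_min, y_max = bev_range
    height = int(round((x_max - x_min) / resolution))
    width = int(round((y_max - y_min) / resolution))
    mask = np.zeros((height, width), dtype=np.float32)
    for line in polylines:
        pts = np.asarray(line, dtype=np.float32)
        for start, end in zip(pts[:-1], pts[1:]):
            # Sample each segment at half-cell spacing so no cell is skipped.
            seg_len = float(np.linalg.norm(end - start))
            n = max(int(np.ceil(seg_len / (resolution * 0.5))), 1) + 1
            samples = np.linspace(start, end, n)
            rows = np.floor((samples[:, 0] - x_min) / resolution).astype(int)
            cols = np.floor((samples[:, 1] - y_min) / resolution).astype(int)
            # Discard samples that fall outside the BEV grid.
            inside = (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)
            mask[rows[inside], cols[inside]] = 1.0
    return mask


def fuse_bev_with_sd_map(bev_features, sd_mask):
    """Append the SD-map mask to (C, H, W) BEV features as an extra channel."""
    return np.concatenate([bev_features, sd_mask[None]], axis=0)


# Usage: one straight road along the ego x-axis, fused with dummy BEV features.
road = [[(-50.0, 0.0), (49.9, 0.0)]]
sd_mask = rasterize_sd_map(road)
bev = np.zeros((8, 200, 100), dtype=np.float32)  # placeholder camera BEV features
fused = fuse_bev_with_sd_map(bev, sd_mask)
print(fused.shape)
```

Because the mask covers the full BEV range regardless of visibility, a prior of this kind stays available at the far end of roads and behind occluders, which is where the abstract says camera-only methods degrade.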
