CAMAv2: A Vision-Centric Approach for Static Map Element Annotation
Shiyuan Chen, Jiaxin Zhang, Ruohong Mei, Yingfeng Cai, Haoran Yin, Tao Chen, Wei Sui, Cong Yang
TL;DR
CAMAv2 introduces a vision-centric pipeline that generates accurate, reprojection-consistent 3D HD map annotations from surround-view imagery without LiDAR. It fuses WIGO-based pose estimation, an odometry-guided SfM with multiple efficiency/robustness enhancements, and RoMe road-surface meshes, followed by a semi-automatic VMA for 3D map annotation with elevation. On nuScenes, CAMAv2 reduces semantic reprojection error from 8.03 to 4.96 pixels and improves MapTRv2's reprojection performance when trained with CAMAv2 data, while a multi-scene aggregation and parallel reconstruction approach delivers fivefold efficiency gains and better robustness. The approach generalizes to other datasets such as Waymo Open Dataset, supports long-tail and adverse-weather scenarios, and provides publicly available code and nuScenes-CAMAv2 annotations to accelerate 4D labeling for autonomous driving research.
Abstract
The recent development of online static map element (a.k.a. HD map) construction algorithms has created strong demand for data with ground-truth annotations. However, currently available public datasets cannot provide high-quality training data in terms of consistency and accuracy. For instance, nuScenes, which is manually labelled (a low-efficiency process), still contains misalignment and inconsistency between the HD maps and images (e.g., around 8.03 pixels of reprojection error on average). To this end, we present CAMAv2: a vision-centric approach for Consistent and Accurate Map Annotation. Without LiDAR inputs, our proposed framework can still generate high-quality 3D annotations of static map elements. Specifically, the annotations achieve high reprojection accuracy across all surrounding cameras and are spatially and temporally consistent across the whole sequence. We apply our proposed framework to the popular nuScenes dataset to provide efficient and highly accurate annotations. Compared with the original nuScenes static map element annotations, our CAMAv2 annotations achieve lower reprojection errors (e.g., 4.96 vs. 8.03 pixels). Models trained with annotations from CAMAv2 also achieve lower reprojection errors (e.g., 5.62 vs. 8.43 pixels).
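
To make the pixel-level figures above more concrete, the following is a minimal sketch of a point-wise reprojection error under a pinhole camera model. It is an illustration only, not the paper's evaluation code: the abstract does not define the exact metric, so the per-point Euclidean distance, the function names (`project_point`, `mean_reprojection_error`), and the intrinsics `fx`, `fy`, `cx`, `cy` are all assumptions. The sketch also assumes the 3D points are already expressed in the camera frame.

```python
import math

def project_point(point_cam, fx, fy, cx, cy):
    """Project one 3D point (camera frame, z forward) to pixel coordinates.

    Returns None for points at or behind the image plane.
    Hypothetical helper; assumes an undistorted pinhole camera.
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)

def mean_reprojection_error(points_cam, pixels_observed, fx, fy, cx, cy):
    """Mean pixel distance between projected 3D points and observed 2D points.

    Points that cannot be projected are skipped; returns NaN if none remain.
    """
    errors = []
    for point, (u_obs, v_obs) in zip(points_cam, pixels_observed):
        uv = project_point(point, fx, fy, cx, cy)
        if uv is None:
            continue
        errors.append(math.hypot(uv[0] - u_obs, uv[1] - v_obs))
    return sum(errors) / len(errors) if errors else float("nan")

# Illustrative values only (not taken from nuScenes).
points = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0)]
observed = [(803.0, 454.0), (900.0, 450.0)]
error_px = mean_reprojection_error(points, observed, 1000.0, 1000.0, 800.0, 450.0)
```

In this toy case the first point projects 5 pixels away from its observation and the second matches exactly, so the mean error is 2.5 pixels. A real evaluation would additionally need the camera extrinsics, lens distortion handling, and an association between annotated map elements and image evidence.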
