CSMapping: Scalable Crowdsourced Semantic Mapping and Topology Inference for Autonomous Driving
Zhijian Qiao, Zehuan Yu, Tong Li, Chih-Chung Chou, Wenchao Ding, Shaojie Shen
TL;DR
CSMapping addresses the problem of building scalable, high-quality semantic and topological maps for autonomous driving from crowdsourced data. It combines a latent diffusion prior learned from HD maps with a vectorized initialization and MAP optimization in latent space, using a Gaussian-basis reparameterization and multi-start posterior scoring to complete partial observations robustly. For topology, it applies confidence-weighted k-medoids clustering with kinematic refinement to produce drivable centerlines that improve as data grows. Experiments on nuScenes, Argoverse 2, and a large proprietary dataset report state-of-the-art performance, supported by ablations and scalability analyses across training and inference, along with practical benefits for online perception. The resulting framework builds maps that progressively improve with data, supports online detection enhancement, and maintains cross-submap consistency via factor-graph optimization.
Abstract
Crowdsourcing enables scalable autonomous driving map construction, but noise from low-cost sensors prevents map quality from improving as data volume grows. We propose CSMapping, a system that produces accurate semantic maps and topological road centerlines whose quality consistently increases with more crowdsourced data. For semantic mapping, we train a latent diffusion model on HD maps (optionally conditioned on SD maps) to learn a generative prior of real-world map structure, without requiring paired crowdsourced/HD-map supervision. This prior is incorporated via constrained MAP optimization in latent space, ensuring robustness to severe noise and plausible completion in unobserved areas. Initialization uses a robust vectorized mapping module followed by diffusion inversion; optimization employs an efficient Gaussian-basis reparameterization, projected gradient descent with multi-start, and a latent-space factor graph for global consistency. For topological mapping, we apply confidence-weighted k-medoids clustering and kinematic refinement to trajectories, yielding smooth, human-like centerlines robust to trajectory variation. Experiments on nuScenes, Argoverse 2, and a large proprietary dataset achieve state-of-the-art semantic and topological mapping performance, with thorough ablation and scalability studies.
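
As a rough illustration of the confidence-weighted k-medoids step mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes each trajectory is reduced to a single lateral offset in metres, that the distance between trajectories is the absolute offset difference, and that initialization uses a deterministic farthest-point rule; the function name `weighted_kmedoids`, the toy data, and the confidence values are all hypothetical. Kinematic refinement is not shown.

```python
import numpy as np

def weighted_kmedoids(dist, weights, k, n_iter=50):
    """Confidence-weighted k-medoids on a precomputed distance matrix.

    dist:    (n, n) pairwise distances between trajectories (assumed metric).
    weights: (n,) non-negative confidence per trajectory (assumed input).
    Returns (medoid_indices, labels).
    """
    # Deterministic init (an assumption of this sketch): start from the most
    # confident trajectory, then repeatedly add the farthest remaining one.
    medoids = [int(np.argmax(weights))]
    while len(medoids) < k:
        d_near = dist[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(d_near)))
    medoids = np.array(medoids)

    for _ in range(n_iter):
        # Assignment step: each trajectory joins its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the previous medoid for an empty cluster
            # Update step: cost of candidate i is sum_j w_j * d(i, j),
            # so high-confidence trajectories pull the medoid toward them.
            costs = dist[np.ix_(members, members)] @ weights[members]
            new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids

    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

# Toy data: two lanes, each with one confident trajectory and two
# low-confidence trajectories that drift to one side.
offsets = np.array([0.0, 0.3, 0.4, 3.5, 3.8, 3.9])
confidence = np.array([1.0, 0.2, 0.2, 1.0, 0.2, 0.2])
dist = np.abs(offsets[:, None] - offsets[None, :])

medoids, labels = weighted_kmedoids(dist, confidence, k=2)
plain_medoids, _ = weighted_kmedoids(dist, np.ones_like(offsets), k=2)
print(medoids.tolist(), labels.tolist(), plain_medoids.tolist())
```

On this toy input the weighted medoids are the confident trajectories (indices 0 and 3), whereas uniform weights select the middle trajectories (indices 1 and 4), which is the behaviour the confidence weighting is meant to provide. Choosing medoids rather than means keeps each cluster representative an actually driven trajectory.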
