IC-Mapper: Instance-Centric Spatio-Temporal Modeling for Online Vectorized Map Construction

Jiangtong Zhu, Zhao Yang, Yinan Shi, Jianwu Fang, Jianru Xue

TL;DR

IC-Mapper tackles online vector map construction by integrating instance-centric temporal association and spatial fusion into an end-to-end framework that performs detection, tracking, and global map updating. The method maintains a memory of map instances, aligns detections across frames using both geometric and feature cues, and fuses current detections with a history of the global map through cross-attention and curve-fitting updates. Empirical results on nuScenes show state-of-the-art performance across detection, tracking, and mapping metrics, with ablations confirming the importance of both temporal and spatial modules. The approach enables real-time, globally consistent vector map construction, with practical implications for scalable and adaptive HD mapping in autonomous driving.

Abstract

Online vector map construction based on visual data can bypass the data collection, post-processing, and manual annotation required by traditional map construction, which significantly improves map-building efficiency. However, existing work treats online mapping as a local-range perception task, overlooking the spatial scalability required for map construction. We propose IC-Mapper, an instance-centric online mapping framework with two primary components: 1) Instance-centric temporal association module: for the detection queries of adjacent frames, we compare them in both feature and geometric space to obtain the correspondence between instances across frames. 2) Instance-centric spatial fusion module: we sample points from the historical global map and integrate them with the current-frame detections of the corresponding instances to expand and update the map in real time. We evaluate our approach on the nuScenes dataset using detection, tracking, and global mapping metrics. Experimental results show that IC-Mapper outperforms other state-of-the-art methods. Code will be released at https://github.com/Brickzhuantou/IC-Mapper.
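
To make the temporal association step more concrete, the following is a minimal sketch of how tracked instances and new detections might be matched with a combined geometric and feature cost. It is an illustration under stated assumptions, not the authors' implementation: the Chamfer distance, the cosine distance, the weights `w_geo` and `w_feat`, the threshold `max_cost`, and the greedy matching strategy are all hypothetical choices that the abstract does not specify.

```python
import math


def chamfer(a, b):
    """Symmetric mean nearest-point distance between two lists of 2D points."""
    def one_way(p, q):
        return sum(min(math.dist(x, y) for y in q) for x in p) / len(p)
    return 0.5 * (one_way(a, b) + one_way(b, a))


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm if norm > 0 else 1.0


def associate(tracks, detections, w_geo=0.5, w_feat=0.5, max_cost=1.0):
    """Assign an instance ID to every detection.

    tracks:     dict mapping track ID -> (points, feature)
    detections: list of (points, feature)
    Assumption: greedy lowest-cost-first matching; pairs whose combined
    cost exceeds max_cost are never matched.
    """
    pairs = []
    for tid, (t_pts, t_feat) in tracks.items():
        for d_idx, (d_pts, d_feat) in enumerate(detections):
            cost = (w_geo * chamfer(t_pts, d_pts)
                    + w_feat * cosine_distance(t_feat, d_feat))
            pairs.append((cost, tid, d_idx))
    pairs.sort()

    assigned = [None] * len(detections)
    used_tracks = set()
    for cost, tid, d_idx in pairs:
        if cost > max_cost:
            break  # all remaining pairs are too dissimilar
        if tid in used_tracks or assigned[d_idx] is not None:
            continue
        assigned[d_idx] = tid
        used_tracks.add(tid)

    # Unmatched detections start new tracks with fresh IDs.
    next_id = max(tracks, default=-1) + 1
    for d_idx, tid in enumerate(assigned):
        if tid is None:
            assigned[d_idx] = next_id
            next_id += 1
    return assigned
```

Calling `associate(tracks, detections)` returns one ID per detection, reusing a track ID where a sufficiently close match exists and allocating a new ID otherwise. A greedy pass is used here only to keep the sketch dependency-free; an optimal assignment solver could be substituted without changing the cost definition.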

Paper Structure

This paper contains 30 sections, 9 equations, 7 figures, 5 tables, 2 algorithms.

Figures (7)

  • Figure 1: Traditional deep learning-based online vector map construction approaches focus only on local detection performance. By incorporating temporal tracking and spatial fusion modules, we have implemented an end-to-end detection-tracking-fusion process that enables the construction of global maps.
  • Figure 2: The overall framework of IC-Mapper. The input consists of continuous multi-frame surround-view images. Building upon an existing query-based visual detector, we introduce an instance-centric temporal association module and a spatial fusion module, which enables end-to-end detection, tracking, and fusion of map instances and in turn facilitates the online reconstruction of global vectorized maps.
  • Figure 3: Temporal Association Module: We extract tracking instances from the tracking memory buffer and compute the association matrices between the tracking instances and detection instances from both geometric and feature dimensions, assigning ID information to each detection instance.
  • Figure 4: An illustration of the instance-centric spatial fusion and updating process. Before fusion, we first sample the history point sets around the intersection area between the current patch and the maintained global map. Then a cross-attention-based spatial fusion is applied between the detected points and the sampled historical points. The fused queries are further decoded to serve as the final result, which is then updated into the global map using a curve-fitting-based merging algorithm.
  • Figure 5: An illustration of the curve fitting algorithm. As can be seen, polyline-type instances are merged based on curve fitting while polygon-type instances are merged simply by union. During curve fitting, points are first reordered and concatenated. Then a fitting and resampling is performed to generate the updated point sets. We denote the points from the global map as red, points from detected instances as blue, and the final points as green.
  • ...and 2 more figures
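
The polyline merging described in the Figure 5 caption (reorder and concatenate points, then fit and resample) can be sketched roughly as follows. This is a hedged illustration, not the paper's algorithm: the principal-axis ordering, the quadratic least-squares fit, the uniform resampling along the axis, and the names `merge_polyline`, `num_out`, and `degree` are all assumptions made for this example. Polygon-type instances, which the caption says are merged by union, are not covered.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def polyfit(ts, ns, degree):
    """Least-squares polynomial coefficients (lowest order first)."""
    size = degree + 1
    a = [[sum(t ** (i + j) for t in ts) for j in range(size)]
         for i in range(size)]
    b = [sum(n * t ** i for t, n in zip(ts, ns)) for i in range(size)]
    return solve_linear(a, b)


def merge_polyline(global_pts, det_pts, num_out=10, degree=2):
    """Merge a global-map polyline with a detected polyline.

    Assumptions: the merged curve is single-valued along its principal
    axis, and there are more than `degree` distinct positions along it.
    """
    pts = list(global_pts) + list(det_pts)  # concatenate both point sets
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    sxx = sum((p[0] - cx) ** 2 for p in pts)
    syy = sum((p[1] - cy) ** 2 for p in pts)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in pts)
    # Orientation of the principal axis of the 2D point cloud.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)

    # Reorder: sort points by their coordinate t along the principal axis,
    # keeping the perpendicular offset n for the fit.
    proj = sorted(
        ((p[0] - cx) * ux + (p[1] - cy) * uy,
         -(p[0] - cx) * uy + (p[1] - cy) * ux)
        for p in pts
    )
    ts = [t for t, _ in proj]
    ns = [n for _, n in proj]

    coef = polyfit(ts, ns, degree)  # fit n as a polynomial in t

    # Resample uniformly in t and map back to x, y.
    out = []
    for k in range(num_out):
        t = ts[0] + (ts[-1] - ts[0]) * k / (num_out - 1)
        n = sum(c * t ** i for i, c in enumerate(coef))
        out.append((cx + t * ux - n * uy, cy + t * uy + n * ux))
    return out
```

Note that the resampling here is uniform along the principal axis rather than along arc length, and the direction of the output (which end comes first) follows the sign of the computed axis. Both are simplifications chosen to keep the sketch short.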