Online Temporal Fusion for Vectorized Map Construction in Mapless Autonomous Driving

Jiagang Chen, Liangliang Pan, Shunping Ji, Ji Zhao, Zichao Zhang

TL;DR

This work tackles the challenge of mapless autonomous driving by building temporally consistent vectorized maps online from onboard detections. It introduces a semantic voxel hashing framework that incrementally fuses road-marking detections into a sparse 3D voxel map, extracts reliable voxels, and clusters them into polyline road markings, which are then transformed into lane boundaries and linkages using domain knowledge. The approach yields lane-level, vectorized road layouts suitable for planning and control, and shows stronger stability and geometric accuracy than single-frame methods across urban scenarios, validated on in-house and Argoverse2 datasets with real-time performance on embedded hardware. The results suggest a practical path toward reducing HD-map dependence in mapless autonomous driving while enabling robust PnC integration; future work includes reducing reliance on SD maps and incorporating uncertainty into fusion.

Abstract

To reduce the reliance on high-definition (HD) maps, a growing trend in autonomous driving is leveraging onboard sensors to generate vectorized maps online. However, current methods are mostly constrained by processing only single-frame inputs, which hampers their robustness and effectiveness in complex scenarios. To overcome this problem, we propose an online map construction system that exploits the long-term temporal information to build a consistent vectorized map. First, the system efficiently fuses all historical road marking detections from an off-the-shelf network into a semantic voxel map, which is implemented using a hashing-based strategy to exploit the sparsity of road elements. Then reliable voxels are found by examining the fused information and incrementally clustered into an instance-level representation of road markings. Finally, the system incorporates domain knowledge to estimate the geometric and topological structures of roads, which can be directly consumed by the planning and control (PnC) module. Through experiments conducted in complicated urban environments, we have demonstrated that the output of our system is more consistent and accurate than the network output by a large margin and can be effectively used in a closed-loop autonomous driving system.
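
The hashing-based fusion step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the block size of 8 follows the Figure 3 caption, while the voxel size, the per-class hit counter, the `MIN_HITS` reliability threshold, and all names (`voxel_address`, `SemanticVoxelMap`) are assumptions introduced here. The paper says only that reliable voxels are found "by examining the fused information", so the counting rule below is a stand-in.

```python
from collections import defaultdict

import numpy as np

BLOCK_SIZE = 8     # voxels per block edge (8^3 voxels per block, per Figure 3)
VOXEL_SIZE = 0.1   # metres per voxel edge; hypothetical value
NUM_CLASSES = 3    # laneline, roadedge, stopline
MIN_HITS = 3       # hypothetical reliability threshold (assumption)


def voxel_address(point):
    """Map a 3D point in metres to (block_key, index_in_block)."""
    v = np.floor(np.asarray(point, dtype=float) / VOXEL_SIZE).astype(int)
    block = v // BLOCK_SIZE           # floor division also handles negatives
    local = v - block * BLOCK_SIZE    # each component lies in [0, BLOCK_SIZE)
    index = int(local[0] + BLOCK_SIZE * (local[1] + BLOCK_SIZE * local[2]))
    return tuple(int(b) for b in block), index


class SemanticVoxelMap:
    """Sparse map: only blocks that received a detection are allocated."""

    def __init__(self):
        # Hash table: block key -> array of per-voxel, per-class hit counts.
        self.blocks = defaultdict(
            lambda: np.zeros((BLOCK_SIZE ** 3, NUM_CLASSES), dtype=np.int32)
        )

    def fuse(self, point, class_id):
        """Accumulate one road-marking detection into its voxel."""
        key, idx = voxel_address(point)
        self.blocks[key][idx, class_id] += 1

    def reliable_voxels(self):
        """Yield (block_key, index, class_id) for voxels with enough hits."""
        for key, counts in self.blocks.items():
            for i in np.nonzero(counts.max(axis=1) >= MIN_HITS)[0]:
                yield key, int(i), int(counts[i].argmax())
```

Because empty space never allocates a block, memory grows with the number of observed road elements rather than with the mapped volume, which is the sparsity argument the abstract makes.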

Paper Structure (28 sections, 5 equations, 9 figures, 2 tables)

This paper contains 28 sections, 5 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Results of the proposed system in an urban road test. (a)-(c): Road markings (stoplines, lanelines, and roadedges) and road layout (lanes and lane linkages) generated online by our method in typical scenarios.
  • Figure 2: Illustration of the proposed system. Utilizing surround-view cameras and odometry poses as input, the system outputs vectorized road markings (lanelines, roadedges and stoplines) and road layout (lanes and lane linkages) for PnC modules.
  • Figure 3: The voxel hashing data structure. The semantic voxel map is maintained as a hash table composed of independent blocks indexed by their spatial coordinates. Each block comprises an $8^3$ voxel grid and stores the voxels as an array. A voxel can thus be uniquely identified by the key of the block it belongs to and its index within the block.
  • Figure 4: The voxels are divided into groups according to the principal components (blue), and polylines (green) are estimated based on the grouping. Top-left: The voxels are distributed mainly along $\mathrm{PC}_1$ and can be effectively grouped based on their projections onto $\mathrm{PC}_1$. Bottom-left: When $\mathrm{PC}_2$ is significant, the voxels are divided into the quadrants formed by $\mathrm{PC}_{1/2}$, and each quadrant is processed as in the top-left case. Right: In a scene from the Argoverse2 dataset, lanelines (red) correspond to the top-left case, while roadedges (green) correspond to the bottom-left case.
  • Figure 5:
  • ...and 4 more figures
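
The grouping described in the Figure 4 caption (top-left case) can be sketched as below. This is an illustrative reading of the caption, not the paper's algorithm: it assumes 2D voxel centres on the ground plane, groups them into fixed-length bins along $\mathrm{PC}_1$, and uses each group's mean as a polyline vertex. The function name `polyline_from_voxels` and the `bin_length` parameter are hypothetical, and the quadrant handling for a significant $\mathrm{PC}_2$ is omitted.

```python
import numpy as np


def polyline_from_voxels(points, bin_length=1.0):
    """Estimate polyline vertices from 2D voxel centres via PCA grouping.

    Returns (vertices, eigenvalues), with eigenvalues in descending order
    so the caller can check how significant PC2 is relative to PC1.
    """
    pts = np.asarray(points, dtype=float)
    centred = pts - pts.mean(axis=0)

    # Principal components are the eigenvectors of the covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centred.T))
    pc1 = eigvecs[:, -1]  # eigh returns eigenvalues in ascending order

    # Project onto PC1 and group voxels into bins of fixed length.
    proj = centred @ pc1
    bins = np.floor((proj - proj.min()) / bin_length).astype(int)

    # One vertex per non-empty group, ordered along PC1.
    vertices = [pts[bins == b].mean(axis=0) for b in np.unique(bins)]
    return np.array(vertices), eigvals[::-1]
```

A caller could compare the two returned eigenvalues to decide whether the single-axis case applies or whether the voxels should first be split into quadrants, as the caption describes for roadedges.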