TopoMaskV2: Enhanced Instance-Mask-Based Formulation for the Road Topology Problem

M. Esat Kalfaoglu, Halil Ibrahim Ozturk, Ozsel Kilinc, Alptekin Temizel

TL;DR

TopoMask addresses road topology by predicting centerlines as flow-aware instance masks in the BEV domain. It introduces quad-direction labels to encode centerline flow, employs a masked-attention transformer decoder, and applies a three-stage, quad-direction-aware post-processing step to convert masks into ordered 3D centerline point sets. A Bezier head provides complementary geometric cues, and a fusion mechanism combines the mask-derived and Bezier-derived outputs to improve both centerline accuracy and topology relations. A multi-height bin variant of Lift-Splat preserves height information in the BEV features and further improves performance. On OpenLane-V2, TopoMask achieves state-of-the-art results on both Subset-A and Subset-B across multiple metrics. The paper also analyzes attention types and metric considerations, and points to height prediction and topology evaluation as directions for future work.

Abstract

Recently, the centerline has become a popular representation of lanes due to its advantages in solving the road topology problem. To enhance centerline prediction, we have developed a new approach called TopoMask. Unlike previous approaches that rely on keypoints or parametric representations, TopoMask utilizes an instance-mask-based formulation coupled with a masked-attention-based transformer architecture. We introduce a quad-direction label representation to enrich the mask instances with flow information and design a corresponding post-processing technique for mask-to-centerline conversion. Additionally, we demonstrate that the instance-mask formulation provides complementary information to parametric Bezier regressions, and that fusing both outputs improves detection and topology performance. Moreover, we analyze the shortcomings of the pillar assumption in the Lift Splat technique and adopt a multi-height bin configuration. Experimental results show that TopoMask achieves state-of-the-art performance on the OpenLane-V2 dataset, improving OLS under the V1.1 baseline from 44.1 to 49.4 on Subset-A and from 44.7 to 51.8 on Subset-B.

Paper Structure (23 sections, 3 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Overview of the Centerline Prediction in the TopoMask Architecture. The process starts with Bird’s Eye View (BEV) feature extraction, which projects features from multi-view images onto a unified coordinate system. Then, a masked-attention-based transformer decoder creates flow-aware instance masks. These masks are converted into centerlines through a quad-direction label-aware post-processing step. To further improve centerline prediction, a Bezier head is incorporated into the transformer decoder, and a fusion block combines both representations.
  • Figure 2: Transformer Decoder in TopoMask. The TopoMask architecture comprises a mask head and a Bezier head, and updates queries through successive layers of masked attention and self-attention within each decoder layer. Bezier control points are iteratively refined in each decoder layer, and the final binary mask output is generated via a dot product between mask embeddings and BEV features.
  • Figure 3: Quad-Direction Labels Encoding. This figure demonstrates the creation of quad-direction labels, which encode the flow information of centerlines. The labels are up, down, left, and right. To generate them, a voting mechanism is applied between consecutive centerline points. In cases of equal votes, the final decision is based on the angle between the start and end points of the centerline.
  • Figure 4: Quad-Direction Label-Aware Post-Processing. The figure provides an overview of the technique, which converts flow-aware instance masks into centerlines. During the first and second stages, the mask representation is transformed into sparse BEV points. In the third stage, these points are ordered using quad-direction labels, thereby recovering the flow information of the centerlines.
  • Figure 5: Voxel Feature Aggregation Comparison. The figure demonstrates the voxel aggregation techniques under the regular pillar assumption and the multi-height bin assumption. The proposed multi-height bin assumption encodes more information while preserving the practicality of the regular pillar assumption.
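
The voting rule in the Figure 3 caption and the ordering stage in the Figure 4 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The captions do not specify the BEV axis convention, how a single segment casts its vote, the angle thresholds for the tie-break, or how the label drives the ordering, so the following are assumptions: x grows to the right and y grows upward, each segment votes for its dominant axis of motion, ties fall back to 90-degree sectors around the start-to-end angle, and ordering is a sort along the axis named by the label. The function names `segment_vote`, `angle_label`, `quad_direction_label`, and `order_points` are hypothetical.

```python
import math
from collections import Counter


def segment_vote(p, q):
    """Vote of one segment p -> q (assumed rule: dominant axis of motion)."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    if abs(dx) >= abs(dy):  # assumption: exact diagonals count as horizontal
        return "right" if dx > 0 else "left"
    return "up" if dy > 0 else "down"


def angle_label(start, end):
    """Tie-break from the start-to-end angle (assumed 90-degree sectors)."""
    angle = math.degrees(math.atan2(end[1] - start[1], end[0] - start[0]))
    if -45 <= angle < 45:
        return "right"
    if 45 <= angle < 135:
        return "up"
    if -135 <= angle < -45:
        return "down"
    return "left"


def quad_direction_label(points):
    """Majority vote over consecutive points; angle-based fallback on ties.

    `points` is a list of (x, y) tuples in BEV coordinates.
    """
    votes = Counter(
        segment_vote(p, q) for p, q in zip(points, points[1:]) if p != q
    )
    ranked = votes.most_common()
    # A unique winner exists if only one label got votes or the top count is strict.
    if ranked and (len(ranked) == 1 or ranked[0][1] > ranked[1][1]):
        return ranked[0][0]
    return angle_label(points[0], points[-1])


def order_points(points, label):
    """Order unordered BEV points along the flow (assumed: sort along one axis)."""
    axis = 0 if label in ("left", "right") else 1
    descending = label in ("left", "down")
    return sorted(points, key=lambda p: p[axis], reverse=descending)


if __name__ == "__main__":
    centerline = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.1), (3.0, 0.5)]
    label = quad_direction_label(centerline)
    shuffled = [centerline[2], centerline[0], centerline[3], centerline[1]]
    print(label, order_points(shuffled, label))
```

Sorting along a single axis is the simplest ordering consistent with a four-way label; it would not handle a centerline that doubles back along that axis, which is one reason the actual post-processing may differ from this sketch.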