TopoBDA: Towards Bezier Deformable Attention for Road Topology Understanding

Muhammet Esat Kalfaoglu, Halil Ibrahim Ozturk, Ozsel Kilinc, Alptekin Temizel

TL;DR

TopoBDA tackles road topology understanding by introducing Bezier Deformable Attention (BDA) within a Bird's Eye View (BEV) transformer decoder, so that attention is sampled directly around the Bezier control points used for centerline prediction. It also adapts Multi-Point Deformable Attention (MPDA) to this setting, adds an indirect instance-mask auxiliary loss with a Mask-L1 mix matcher, and includes a multi-modal fusion pipeline that incorporates LiDAR and SDMap data to improve topology reasoning. Experiments on OpenLane-V1 and OpenLane-V2 show state-of-the-art centerline detection and topology metrics, with further gains from sensor fusion and multi-modal inputs. The work contributes a unified framework for 3D lane topology and HDMap element prediction, and it discusses the computational trade-offs that matter for autonomous driving systems and future edge deployment.

Abstract

Understanding road topology is crucial for autonomous driving. This paper introduces TopoBDA (Topology with Bezier Deformable Attention), a novel approach that enhances road topology comprehension by leveraging Bezier Deformable Attention (BDA). TopoBDA processes multi-camera 360-degree imagery to generate Bird's Eye View (BEV) features, which are refined through a transformer decoder employing BDA. BDA utilizes Bezier control points to drive the deformable attention mechanism, improving the detection and representation of elongated and thin polyline structures, such as lane centerlines. Additionally, TopoBDA integrates two auxiliary components: an instance mask formulation loss and a one-to-many set prediction loss strategy, to further refine centerline detection and enhance road topology understanding. Experimental evaluations on the OpenLane-V2 dataset demonstrate that TopoBDA outperforms existing methods, achieving state-of-the-art results in centerline detection and topology reasoning. TopoBDA also achieves the best results on the OpenLane-V1 dataset in 3D lane detection. Further experiments on integrating multi-modal data -- such as LiDAR, radar, and SDMap -- show that multimodal inputs can further enhance performance in road topology understanding.
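The idea of letting Bezier control points drive deformable attention can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes one attention head per control point, takes the offsets as plain numbers (in a real model they would be predicted from the query), and omits coordinate normalization and the bilinear feature sampling that follows. The function name `bda_sampling_locations` is hypothetical.

```python
def bda_sampling_locations(control_points, offsets):
    """Compute sampling locations for a BDA-style attention step.

    control_points: list of H (x, y) tuples, one reference point per head
                    (assumption: heads and control points are paired 1:1).
    offsets:        list of H lists, each holding (dx, dy) offsets for that head
                    (assumption: supplied directly instead of being learned).
    Returns a list of H lists of (x, y) sampling locations.
    """
    if len(control_points) != len(offsets):
        raise ValueError("expected one offset list per control point / head")
    return [
        [(px + dx, py + dy) for dx, dy in head_offsets]
        for (px, py), head_offsets in zip(control_points, offsets)
    ]


# Hypothetical example: two heads anchored at two control points.
example_points = [(0.0, 0.0), (1.0, 2.0)]
example_offsets = [[(0.5, 0.0), (-0.5, 0.0)], [(0.0, 1.0)]]
example_locations = bda_sampling_locations(example_points, example_offsets)
```

Because each head is anchored at a different control point, the sampled locations spread along the curve rather than clustering around a single reference position.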

Paper Structure

This paper contains 61 sections, 18 equations, 10 figures, 15 tables, 1 algorithm.

Figures (10)

  • Figure 1: Comparison of various cross-attention mechanisms within the decoder architecture for polyline structures.
  • Figure 2: Overview of the TopoBDA architecture. The TopoBDA architecture is based on the instance query concept. The extracted BEV features from the multiple camera images are fed into the transformer decoder. The decoder outputs Bezier control points for each query, which are then converted into centerline instances via matrix multiplication. Additionally, each centerline query predicts instance masks, but only during training.
  • Figure 3: Comparison of Single-Point Deformable Attention (SPDA), Multi-Point Deformable Attention (MPDA), and Bezier Deformable Attention (BDA). Points denote the reference positions (anchors) for each attention head, while arrows indicate the learned offsets that shift attention from these anchors to the actual sampling locations where features are aggregated. SPDA uses identical reference positions across all heads, whereas MPDA and BDA employ distinct reference positions per head, improving attention efficiency for polyline structures. Although MPDA and BDA share the same underlying mechanism, they differ in how multiple reference points $(p_x, p_y)$ are selected. BDA directly utilizes Bezier points as reference positions, while MPDA requires conversion of Bezier points into polyline points and utilizes polyline points as reference positions.
  • Figure 4: Comparison of Multi-Point Deformable Attention (MPDA) and Bezier Deformable Attention (BDA): MPDA necessitates an additional matrix multiplication block within each transformer decoder to convert predicted Bezier control points into polyline points for use as reference points. Despite their different input utilizations as reference points, the mechanisms of MPDA and BDA blocks are fundamentally the same: each attention head operates on a distinct reference point—polyline points in MPDA and Bezier control points in BDA.
  • Figure 5: This figure visualizes the layers of TopoBDA, each driven by Bezier Deformable Attention (BDA) using control points predicted through iterative refinement. Note that iterative refinement is not applicable to the first layer, which uses direct prediction.
  • ...and 5 more figures
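The captions for Figures 2 and 4 mention converting Bezier control points into centerline (polyline) points via matrix multiplication. A minimal sketch of that conversion follows, assuming the standard Bernstein-basis formulation with uniformly spaced curve parameters; the paper's exact sampling scheme and curve degree are not stated in this overview, and the function names are hypothetical.

```python
from math import comb


def bernstein_matrix(num_points, degree):
    """Return a num_points x (degree + 1) matrix of Bernstein basis values.

    Row i corresponds to parameter t = i / (num_points - 1), so num_points
    must be at least 2 (assumption: uniform sampling of t in [0, 1]).
    """
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    rows = []
    for i in range(num_points):
        t = i / (num_points - 1)
        rows.append([
            comb(degree, k) * (1 - t) ** (degree - k) * t ** k
            for k in range(degree + 1)
        ])
    return rows


def bezier_to_polyline(control_points, num_points):
    """Multiply the Bernstein matrix by the control points to get a polyline."""
    degree = len(control_points) - 1
    basis = bernstein_matrix(num_points, degree)
    dims = len(control_points[0])
    return [
        [sum(row[k] * control_points[k][d] for k in range(degree + 1))
         for d in range(dims)]
        for row in basis
    ]


# Hypothetical cubic curve with four 2D control points.
example_polyline = bezier_to_polyline([(0, 0), (1, 2), (3, 2), (4, 0)], 5)
```

The Bernstein matrix depends only on the number of sampled points and the curve degree, so it can be computed once and reused for every query. In the terms of the Figure 4 caption, MPDA would need a step like this inside each decoder layer to obtain its polyline reference points, whereas BDA uses the control points directly.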