AurigaNet: A Real-Time Multi-Task Network for Enhanced Urban Driving Perception

Kiarash Ghasemzadeh, Sedigheh Dehghani

TL;DR

AurigaNet tackles real-time urban driving perception by unifying object detection, lane detection, and drivable-area instance segmentation in a single multi-task network. It uses a CSPDarknet-based shared encoder with an SP-PF neck and three task-specific decoders, augmented by a discriminative embedding loss and deformable convolutions to enable end-to-end drivable-area instance segmentation. Mean shift clustering based on von Mises-Fisher geometry groups the embeddings on a unit sphere, separating instances accurately without heavy post-processing. On the BDD100K dataset, AurigaNet achieves $\mathrm{IoU}_{\text{drivable}} = 85.2\%$, $\mathrm{IoU}_{\text{lane}} = 60.8\%$, and $\mathrm{mAP}_{\text{traffic}} = 47.6\%$ (mAP@0.5:0.95), while maintaining real-time performance on embedded hardware such as the Jetson Orin NX.
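The clustering step can be illustrated with a minimal sketch of mean shift on the unit sphere, assuming a von Mises-Fisher kernel of the form exp(kappa * cos(angle)). This is not the authors' implementation: the function name `vmf_mean_shift`, the concentration `kappa`, the iteration count, and the merge threshold `merge_cos` are all hypothetical choices made for illustration.

```python
import numpy as np

def vmf_mean_shift(embeddings, kappa=20.0, n_iters=30, merge_cos=0.95):
    """Cluster embeddings on the unit sphere with a von Mises-Fisher kernel.

    embeddings: (N, D) array; rows are L2-normalized before clustering.
    kappa, n_iters, merge_cos: hypothetical settings, not taken from the paper.
    Returns an (N,) integer array of cluster labels.
    """
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    modes = x.copy()
    for _ in range(n_iters):
        # Kernel weight between every mode and every data point.
        # Subtracting 1 from the cosine only rescales all weights by a
        # constant, which the normalization below cancels.
        weights = np.exp(kappa * (modes @ x.T - 1.0))
        modes = weights @ x
        # Project the weighted mean back onto the sphere.
        modes /= np.linalg.norm(modes, axis=1, keepdims=True)

    # Greedily merge modes whose cosine similarity exceeds merge_cos.
    labels = np.full(len(x), -1, dtype=int)
    centers = []
    for i, mode in enumerate(modes):
        for k, center in enumerate(centers):
            if mode @ center > merge_cos:
                labels[i] = k
                break
        else:
            centers.append(mode)
            labels[i] = len(centers) - 1
    return labels
```

Because each update is a kernel-weighted mean followed by re-normalization, every point's mode stays on the sphere and drifts toward the densest nearby direction; points whose modes coincide are treated as one instance.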

Abstract

Self-driving cars hold significant potential to reduce traffic accidents, alleviate congestion, and enhance urban mobility. However, developing reliable AI systems for autonomous vehicles remains a substantial challenge. Over the past decade, multi-task learning has emerged as a powerful approach to address complex problems in driving perception. Multi-task networks offer several advantages, including increased computational efficiency, real-time processing capabilities, optimized resource utilization, and improved generalization. In this study, we present AurigaNet, an advanced multi-task network architecture designed to push the boundaries of autonomous driving perception. AurigaNet integrates three critical tasks: object detection, lane detection, and drivable area instance segmentation. The system is trained and evaluated using the BDD100K dataset, renowned for its diversity in driving conditions. Key innovations of AurigaNet include its end-to-end instance segmentation capability, which significantly enhances both accuracy and efficiency in path estimation for autonomous vehicles. Experimental results demonstrate that AurigaNet achieves an 85.2% IoU in drivable area segmentation, outperforming its closest competitor by 0.7%. In lane detection, AurigaNet achieves a remarkable 60.8% IoU, surpassing other models by more than 30%. Furthermore, the network achieves an mAP@0.5:0.95 of 47.6% in traffic object detection, exceeding the next leading model by 2.9%. Additionally, we validate the practical feasibility of AurigaNet by deploying it on embedded devices such as the Jetson Orin NX, where it demonstrates competitive real-time performance. These results underscore AurigaNet's potential as a robust and efficient solution for autonomous driving perception systems. The code is available at https://github.com/KiaRational/AurigaNet.
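For readers unfamiliar with the segmentation metric quoted above, the following is a minimal sketch of intersection over union (IoU) for a single pair of binary masks. The function name `mask_iou` and the empty-mask convention are assumptions; how the benchmark aggregates IoU over images or classes is not specified in this excerpt.

```python
import numpy as np

def mask_iou(pred, target):
    """Intersection over union of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks empty: treated as perfect agreement (an assumed convention).
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection / union)
```

For example, a prediction that shares 2 pixels with the ground truth while the two masks together cover 4 pixels scores an IoU of 0.5.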

Paper Structure

This paper contains 29 sections, 13 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: The inference result of AurigaNet
  • Figure 2: The architecture of AurigaNet
  • Figure 3: Illustration of the role of the offset field in deformable convolution, which helps the sampling locations align with the shape of instances.
  • Figure 4: Illustration of discriminative loss: same-instance features are clustered, while different instances are separated in feature space.
  • Figure 5: The fixed receptive field in standard convolution and the adaptive receptive field in deformable convolution are illustrated using two layers.
  • ...and 6 more figures
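
The idea captioned in Figure 4 can be sketched as a generic pull/push embedding loss: a pull term draws each pixel embedding toward its instance center, and a push term drives the centers of different instances apart. The sketch below is only an illustration of that general pattern. The function name `discriminative_loss`, the Euclidean distance, and the margins `delta_pull` and `delta_push` are assumptions; the paper's exact terms, margins, and distance metric are not given in this excerpt.

```python
import numpy as np

def discriminative_loss(embeddings, instance_ids, delta_pull=0.5, delta_push=1.5):
    """Return (pull, push) terms of a generic discriminative embedding loss.

    embeddings: (N, D) array of per-pixel embeddings.
    instance_ids: (N,) array assigning each pixel to an instance.
    delta_pull, delta_push: hypothetical margins, not taken from the paper.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    instance_ids = np.asarray(instance_ids)
    ids = np.unique(instance_ids)
    centers = np.stack([embeddings[instance_ids == i].mean(axis=0) for i in ids])

    # Pull term: penalize pixels farther than delta_pull from their own center.
    pull = 0.0
    for center, i in zip(centers, ids):
        dist = np.linalg.norm(embeddings[instance_ids == i] - center, axis=1)
        pull += np.mean(np.maximum(dist - delta_pull, 0.0) ** 2)
    pull /= len(ids)

    # Push term: penalize pairs of centers closer than 2 * delta_push.
    push = 0.0
    if len(ids) > 1:
        gaps = np.linalg.norm(centers[:, None] - centers[None, :], axis=2)
        off_diagonal = ~np.eye(len(ids), dtype=bool)
        push = np.mean(np.maximum(2 * delta_push - gaps[off_diagonal], 0.0) ** 2)
    return float(pull), float(push)
```

Both terms are zero when every instance is compact and the instance centers are well separated, so the loss only acts on embeddings that violate one of the two margins.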