AurigaNet: A Real-Time Multi-Task Network for Enhanced Urban Driving Perception
Kiarash Ghasemzadeh, Sedigheh Dehghani
TL;DR
AurigaNet tackles real-time urban driving perception by unifying object detection, lane detection, and drivable-area instance segmentation in a single multi-task network. It uses a CSPDarknet-based shared encoder with an SP-PF neck and three task-specific decoders, augmented by a discriminative embedding loss and deformable convolutions to enable end-to-end drivable-area instance segmentation. Mean shift clustering under von Mises-Fisher geometry groups the embeddings on a unit sphere, separating instances accurately without heavy post-processing. On the BDD100K dataset, AurigaNet achieves $IoU_{drivable}=85.2\%$, $IoU_{lane}=60.8\%$, and $mAP_{traffic}=47.6\%$, while maintaining real-time performance on embedded hardware such as the Jetson Orin NX.
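To make the clustering step concrete, the following is a minimal sketch of mean shift on the unit sphere with a von Mises-Fisher style kernel. It is an illustration under stated assumptions, not the authors' implementation: the function name `vmf_mean_shift`, the concentration `kappa`, the iteration count, and the cosine merge threshold `merge_cos` are all hypothetical choices.

```python
import numpy as np

def vmf_mean_shift(embeddings, kappa=20.0, n_iter=30, merge_cos=0.95):
    """Cluster embeddings on the unit sphere (illustrative sketch).

    kappa, n_iter and merge_cos are assumed values, not taken from the paper.
    """
    # Project every embedding onto the unit sphere.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    modes = x.copy()
    for _ in range(n_iter):
        # vMF-style kernel weight exp(kappa * cos); subtracting kappa
        # rescales all weights by a constant and avoids overflow.
        w = np.exp(kappa * (modes @ x.T - 1.0))
        # Shift each mode to the weighted mean direction, then renormalize.
        modes = w @ x
        modes /= np.linalg.norm(modes, axis=1, keepdims=True)

    # Greedily merge modes whose cosine similarity exceeds merge_cos.
    labels = np.full(len(x), -1, dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if m @ c > merge_cos:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)
```

In a segmentation setting, `embeddings` would hold one row per drivable-area pixel, and the returned `labels` would assign each pixel to an instance. The dense pairwise kernel here costs O(N^2) memory, so a practical version would subsample pixels or seed fewer modes.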
Abstract
Self-driving cars hold significant potential to reduce traffic accidents, alleviate congestion, and enhance urban mobility. However, developing reliable AI systems for autonomous vehicles remains a substantial challenge. Over the past decade, multi-task learning has emerged as a powerful approach to address complex problems in driving perception. Multi-task networks offer several advantages, including increased computational efficiency, real-time processing capabilities, optimized resource utilization, and improved generalization. In this study, we present AurigaNet, an advanced multi-task network architecture designed to push the boundaries of autonomous driving perception. AurigaNet integrates three critical tasks: object detection, lane detection, and drivable area instance segmentation. The system is trained and evaluated on the BDD100K dataset, which is known for its diverse driving conditions. A key innovation of AurigaNet is its end-to-end instance segmentation capability, which significantly enhances both accuracy and efficiency in path estimation for autonomous vehicles. Experimental results demonstrate that AurigaNet achieves an 85.2% IoU in drivable area segmentation, outperforming its closest competitor by 0.7%. In lane detection, AurigaNet achieves a 60.8% IoU, surpassing other models by more than 30%. Furthermore, the network achieves an mAP@0.5:0.95 of 47.6% in traffic object detection, exceeding the next leading model by 2.9%. Additionally, we validate the practical feasibility of AurigaNet by deploying it on embedded devices such as the Jetson Orin NX, where it demonstrates competitive real-time performance. These results underscore AurigaNet's potential as a robust and efficient solution for autonomous driving perception systems. The code is available at https://github.com/KiaRational/AurigaNet.
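For readers unfamiliar with the IoU figures quoted above, the metric for a single binary mask can be sketched as follows. This is a generic illustration; the helper name `binary_iou` and the convention of returning 1.0 when both masks are empty are assumptions, and the benchmark's exact evaluation protocol may differ.

```python
import numpy as np

def binary_iou(pred, target):
    """Intersection over union of two binary masks (illustrative sketch)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)

# One overlapping pixel out of three covered pixels gives IoU = 1/3.
print(binary_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]]))
```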
