You Only Look at Once for Real-time and Generic Multi-Task

Jiayuan Wang; Q. M. Jonathan Wu; Ning Zhang

TL;DR

This paper introduces A-YOLOM, a real-time, lightweight multi-task framework for autonomous driving that jointly performs object detection, drivable area segmentation, and lane line segmentation within a single model. It has three main components: an adaptive concatenation module in the segmentation neck, a lightweight segmentation head shared in design across segmentation tasks, and a single loss function used for all segmentation tasks. Together these improve generality and efficiency. On the BDD100K dataset and in real-road tests, A-YOLOM achieves strong detection (mAP50 of 81.1%), high drivable area segmentation accuracy (mIoU of 91.0%), and competitive lane line segmentation (IoU of 28.8%) while maintaining real-time throughput on edge-style hardware. The approach shows that a shared backbone with task-specific but lightweight necks and heads, trained end to end, can deliver panoptic driving perception with a favorable balance of speed, accuracy, and deployability.

Abstract

High precision, lightweight design, and real-time responsiveness are three essential requirements for autonomous driving. In this study, we propose A-YOLOM, an adaptive, real-time, and lightweight multi-task model that concurrently addresses object detection, drivable area segmentation, and lane line segmentation. Specifically, we develop an end-to-end multi-task model with a unified and streamlined segmentation structure. We introduce a learnable parameter that adaptively concatenates features between the neck and the backbone in the segmentation branches, and we use the same loss function for all segmentation tasks. This removes the need for task-specific customization and improves the model's generalization. We also introduce a segmentation head composed only of a series of convolutional layers, which reduces both the number of parameters and the inference time. We achieve competitive results on the BDD100K dataset, particularly in qualitative (visualization) results. Specifically, the model achieves a mAP50 of 81.1% for object detection, a mIoU of 91.0% for drivable area segmentation, and an IoU of 28.8% for lane line segmentation. Additionally, we evaluate the model on real-world road scenes, where it significantly outperforms competing models. These results demonstrate that our model not only achieves competitive accuracy but is also more flexible and faster than existing multi-task models. The source code and pre-trained models are released at https://github.com/JiayuanWang-JW/YOLOv8-multi-task
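The abstract describes two structural ideas: a learnable parameter that controls how backbone features are concatenated with neck features, and a segmentation head built only from convolutions. The toy NumPy sketch below illustrates both under stated assumptions; it is not the authors' implementation. In particular, it assumes the learnable parameter is a scalar `alpha` whose sigmoid scales the backbone skip feature before channel-wise concatenation; the paper's actual module may behave differently, for example by selecting whether to concatenate at all. The function names `adaptive_concat`, `conv1x1`, and `seg_head` are hypothetical, and the head is reduced to a single 1x1 convolution for brevity.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def adaptive_concat(neck_feat, backbone_feat, alpha):
    """Hypothetical adaptive concatenation (assumed form, not the paper's code).

    neck_feat:     (C1, H, W) feature map from the segmentation neck
    backbone_feat: (C2, H, W) skip feature from the backbone
    alpha:         learnable scalar (assumption); sigmoid(alpha) scales the skip
    Returns a (C1 + C2, H, W) map concatenated along the channel axis.
    """
    gate = sigmoid(alpha)
    return np.concatenate([neck_feat, gate * backbone_feat], axis=0)


def conv1x1(x, weight, bias):
    """A 1x1 convolution: a per-pixel linear map from C_in to C_out channels.

    x: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,)
    """
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def seg_head(x, weight, bias):
    """Toy conv-only head: a 1x1 conv to one logit per pixel, then a sigmoid mask."""
    return sigmoid(conv1x1(x, weight, bias))


rng = np.random.default_rng(0)
neck = rng.standard_normal((4, 8, 8))
skip = rng.standard_normal((2, 8, 8))
fused = adaptive_concat(neck, skip, alpha=0.0)  # sigmoid(0) = 0.5 halves the skip
print(fused.shape)  # (6, 8, 8)
mask = seg_head(fused, rng.standard_normal((1, 6)), np.zeros(1))
print(mask.shape)  # (1, 8, 8)
```

In a trained model, `alpha` would be updated by backpropagation together with the convolution weights. This lets the network learn how much backbone detail each segmentation branch needs, instead of relying on a hand-tuned, task-specific fusion rule.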
