First qualitative observations on deep learning vision model YOLO and DETR for automated driving in Austria
Stefan Schoder
TL;DR
This paper presents a qualitative study of fast deep-learning vision models for automated driving, comparing YOLO variants (v2, v3, v5, v8) and RT-DETR on US and Austrian road scenes. It highlights the strengths of these models in detecting common objects such as cars, while exposing weaknesses in small-object, traffic-sign, and winter-scene recognition, particularly under alpine conditions and with snow-occluded signs. The work emphasizes the need for region-specific fine-tuning and robust data, and points to the potential benefits of multi-modal sensor fusion for safety-critical perception. The findings provide a foundation for subsequent quantitative benchmarks and inform the development of robust, region-aware ADAS and autonomous driving systems on diverse road networks.
Abstract
This study investigates single- and two-stage 2D object detection algorithms, such as You Only Look Once (YOLO) and the Real-Time DEtection TRansformer (RT-DETR), for automated object detection, with the aim of enhancing road safety for autonomous driving on Austrian roads. YOLO is a state-of-the-art real-time object detection system known for its efficiency and accuracy. In the context of driving, its ability to rapidly identify and track objects is crucial for advanced driver assistance systems (ADAS) and autonomous vehicles. The research focuses on the particular challenges posed by road conditions and traffic scenarios in Austria. The country's diverse landscape, varying weather conditions, and specific traffic regulations necessitate a tailored approach for reliable object detection. The study uses a selected dataset of images and videos captured on Austrian roads, covering urban, rural, and alpine environments.
