
Automatic Vehicle Detection using DETR: A Transformer-Based Approach for Navigating Treacherous Roads

Istiaq Ahmed Fahad, Abdullah Ibne Hanif Arean, Nazmus Sakib Ahmed, Mahmudul Hasan

TL;DR

This work addresses automatic vehicle detection (AVD) in diverse driving environments by applying a Transformer-based DETR approach augmented with Collaborative Hybrid Assignments Training (Co-DETR) to the BadODD dataset from Bangladesh. It compares Co-DETR against YOLOv8m, showing that the transformer-based method yields higher detection accuracy (a peak mAP of 0.438, reached at 9 epochs) and better robustness on treacherous roads. The study details dataset characteristics, preprocessing, and model configurations, and demonstrates the practical potential of DETR for autonomous navigation in complex real-world settings. The findings suggest that transformer-based detection with Co-DETR is a viable path toward reliable AVD; future work targets real-time deployment and hybrid architectures.

Abstract

Automatic Vehicle Detection (AVD) in diverse driving environments presents unique challenges due to varying lighting conditions, road types, and vehicle types. Traditional methods such as YOLO and Faster R-CNN often struggle with these complexities. As computer vision evolves, combining Convolutional Neural Networks (CNNs) with Transformer-based approaches offers promising opportunities for improving detection accuracy and efficiency. This study is the first to experiment with Detection Transformer (DETR) for automatic vehicle detection in complex and varied settings. We employ a Collaborative Hybrid Assignments Training scheme, Co-DETR, to enhance feature learning and attention in DETR. By leveraging versatile label assignment strategies and introducing multiple parallel auxiliary heads, we provide more effective supervision during training, and we extract positive coordinates from these heads to boost training efficiency. Through extensive experiments with DETR variants and YOLO models on the BadODD dataset, we demonstrate the advantages of our approach. Our method achieves superior results and improved accuracy across diverse conditions, making it practical for real-world deployment. This work significantly advances autonomous navigation technology and opens new research avenues in object detection for autonomous vehicles. By integrating the strengths of CNNs and Transformers, we highlight the potential of DETR for robust and efficient vehicle detection in challenging driving environments.
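To make the idea of hybrid label assignment concrete, here is a minimal NumPy sketch; it is not the authors' code or the Co-DETR implementation. It contrasts DETR-style one-to-one matching, where each ground-truth box is paired with exactly one prediction, with a one-to-many rule of the kind auxiliary heads use, where several overlapping predictions all count as positives. Assumptions: boxes are (x1, y1, x2, y2); the matching cost is simply 1 - IoU (DETR's real cost also includes classification, L1, and GIoU terms); the one-to-many rule is a plain IoU threshold, whereas Co-DETR's auxiliary heads (detector heads such as ATSS or Faster R-CNN) use their own assignment schemes; the optimal matching is found by brute force, which only scales to toy inputs. The function names `box_iou`, `one_to_one`, and `one_to_many` are hypothetical.

```python
import itertools

import numpy as np


def box_iou(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise IoU between boxes a (N, 4) and b (M, 4), format (x1, y1, x2, y2)."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)


def one_to_one(cost: np.ndarray) -> list[tuple[int, int]]:
    """DETR-style matching: each ground truth gets exactly one prediction.

    Hypothetical helper. Brute force over all injective assignments, which is
    only feasible for tiny inputs; the Hungarian algorithm finds the same
    minimum-cost matching efficiently.
    """
    n_pred, n_gt = cost.shape
    assert n_pred >= n_gt, "need at least as many predictions as ground truths"
    best_total, best_perm = np.inf, None
    # perm[g] is the prediction index assigned to ground truth g
    for perm in itertools.permutations(range(n_pred), n_gt):
        total = sum(cost[p, g] for g, p in enumerate(perm))
        if total < best_total:
            best_total, best_perm = total, perm
    return [(p, g) for g, p in enumerate(best_perm)]


def one_to_many(iou: np.ndarray, thr: float = 0.5) -> list[tuple[int, int]]:
    """Simplified auxiliary-head assignment (assumption: plain IoU threshold).

    Every prediction whose best IoU with a ground truth reaches thr becomes a
    positive for that ground truth, so one object can have several positives.
    """
    best_gt = iou.argmax(axis=1)
    return [(p, int(g)) for p, g in enumerate(best_gt) if iou[p, g] >= thr]


# Toy example: two ground-truth vehicles, four predicted boxes.
preds = np.array([[0, 0, 10, 10], [1, 1, 11, 11],
                  [20, 20, 30, 30], [21, 20, 31, 30]], dtype=float)
gts = np.array([[0, 0, 10, 10], [20, 20, 30, 30]], dtype=float)
iou = box_iou(preds, gts)
print("one-to-one :", one_to_one(1.0 - iou))
print("one-to-many:", one_to_many(iou))
```

With two vehicles and four predictions, one-to-one matching keeps a single positive per vehicle, while the one-to-many rule also marks the slightly shifted boxes as positives. That denser supervision is what auxiliary heads add during training, while the one-to-one branch keeps DETR's set-prediction behavior at inference.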

Paper Structure

This paper contains 13 sections, 7 figures, and 1 table.

Figures (7)

  • Figure 1: Districts of Bangladesh from which the data for the BadODD dataset were collected
  • Figure 2: Place-wise Train-Test Data Distribution
  • Figure 3: Class Distribution of BadODD Dataset
  • Figure 4: Examples from the dataset illustrating diverse road conditions and vehicle types under different lighting scenarios. The top row (a-c) depicts daytime scenes, while the bottom row (d-f) showcases nighttime scenes. This dataset is intended for studying diverse traffic patterns and vehicle behavior in Bangladesh.
  • Figure 5: Comparison of image enhancement techniques applied to a raw street scene. (a) Raw image, (b) Image after Histogram Equalization, (c) Image after Contrast Limited Adaptive Histogram Equalization (CLAHE), and (d) Image after Gamma Correction. These techniques improve visual clarity and highlight different features in the dataset.
  • ...and 2 more figures
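Figure 5 compares three image enhancement techniques. The sketch below implements two of them, global histogram equalization and gamma correction, as 256-entry lookup tables over an 8-bit grayscale image using only NumPy. The summary does not state which library, color handling, or parameters the authors used, so the function names, the single-channel input, and the gamma convention out = 255 * (in / 255) ** (1 / gamma) (gamma > 1 brightens dark regions) are assumptions. CLAHE goes further by splitting the image into tiles, clipping each tile's histogram, and interpolating between the tile mappings; OpenCV provides it as `cv2.createCLAHE(clipLimit=..., tileGridSize=...)` followed by `.apply(gray)`, and global equalization as `cv2.equalizeHist`.

```python
import numpy as np


def histogram_equalization(gray: np.ndarray) -> np.ndarray:
    """Global histogram equalization for an 8-bit single-channel image (hypothetical helper)."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF value of the darkest occupied intensity
    if cdf[-1] == cdf_min:             # constant image: nothing to stretch
        return gray.copy()
    # Map the CDF linearly onto [0, 255] so intensities spread over the full range.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]


def gamma_correction(img: np.ndarray, gamma: float) -> np.ndarray:
    """Gamma correction via a lookup table (assumed convention: exponent 1 / gamma)."""
    levels = np.arange(256) / 255.0
    lut = np.round(255.0 * levels ** (1.0 / gamma)).astype(np.uint8)
    return lut[img]


# Tiny dark "image" to show both mappings.
img = np.array([[50, 50], [100, 200]], dtype=np.uint8)
print(histogram_equalization(img))
print(gamma_correction(img, gamma=2.0))
```

Expressing both operations as lookup tables keeps them cheap enough to apply to every frame during preprocessing; for color street scenes, a common choice is to apply them to a luminance channel rather than to each RGB channel separately, though the summary does not say which approach the study used.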