Evaluating YOLO Architectures: Implications for Real-Time Vehicle Detection in Urban Environments of Bangladesh

Ha Meem Hossain, Pritam Nath, Mahitun Nesa Mahi, Imtiaz Uddin, Ishrat Jahan Eiste, Syed Nasibur Rahman Ratul, Md Naim Uddin Mozumdar, Asif Mohammed Saad, MD Tamim Hossain

TL;DR

This work addresses the gap in region-specific vehicle detection for Bangladesh by evaluating six YOLO variants (YOLOv8 and YOLOv11, each in n/m/x sizes) on a newly curated 29-class Bangladeshi vehicle dataset captured in urban traffic. YOLOv11x achieves the highest accuracy, with mAP@0.5 ≈ 0.637 and recall ≈ 0.614, but at a higher latency (~45.8 ms per image). YOLOv8m and YOLOv11m offer a practical balance, with mAP@0.5 of about 0.625 and 0.618 respectively and ~14–15 ms inference. The results also expose two weaknesses: poor detection of rare classes, attributed to data imbalance, and confusion between visually similar vehicles. Both point to the need for region-specific data and targeted model improvements in developing regions where generic models underperform. Overall, the study offers guidance for selecting YOLO variants that balance speed and accuracy in Bangladesh’s complex traffic and lays the groundwork for further dataset and architectural improvements.
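
A per-image latency of t milliseconds corresponds to at most 1000/t images per second, which is the easiest way to judge "real-time" suitability. The minimal Python sketch below converts the quoted latencies to frame rates and checks them against a 30 FPS camera feed. The 30 FPS target and the helper names are illustrative assumptions, not part of the study; the latencies depend on the authors' hardware, and if they cover only the model's forward pass, an end-to-end pipeline (decoding, pre-processing, NMS) would run slower.

```python
def fps_from_latency(latency_ms: float) -> float:
    """Convert a per-image inference latency in milliseconds to images per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def meets_realtime(latency_ms: float, target_fps: float = 30.0) -> bool:
    """True if inference alone keeps up with a feed at target_fps (an optimistic check)."""
    return fps_from_latency(latency_ms) >= target_fps


# Latencies quoted in the summary; 14 and 15 ms bracket the medium variants.
for name, ms in [("YOLOv11x", 45.8), ("medium, low end", 14.0), ("medium, high end", 15.0)]:
    print(f"{name}: {fps_from_latency(ms):.1f} FPS, keeps up with 30 FPS: {meets_realtime(ms)}")
```

With these inputs YOLOv11x works out to roughly 21.8 FPS, below a 30 FPS feed, while 14–15 ms corresponds to roughly 67–71 FPS.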

Abstract

Vehicle detection systems trained on non-Bangladeshi datasets struggle to identify local vehicle types in Bangladesh's distinctive road environments, leaving critical gaps in autonomous driving technology for developing regions. This study evaluates six YOLO model variants on a custom dataset of 29 vehicle classes, including region-specific vehicles such as "Desi Nosimon", "Leguna", "Battery Rickshaw", and "CNG". The dataset comprises high-resolution images (1920×1080) captured on various Bangladeshi roads with mobile phone cameras and manually annotated in LabelImg with YOLO-format bounding boxes. YOLOv11x was the top performer, achieving 63.7% mAP@0.5, 43.8% mAP@0.5:0.95, 61.4% recall, and a 61.6% F1-score, though it required 45.8 milliseconds per image for inference. The medium variants (YOLOv8m and YOLOv11m) offered the best speed-accuracy trade-off, reaching mAP@0.5 of 62.5% and 61.8% respectively while keeping inference times around 14–15 milliseconds. Rare vehicle classes proved hard to detect: Construction Vehicles and Desi Nosimons showed near-zero accuracy because of dataset imbalance and insufficient training samples. Confusion matrices revealed frequent misclassifications between visually similar vehicles, particularly Mini Trucks versus Mini Covered Vans. This research provides a foundation for object detection systems adapted to Bangladeshi traffic conditions, addressing a critical need in autonomous vehicle development for regions where models trained on generic data fail to perform adequately.
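
LabelImg's YOLO export stores one text line per box: a class index followed by the box center, width and height, each normalized by the image width or height. A minimal sketch of that conversion for the dataset's 1920×1080 frames follows. The function names and the class index 3 are hypothetical; real indices depend on the order of the class list used during annotation.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w=1920, img_h=1080):
    """Convert a pixel-space corner box to a YOLO label line (normalized center/size)."""
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError("box must lie inside the image and have positive area")
    x_c = (x_min + x_max) / 2 / img_w   # box center, as a fraction of image width
    y_c = (y_min + y_max) / 2 / img_h   # box center, as a fraction of image height
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


def from_yolo_line(line, img_w=1920, img_h=1080):
    """Inverse conversion: YOLO label line -> (class_id, x_min, y_min, x_max, y_max) in pixels."""
    cls, x_c, y_c, w, h = line.split()
    x_c, w = float(x_c) * img_w, float(w) * img_w
    y_c, h = float(y_c) * img_h, float(h) * img_h
    return int(cls), x_c - w / 2, y_c - h / 2, x_c + w / 2, y_c + h / 2


# Hypothetical box for class index 3 in a 1920x1080 frame.
line = to_yolo_line(3, 960, 540, 1152, 756)
print(line)
print(from_yolo_line(line))
```

Normalizing by image size keeps labels valid if images are later resized for training, which is why YOLO tooling uses this layout rather than raw pixel corners.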

Paper Structure

This paper contains 13 sections, 4 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: YOLO Model Architecture
  • Figure 2: Evaluation of 6 variants of YOLO models across 4 different metrics
  • Figure 3: Confusion matrices for the YOLO models used, showing correct class predictions (diagonal) and misclassifications (off-diagonal)
  • Figure 4: Visualizations for six YOLO models, showing detected objects with bounding boxes, labels, and their confidence levels
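
The diagonal versus off-diagonal reading of a confusion matrix translates directly into per-class precision, recall and F1. The sketch below assumes rows are true classes and columns are predicted classes (some tools plot the transpose, so check the axis labels), omits the extra background row and column that detection confusion matrices usually carry, and uses made-up counts for three of the dataset's classes purely for illustration; none of these numbers come from the paper.

```python
def per_class_metrics(matrix, labels):
    """Per-class (precision, recall, F1) from a square confusion matrix.

    Assumes matrix[i][j] counts objects whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    n = len(labels)
    results = {}
    for k in range(n):
        tp = matrix[k][k]                                # diagonal: correct predictions
        actual = sum(matrix[k])                          # row sum: true instances of class k
        predicted = sum(matrix[i][k] for i in range(n))  # column sum: predictions of class k
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[labels[k]] = (precision, recall, f1)
    return results


def worst_confusion(matrix, labels):
    """Largest off-diagonal cell as (true label, predicted label, count)."""
    n = len(labels)
    i, j = max(
        ((r, c) for r in range(n) for c in range(n) if r != c),
        key=lambda rc: matrix[rc[0]][rc[1]],
    )
    return labels[i], labels[j], matrix[i][j]


# Hypothetical counts for illustration only (not from the paper).
labels = ["Mini Truck", "Mini Covered Van", "Bus"]
cm = [
    [30, 12, 1],
    [9, 25, 0],
    [0, 1, 40],
]

for name, (p, r, f1) in per_class_metrics(cm, labels).items():
    print(f"{name:18s} P={p:.2f} R={r:.2f} F1={f1:.2f}")
print("Most frequent confusion:", worst_confusion(cm, labels))
```

A class that the model never predicts correctly gets zero on the diagonal and therefore zero precision, recall and F1, which is how rare classes such as those noted in the abstract show up in this kind of breakdown.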