Comparative Analysis of YOLOv5, Faster R-CNN, SSD, and RetinaNet for Motorbike Detection in Kigali Autonomous Driving Context
Ngeyen Yinkfu, Sunday Nwovu, Jonathan Kayizzi, Angelique Uwamahoro
TL;DR
The paper addresses motorbike detection for autonomous navigation in Kigali's unstructured traffic by comparing four detectors (YOLOv5, Faster R-CNN, SSD, RetinaNet) on a small, COCO-formatted Kigali dataset using transfer learning in PyTorch. It analyzes accuracy, localization, and inference speed to determine suitability for real-time edge deployment in resource-limited settings. Key contributions include a cross-model performance comparison, a discussion of dataset and model challenges in low- and middle-income country (LMIC) contexts, and recommendations for simplified architectures and edge-optimized solutions. The findings suggest that YOLOv5 offers the best real-time applicability, RetinaNet provides strong localization, Faster R-CNN yields high precision but is constrained by speed, and SSD, while fast, struggles with accurate localization. Together, these results highlight the need for larger datasets and tailored architectures for safe autonomous navigation in developing countries.
Abstract
In Kigali, Rwanda, motorcycle taxis are a primary mode of transportation. They often navigate unpredictably and disregard traffic rules, which poses significant challenges for autonomous driving systems. This study compares four object detection models (YOLOv5, Faster R-CNN, SSD, and RetinaNet) for motorbike detection using a custom dataset of 198 images collected in Kigali. Implemented in PyTorch with transfer learning, the models were evaluated for accuracy, localization, and inference speed to assess their suitability for real-time navigation in resource-constrained settings. We identify implementation challenges, including dataset limitations and model complexities, and recommend simplified architectures for future work to enhance accessibility for autonomous systems in developing countries like Rwanda.
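To make two of the evaluation criteria above concrete, the following is a minimal sketch of how localization quality and inference speed might be measured. It is an illustration under stated assumptions, not the evaluation code used in this study: intersection over union (IoU) is assumed as the localization measure, boxes are assumed to follow the COCO `[x, y, width, height]` convention, and the helper names `iou_xywh` and `mean_latency_ms` are hypothetical.

```python
import time
from typing import Callable, Sequence

# Assumption: boxes use the COCO convention [x_min, y_min, width, height].
Box = Sequence[float]


def iou_xywh(a: Box, b: Box) -> float:
    """Intersection over union of two COCO-style boxes (hypothetical helper)."""
    # Convert each box to corner coordinates.
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    # Overlap width and height are clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def mean_latency_ms(
    predict: Callable[[object], object],
    images: Sequence[object],
    warmup: int = 2,
) -> float:
    """Average wall-clock time per image in milliseconds (hypothetical helper)."""
    # A few untimed calls so one-time setup costs do not skew the average.
    for image in images[:warmup]:
        predict(image)
    start = time.perf_counter()
    for image in images:
        predict(image)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / len(images)
```

Here `predict` stands in for any detector's forward pass. When timing a PyTorch model on a GPU, the device would also need to be synchronized before each clock read, since CUDA kernels run asynchronously; that step is omitted to keep the sketch free of dependencies.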
