Object Detection for Vehicle Dashcams using Transformers
Osama Mustafa, Khizer Ali, Anam Bibi, Imran Siddiqi, Momina Moetesum
TL;DR
This work tackles robust object detection for vehicle dashcams under diverse real-world conditions by fine-tuning DETR on a real-world truck dashcam dataset. It leverages a ResNet-50 backbone with a transformer encoder–decoder head and a bipartite matching loss to exploit contextual information and avoid traditional NMS. The model achieves a mean average precision (mAP) of 0.95 at an IoU threshold of 0.50 on a Motive AI real-world dataset covering four classes (traffic signals, stop signs, cars, trucks), demonstrating strong performance under challenging lighting and occlusion. Overall, the study demonstrates the viability of transformer-based dashcam perception for enhanced road safety and autonomous driving support in trucking, with implications for scalable intelligent automation in fleet management.
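The bipartite matching step mentioned above can be illustrated with a minimal sketch. It is not the authors' training code: the cost below (negative class probability plus L1 box distance) is a simplified stand-in for DETR's matching cost, the generalized-IoU term is omitted, and a brute-force search replaces the Hungarian algorithm so that the example needs only the standard library. The names `pair_cost` and `match`, the `(cx, cy, w, h)` box format, and the sample values are all assumptions made for illustration.

```python
from itertools import permutations

def pair_cost(pred, gt):
    """Simplified matching cost: -p(gt class) + L1 distance between boxes."""
    class_probs, pred_box = pred
    gt_label, gt_box = gt
    l1 = sum(abs(p - g) for p, g in zip(pred_box, gt_box))
    return -class_probs.get(gt_label, 0.0) + l1

def match(preds, gts):
    """Return {gt_index: pred_index} for the cheapest one-to-one assignment.

    Brute force over all assignments; only practical for a handful of boxes.
    """
    if len(preds) < len(gts):
        raise ValueError("need at least as many predictions as ground truths")
    best_cost, best_perm = float("inf"), ()
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(pair_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return {g: p for g, p in enumerate(best_perm)}

# Hypothetical predictions: (class probabilities, normalized cx, cy, w, h box).
preds = [
    ({"car": 0.9, "truck": 0.1}, (0.5, 0.5, 0.2, 0.2)),
    ({"car": 0.2, "truck": 0.8}, (0.2, 0.3, 0.1, 0.1)),
    ({"car": 0.5, "truck": 0.5}, (0.9, 0.9, 0.05, 0.05)),
]
gts = [("truck", (0.2, 0.3, 0.1, 0.1)), ("car", (0.5, 0.5, 0.2, 0.2))]

assignment = match(preds, gts)
print(assignment)
```

Because each ground-truth object is assigned to exactly one prediction, any prediction left unmatched (index 2 here) is trained toward the "no object" class, which is why no separate NMS pass is needed to suppress duplicates.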
Abstract
The use of intelligent automation is growing significantly in the automotive industry, where it assists drivers and fleet management companies and increases their productivity. Dashcams are now being used for this purpose, enabling the instant identification and understanding of multiple objects and events in the surroundings. In this paper, we propose a novel approach for object detection in dashcams using transformers. Our system is based on the state-of-the-art DEtection TRansformer (DETR), which has demonstrated strong performance in a variety of conditions, including different weather and illumination scenarios. The use of transformers allows contextual information to be considered in decision-making, improving the accuracy of object detection. To validate our approach, we trained our DETR model on a dataset that represents real-world conditions. Our results show that intelligent automation through transformers can significantly enhance the capabilities of dashcam systems. The model achieves a mean average precision (mAP) of 0.95 at an IoU threshold of 0.50.
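To make the reported metric concrete: the TL;DR states that the mAP is measured at an IoU threshold of 0.50, meaning a detection counts as correct only if it has the right class and overlaps a ground-truth box with intersection-over-union of at least 0.5. The sketch below shows that criterion only, not the full mAP computation or the authors' evaluation code. The helper names `iou` and `is_true_positive` and the corner-format boxes `(x_min, y_min, x_max, y_max)` are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred_label, pred_box, gt_label, gt_box, threshold=0.5):
    """A detection is correct if the class matches and IoU >= threshold."""
    return pred_label == gt_label and iou(pred_box, gt_box) >= threshold

# Hypothetical pixel coordinates for one predicted and one labeled stop sign.
predicted = ("stop sign", (2, 0, 12, 10))
labeled = ("stop sign", (0, 0, 10, 10))
print(iou(predicted[1], labeled[1]))
print(is_true_positive(*predicted, *labeled))
```

Average precision for each class is then computed from the ranked list of true and false positives, and mAP is the mean of those values over the four classes.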
