Object Detection for Vehicle Dashcams using Transformers

Osama Mustafa, Khizer Ali, Anam Bibi, Imran Siddiqi, Momina Moetesum

TL;DR

This work tackles robust object detection for vehicle dashcams under diverse real-world conditions by fine-tuning DETR on a real-world truck dashcam dataset. It leverages a ResNet-50 backbone with a transformer encoder–decoder head and a bipartite matching loss to exploit contextual information and avoid traditional NMS. The model achieves a mean average precision (mAP) of 0.95 at an IoU threshold of 0.50 on a Motive AI real-world dataset covering four classes (traffic signals, stop signs, cars, trucks), demonstrating strong performance under challenging lighting and occlusion. Overall, the study demonstrates the viability of transformer-based dashcam perception for enhanced road safety and autonomous driving support in trucking, with implications for scalable intelligent automation in fleet management.
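To make the two evaluation ideas named above concrete, the following is a minimal sketch of intersection-over-union (IoU) and a one-to-one (bipartite) assignment between predicted and ground-truth boxes. It is not the authors' implementation: the helper names `iou` and `match` are hypothetical, the cost is simplified to 1 - IoU (the original DETR matching cost also includes class-probability and box-regression terms), and the brute-force search stands in for the Hungarian algorithm and is only practical for a handful of boxes.

```python
from itertools import permutations


def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match(preds, targets):
    """One-to-one assignment minimizing the total (1 - IoU) cost.

    Simplified, brute-force stand-in for Hungarian matching.
    Assumes len(preds) >= len(targets).
    Returns a list of (pred_index, target_index) pairs.
    """
    best, best_cost = (), float("inf")
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(1.0 - iou(preds[p], targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return [(p, t) for t, p in enumerate(best)]


# Illustrative boxes only; not taken from the paper's dataset.
targets = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(21, 21, 31, 31), (0, 0, 10, 12), (50, 50, 60, 60)]

for p, t in match(preds, targets):
    overlap = iou(preds[p], targets[t])
    # A matched prediction counts as a hit at the 0.50 IoU threshold.
    print(p, t, round(overlap, 3), overlap >= 0.5)
```

Because every ground-truth box is assigned to exactly one prediction, duplicate detections of the same object cannot all be matched, which is why this style of set prediction can do without a separate non-maximum suppression step. The third prediction in the example stays unmatched and would be treated as "no object".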

Abstract

The use of intelligent automation is growing significantly in the automotive industry, as it assists drivers and fleet management companies and thereby increases their productivity. Dashcams are now being used for this purpose, enabling the instant identification and understanding of multiple objects and occurrences in the surroundings. In this paper, we propose a novel approach for object detection in dashcams using transformers. Our system is based on the state-of-the-art DEtection TRansformer (DETR), which has demonstrated strong performance in a variety of conditions, including different weather and illumination scenarios. The use of transformers allows contextual information to be considered in decision-making, improving the accuracy of object detection. To validate our approach, we have trained our DETR model on a dataset that represents real-world conditions. Our results show that the use of intelligent automation through transformers can significantly enhance the capabilities of dashcam systems. The model achieves an mAP of 0.95 on detection.

Paper Structure (11 sections, 3 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Sample images of dataset
  • Figure 2: Histogram representing class distribution in dataset
  • Figure 3: System Pipeline
  • Figure 4: Detailed Architecture of Encoder-Decoder Block
  • Figure 5: Training Loss and mAP Curves
  • ...and 1 more figures