MelNet: A Real-Time Deep Learning Algorithm for Object Detection

Yashar Azadvatan, Murat Kurt

TL;DR

This work introduces MelNet, a real-time single-stage object detector trained from scratch on the KITTI dataset. MelNet employs a 70-layer, YOLOv3-inspired architecture with dual-scale predictions and an $N \times N \times [2 \times (4 + 1 + 9)]$ output tensor that encodes bounding boxes, objectness, and class scores, achieving an mAP of $0.732$ after 300 epochs on KITTI. Comparisons with pretrained baselines (YOLOv5, EfficientDet, Faster-RCNN-MobileNetv3) show that KITTI-only training can rival or surpass some models, while transfer learning remains beneficial in many scenarios. The results suggest MelNet’s potential for real-time deployment with a compact 72-layer architecture and point to future work involving larger, more diverse datasets and the integration of advanced techniques to further improve localization and accuracy.
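
The output tensor size quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code: it assumes the factor 2 is the number of box predictors per grid cell, and that 4, 1, and 9 are the box coordinates, the objectness score, and the class scores respectively. The function names and the grid sizes 13 and 26 are hypothetical and do not come from the paper.

```python
from typing import Tuple


def head_channels(predictors_per_cell: int = 2, box_coords: int = 4,
                  objectness: int = 1, num_classes: int = 9) -> int:
    """Values per grid cell: predictors * (box + objectness + classes)."""
    return predictors_per_cell * (box_coords + objectness + num_classes)


def output_shape(grid_size: int) -> Tuple[int, int, int]:
    """Shape of one N x N prediction map under the assumptions above."""
    return (grid_size, grid_size, head_channels())


if __name__ == "__main__":
    # Hypothetical grid sizes for the two prediction scales (assumed, not from the paper).
    for n in (13, 26):
        print(n, output_shape(n))
```

Under this reading, each grid cell carries $2 \times (4 + 1 + 9) = 28$ values at both prediction scales; only $N$ differs between them.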

Abstract

This study introduces MelNet, a novel deep learning algorithm for object detection. MelNet was trained on the KITTI object detection dataset and, after 300 training epochs, attained an mAP (mean average precision) score of 0.732. Three alternative models (YOLOv5, EfficientDet, and Faster-RCNN-MobileNetv3) were also trained on the KITTI dataset and compared with MelNet. The outcomes underscore the efficacy of transfer learning in certain instances: notably, existing models pretrained on prominent datasets (e.g., ImageNet, COCO, and Pascal VOC) yield superior results. Another finding underscores the viability of creating a new model tailored to a specific scenario and training it on a specific dataset: MelNet, trained exclusively on the KITTI dataset, surpasses EfficientDet after 150 epochs. Consequently, after training, MelNet's performance closely aligns with that of the pretrained models.

Paper Structure (14 sections, 9 figures, 5 tables)

Figures (9)

  • Figure 1: Bounding boxes with dimension priors and location prediction (He et al., 2015).
  • Figure 2: MelNet layers.
  • Figure 3: MelNet architecture.
  • Figure 4: The KITTI recording platform with its sensors (top-left), a trajectory from the KITTI visual odometry benchmark (top-center), the disparity and optical flow maps (top-right), and the 3D object labels (bottom) (Geiger et al., 2012).
  • Figure 5: MelNet training results: (a) class accuracy, (b) no-object accuracy, (c) object accuracy, (d) mAP.
  • ...and 4 more figures