MelNet: A Real-Time Deep Learning Algorithm for Object Detection

Yashar Azadvatan; Murat Kurt

TL;DR

This work introduces MelNet, a real-time single-stage object detector trained from scratch on the KITTI dataset. MelNet employs a 70-layer, YOLOv3-inspired architecture with dual-scale predictions and an $N \times N \times [2 \times (4 + 1 + 9)]$ output tensor that encodes bounding boxes, objectness, and class scores, achieving an mAP of $0.732$ after 300 epochs on KITTI. Comparisons with pretrained baselines (YOLOv5, EfficientDet, Faster-RCNN-MobileNetv3) show that KITTI-only training can rival or surpass some models, while transfer learning remains beneficial in many scenarios. The results suggest MelNet’s potential for real-time deployment with a compact 72-layer architecture and point to future work involving larger, more diverse datasets and the integration of advanced techniques to further improve localization and accuracy.
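To make the output tensor size concrete, the following is a minimal sketch of how the depth $2 \times (4 + 1 + 9)$ breaks down per grid cell. It assumes the factor 2 is the number of boxes predicted per cell and that each box is laid out as coordinates, then objectness, then class scores; that ordering, the grid size used in the usage note, and all names in the code are illustrative assumptions, not details taken from the paper.

```python
# Sketch of the N x N x [2 x (4 + 1 + 9)] output layout.
# Assumption: "2" is boxes per grid cell; the per-box ordering
# (coordinates, objectness, class scores) is hypothetical.

BOXES_PER_CELL = 2   # the "2" in the formula
NUM_BOX_COORDS = 4   # bounding-box values per box
NUM_OBJECTNESS = 1   # one objectness score per box
NUM_CLASSES = 9      # class scores per box

PER_BOX = NUM_BOX_COORDS + NUM_OBJECTNESS + NUM_CLASSES


def output_shape(grid_size):
    """Return the (N, N, depth) shape of the prediction tensor at one scale."""
    return (grid_size, grid_size, BOXES_PER_CELL * PER_BOX)


def split_cell(cell_vector):
    """Split one grid cell's flat prediction vector into per-box parts."""
    if len(cell_vector) != BOXES_PER_CELL * PER_BOX:
        raise ValueError("unexpected prediction depth")
    parts = []
    for b in range(BOXES_PER_CELL):
        chunk = cell_vector[b * PER_BOX:(b + 1) * PER_BOX]
        parts.append({
            "box": chunk[:NUM_BOX_COORDS],
            "objectness": chunk[NUM_BOX_COORDS],
            "class_scores": chunk[NUM_BOX_COORDS + NUM_OBJECTNESS:],
        })
    return parts
```

For a hypothetical grid of size 13, `output_shape(13)` gives a depth of 28 values per cell, and `split_cell` separates those 28 values into two boxes of 14 values each.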

Abstract

This study introduces MelNet, a novel deep learning algorithm for object detection. MelNet was trained on the KITTI dataset and, after 300 training epochs, attained a mean average precision (mAP) of 0.732. Three other models (YOLOv5, EfficientDet, and Faster-RCNN-MobileNetv3) were also trained on KITTI and compared with MelNet. The results underscore the value of transfer learning in certain cases: models pretrained on prominent datasets (e.g., ImageNet, COCO, and Pascal VOC) yield superior results. They also show that designing a new model for a specific scenario and training it on a specific dataset is viable: MelNet, trained exclusively on KITTI, surpasses EfficientDet after 150 epochs. After training, MelNet's performance is therefore close to that of the pretrained models.

Paper Structure (14 sections, 9 figures, 5 tables)

Introduction
Related Works
Methodology
Materials and Experiments
Dataset Selection
Implementation Details
Training Process
Data Augmentation
Results
Metrics Evaluation Results
Prediction Result
Training Time Comparison Results
Number of Layers Comparison
Conclusion and Future Works

Figures (9)

Figure 1: Bounding boxes with dimension priors and location prediction (He et al., 2015).
Figure 2: MelNet layers.
Figure 3: MelNet architecture.
Figure 4: The recording platform and its sensors are shown in the top-left corner. The trajectory, taken from the KITTI visual odometry benchmark, is displayed in the top center. The top-right corner shows the disparity and optical flow maps. The bottom section displays the 3D object labels (Geiger et al., 2012).
Figure 5: (a) Class Accuracy result of training MelNet. (b) No Object Accuracy result of training MelNet. (c) Object Accuracy result of training MelNet. (d) mAP result of training MelNet.
...and 4 more figures
