MelNet: A Real-Time Deep Learning Algorithm for Object Detection
Yashar Azadvatan, Murat Kurt
TL;DR
This work introduces MelNet, a real-time single-stage object detector trained from scratch on the KITTI dataset. MelNet employs a 70-layer, YOLOv3-inspired architecture with dual-scale predictions and an $N \times N \times [2 \times (4 + 1 + 9)]$ output tensor that encodes bounding boxes, objectness, and class scores. It achieves an mAP of 0.732 after 300 epochs on KITTI. Comparisons with pretrained baselines (YOLOv5, EfficientDet, Faster-RCNN-MobileNetv3) show that KITTI-only training can rival or surpass some of these models, while transfer learning remains beneficial in many scenarios. The results suggest that MelNet's compact architecture is suitable for real-time deployment, and they point to future work on larger, more diverse datasets and on integrating advanced techniques to further improve localization and accuracy.
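To make the output tensor layout concrete, the following is a minimal NumPy sketch of how an $N \times N \times [2 \times (4 + 1 + 9)]$ tensor could be split into its parts. It rests on several assumptions that the text above does not state: that the factor 2 is the number of boxes predicted per grid cell, that each box's 14 values are stored in the order (box coordinates, objectness, class scores), and that $N = 13$ for one of the two scales. The function name `split_head_output` is hypothetical and is not taken from the authors' code.

```python
import numpy as np

# Sizes taken from the tensor shape N x N x [2 x (4 + 1 + 9)].
NUM_BOXES_PER_CELL = 2   # assumption: the factor 2 is boxes (anchors) per cell
NUM_BOX_COORDS = 4       # bounding-box values
NUM_OBJECTNESS = 1       # objectness score
NUM_CLASSES = 9          # class scores
VALUES_PER_BOX = NUM_BOX_COORDS + NUM_OBJECTNESS + NUM_CLASSES  # 14
CHANNELS = NUM_BOXES_PER_CELL * VALUES_PER_BOX                  # 28


def split_head_output(raw):
    """Split one scale's raw output of shape (N, N, 28) into its components.

    Assumes a (coords, objectness, classes) ordering within each box,
    which is an illustrative choice rather than the documented layout.
    """
    n = raw.shape[0]
    if raw.shape != (n, n, CHANNELS):
        raise ValueError(f"expected (N, N, {CHANNELS}), got {raw.shape}")
    per_box = raw.reshape(n, n, NUM_BOXES_PER_CELL, VALUES_PER_BOX)
    boxes = per_box[..., :NUM_BOX_COORDS]                    # (N, N, 2, 4)
    objectness = per_box[..., NUM_BOX_COORDS]                # (N, N, 2)
    class_scores = per_box[..., NUM_BOX_COORDS + 1:]         # (N, N, 2, 9)
    return boxes, objectness, class_scores


# Hypothetical grid size for one prediction scale.
raw_output = np.zeros((13, 13, CHANNELS), dtype=np.float32)
boxes, objectness, class_scores = split_head_output(raw_output)
```

With dual-scale predictions, the same split would apply to each scale's tensor separately, with a different value of $N$ per scale.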
Abstract
This study introduces MelNet, a novel deep learning algorithm for object detection. MelNet was trained on the KITTI dataset and, after 300 training epochs, attained a mean average precision (mAP) of 0.732. Three alternative models (YOLOv5, EfficientDet, and Faster-RCNN-MobileNetv3) were also trained on the KITTI dataset and compared with MelNet. The results underscore the value of transfer learning in certain cases: notably, existing models pre-trained on prominent datasets (e.g., ImageNet, COCO, and Pascal VOC) yield superior results. A second finding is that designing a new model for a specific scenario and training it on a scenario-specific dataset is viable: MelNet trained exclusively on the KITTI dataset surpasses EfficientDet after 150 epochs, and after training its performance is close to that of the other, pre-trained models.
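For readers unfamiliar with the metric, the sketch below shows one common way to compute mAP: average precision (AP) is the area under the precision-recall curve for a single class, and mAP is the mean of the per-class APs. The abstract does not say which evaluation protocol was used, so the all-point interpolation here is an assumption, as are the function names and the toy inputs. The sketch also assumes that each detection has already been labeled as a true or false positive (for example by an IoU threshold against the ground truth).

```python
import numpy as np


def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one class.

    Uses all-point interpolation; this choice is an assumption and not
    necessarily the protocol used to obtain the reported numbers.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / max(num_ground_truth, 1)
    precision = cum_tp / (cum_tp + cum_fp)

    # Pad the curve, then make precision non-increasing in recall.
    rec = np.concatenate(([0.0], recall, [1.0]))
    prec = np.concatenate(([0.0], precision, [0.0]))
    prec = np.maximum.accumulate(prec[::-1])[::-1]

    # Sum rectangle areas wherever recall changes.
    idx = np.where(rec[1:] != rec[:-1])[0]
    return float(np.sum((rec[idx + 1] - rec[idx]) * prec[idx + 1]))


def mean_average_precision(per_class):
    """Mean of per-class APs; per_class maps name -> (scores, tp_flags, n_gt)."""
    aps = [average_precision(s, t, n) for s, t, n in per_class.values()]
    return float(np.mean(aps))


# Toy, made-up detections for two classes (not results from the paper).
toy = {
    "Car": ([0.9, 0.8], [1, 1], 2),
    "Cyclist": ([0.9, 0.8, 0.7], [1, 0, 1], 2),
}
toy_map = mean_average_precision(toy)
```

In this toy input, every "Car" detection is correct, while the "Cyclist" class has one false positive ranked between two true positives, which lowers its AP and therefore the mean.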
