MODIPHY: Multimodal Obscured Detection for IoT using PHantom Convolution-Enabled Faster YOLO

Shubhabrata Mukherjee; Cory Beard; Zhu Li

TL;DR

This paper introduces YOLO Phantom, one of the smallest YOLO models to date. It achieves accuracy comparable to the latest YOLOv8n model while cutting both parameter count and model size by 43%, which yields a 19% reduction in Giga Floating-Point Operations (GFLOPs).

Abstract

Low-light conditions and occluded scenes impede object detection in real-world Internet of Things (IoT) applications such as autonomous vehicles and security systems. Advanced machine learning models strive for accuracy, but their computational demands exceed what resource-constrained devices can supply, which hampers real-time performance. We address this challenge by introducing "YOLO Phantom", one of the smallest YOLO models to date. YOLO Phantom uses the novel Phantom Convolution block to achieve accuracy comparable to the latest YOLOv8n model while cutting both parameter count and model size by 43%, which yields a 19% reduction in Giga Floating-Point Operations (GFLOPs). YOLO Phantom applies transfer learning on our multimodal RGB-infrared dataset to handle low-light and occlusion conditions, giving it robust vision in adverse environments. Its real-world efficacy is demonstrated on an IoT platform equipped with advanced low-light and RGB cameras and connected to an AWS-based notification endpoint for efficient real-time object detection. Benchmarks show frames-per-second (FPS) gains of 17% for thermal detection and 14% for RGB detection over the baseline YOLOv8n model. Both the code and the multimodal dataset are available on GitHub.
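
To make the parameter savings discussed above more concrete, the sketch below counts the weights of a standard convolution and of three generic lightweight alternatives that the paper's outline names (group, depth-wise separable, and ghost convolution). It is not the Phantom Convolution block itself, whose internals are not described in this section. The channel sizes (64 in, 128 out), the kernel size, the ghost ratio of 2, and the helper names are all illustrative assumptions, and bias and batch-normalization parameters are ignored.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k 2D convolution split into `groups` groups."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Depth-wise k x k convolution followed by a 1 x 1 point-wise convolution."""
    depthwise = conv_params(c_in, c_in, k, groups=c_in)
    pointwise = conv_params(c_in, c_out, 1)
    return depthwise + pointwise


def ghost_params(c_in, c_out, k, cheap_k=3):
    """Ghost-style block, assuming a ratio of 2: a standard convolution makes
    half of the output channels and a cheap depth-wise convolution derives
    the other half from them."""
    half = c_out // 2
    primary = conv_params(c_in, half, k)
    cheap = conv_params(half, half, cheap_k, groups=half)
    return primary + cheap


# Hypothetical layer shape, chosen only for illustration.
C_IN, C_OUT, K = 64, 128, 3

counts = {
    "standard": conv_params(C_IN, C_OUT, K),
    "group (g=4)": conv_params(C_IN, C_OUT, K, groups=4),
    "depth-wise separable": depthwise_separable_params(C_IN, C_OUT, K),
    "ghost (ratio 2)": ghost_params(C_IN, C_OUT, K),
}

for name, n in counts.items():
    saving = 100 * (1 - n / counts["standard"])
    print(f"{name:22s} {n:6d} weights ({saving:.1f}% fewer than standard)")
```

Savings of this kind apply per layer; the whole-model figures reported in the abstract depend on which layers are replaced and on the rest of the architecture.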

Paper Structure (16 sections, 4 equations, 10 figures, 4 tables)

Introduction
Implementation strategies
Role of resource optimization
Our approach: YOLO Phantom
Ultralytics YOLOv8 architecture
Architecture improvement for faster and better inference
Group Convolution
Depth-wise separable Convolution
Ghost Convolution
Experimental Setup
Dataset description and Training
Results and Analysis
Out of sample testing
Performance on different modality data
Cross-modality performance
...and 1 more section

Figures (10)

Figure 1: Detection on a Rainy, Obscured Evening with Severe Occlusion Using a Multimodal YOLO Model and a NoIR Camera on a Raspberry Pi Platform
Figure 2: Size, Parameters, and GFLOP comparison of smaller YOLO models
Figure 3: Modified YOLOv8 Backbone [RangeKingGitHub]
Figure 4: Modified YOLOv8 Neck and Decoupled Detection Head [RangeKingGitHub]
Figure 5: Phantom Convolution and C2fi Block Architecture
...and 5 more figures
