Infra-YOLO: Efficient Neural Network Structure with Model Compression for Real-Time Infrared Small Object Detection

Zhonglin Chen; Anyu Geng; Jianan Jiang; Jiwu Lu; Di Wu

TL;DR

This work tackles infrared small object detection by introducing the InfraTiny dataset and an enhanced YOLO-based detector, Infra-YOLO. It presents two modules, MSAM for multi-scale attention and FFAFPM for feature fusion, to improve small-target detection under low signal-to-noise ratio. Channel pruning and knowledge distillation are employed to enable efficient deployment on embedded UAV hardware. On InfraTiny, Infra-YOLO improves mAP@0.5 by 2.7% over YOLOv3 and by 2.5% over YOLOv4; with 88% of its parameters pruned, it still retains gains of 0.7% and 0.5%, respectively, which supports its suitability for edge-enabled infrared sensing systems.

Abstract

Although convolutional neural networks have achieved outstanding results in visible-light target detection, infrared small object detection still faces many challenges because of the low signal-to-noise ratio, incomplete object structure, and the lack of a reliable infrared small object dataset. To address the dataset limitation, a new dataset named InfraTiny was constructed (3218 images and a total of 20,893 bounding boxes), in which more than 85% of the bounding boxes are smaller than 32x32 pixels. A multi-scale attention mechanism module (MSAM) and a Feature Fusion Augmentation Pyramid Module (FFAFPM) were proposed and deployed onto embedded devices. The MSAM enables the network to obtain scale perception information by acquiring different receptive fields, while suppressing background noise to enhance feature extraction. The proposed FFAFPM enriches semantic information and strengthens the fusion of shallow and deep features, which significantly reduces false positives. Integrating the proposed methods into the YOLO model yields Infra-YOLO, which improves infrared small object detection performance: on the InfraTiny dataset, mAP@0.5 improves by 2.7% compared to YOLOv3 and by 2.5% compared to YOLOv4. The proposed Infra-YOLO was also transferred onto the embedded device of an unmanned aerial vehicle (UAV) for real application scenarios, where channel pruning is adopted to reduce FLOPs and to achieve a tradeoff between speed and accuracy. Even when the parameters of Infra-YOLO are reduced by 88% through pruning, a gain of 0.7% on mAP@0.5 is still achieved compared to YOLOv3, and a gain of 0.5% compared to YOLOv4. Experimental results show that the proposed MSAM and FFAFPM improve infrared small object detection performance compared with the previous benchmark methods.
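
The abstract does not state how channels are ranked for pruning. As a rough illustration only, the sketch below assumes a magnitude-based criterion: each channel has a scaling factor (for example a batch-norm weight), all factors are ranked globally, and the weakest fraction is removed. The function name `select_channels`, the `prune_ratio` parameter, and the sample values are hypothetical and are not taken from the paper.

```python
def select_channels(scales, prune_ratio):
    """Return, per layer, the indices of the channels to keep.

    scales: list of lists, one list of per-channel scaling factors per layer
            (assumed here to be the pruning criterion; not confirmed by the paper).
    prune_ratio: fraction of all channels to remove, in [0, 1).
    """
    # Rank every channel in the network by the magnitude of its scaling factor.
    magnitudes = sorted(abs(s) for layer in scales for s in layer)
    n_prune = int(len(magnitudes) * prune_ratio)
    # Channels whose magnitude falls below this global threshold are removed.
    threshold = magnitudes[n_prune] if n_prune > 0 else float("-inf")

    kept = []
    for layer in scales:
        idx = [i for i, s in enumerate(layer) if abs(s) >= threshold]
        if not idx:
            # Never empty a layer: keep its strongest channel so the
            # network stays connected.
            idx = [max(range(len(layer)), key=lambda i: abs(layer[i]))]
        kept.append(idx)
    return kept


# Two hypothetical layers with four channels each; remove half of all channels.
example_scales = [[0.9, 0.01, 0.5, 0.02], [0.03, 0.04, 0.8, 0.7]]
example_kept = select_channels(example_scales, prune_ratio=0.5)
```

A single global threshold lets the prune rate differ per layer, so layers with many weak channels shrink more than others. Tied magnitudes at the threshold are kept, so slightly fewer channels than requested may be removed.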

Paper Structure (19 sections, 4 equations, 13 figures, 7 tables, 3 algorithms)

Introduction
RELATED WORKS
Small Object Detection
CNN Acceleration
THE PROPOSED METHOD
InfraTiny Dataset
Network Architecture
Multi-scale Attention Mechanism
Feature Fusion Augmentation Feature Pyramid Module
Channel Prune
EXPERIMENTS
Data Augmentation
Implementation Details
Experiment Results
Ablation Studies
...and 4 more sections

Figures (13)

Figure 1: Some sample annotations from the InfraTiny dataset. The InfraTiny dataset contains 3218 images at a resolution of 480×360, with a total of 20,893 targets. Of these, 17,896 targets are smaller than 32×32 pixels and 5,583 are smaller than 9×9 pixels. Annotation categories: person and car. The green boxes represent cars; the blue boxes represent persons.
Figure 2: The normalized distribution of the width and height of all annotated bounding boxes in the InfraTiny dataset.
Figure 3: The pipeline of the proposed infrared small target detection network (Infra-YOLO). It is a one-stage detector whose network structure is divided into three parts: backbone, neck, and head.
Figure 4: Illustration of the MSAM framework. The proposed MSAM consists of a spatial attention mechanism and a channel attention mechanism. In the upper part of the figure, the spatial attention mechanism obtains multi-scale key information through dilated convolutions with different dilation rates. The lower part of the figure shows the channel attention mechanism, which is composed of adaptive pooling, one-dimensional convolution, and a sigmoid, and aims to model the relations among different features.
Figure 5: Schematic diagram of FFAFPM. The role of FFA is to enrich the semantic information of subsequent networks. FFAFPM has a top-down feature fusion path and a bottom-up feature fusion path. In the bottom-up path, to compensate for feature loss due to network depth, FFAFPM adds a cross-scale connection from the backbone to the output. In the entire neck stage, there is only one input node, which is cut to reduce the number of FLOPs.
...and 8 more figures
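
The channel attention branch described in the Figure 4 caption (adaptive pooling, one-dimensional convolution, sigmoid) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the pooling is assumed to be global average pooling to one value per channel, the learned 1D kernel is replaced by a uniform placeholder, and the name `channel_attention` and its `kernel_size` parameter are hypothetical.

```python
import math


def channel_attention(feature_map, kernel_size=3):
    """Reweight channels of a feature map.

    feature_map: list of C channels, each an H x W list of lists of floats.
    kernel_size: odd width of the 1D convolution across the channel axis.
    Returns (weights, scaled_feature_map).
    """
    # 1) Pooling (assumed global average): one descriptor per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]

    # 2) 1D convolution over neighbouring channels with zero padding.
    #    A uniform kernel stands in for the learned weights.
    kernel = [1.0 / kernel_size] * kernel_size
    pad = kernel_size // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(k * padded[c + j] for j, k in enumerate(kernel))
        for c in range(len(pooled))
    ]

    # 3) Sigmoid maps each mixed descriptor to a weight in (0, 1).
    weights = [1.0 / (1.0 + math.exp(-m)) for m in mixed]

    # 4) Scale every value of a channel by that channel's weight.
    scaled = [
        [[w * v for v in row] for row in ch]
        for ch, w in zip(feature_map, weights)
    ]
    return weights, scaled


# Hypothetical 3-channel, 2x2 input: one strong channel between two empty ones.
example_map = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[4.0, 4.0], [4.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
example_weights, example_scaled = channel_attention(example_map)
```

Because the 1D convolution mixes each channel descriptor only with its neighbours, the branch adds very few parameters, which fits the paper's emphasis on embedded deployment.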
