Object Detection in Thermal Images Using Deep Learning for Unmanned Aerial Vehicles
Minh Dang Tu, Kieu Trang Le, Manh Duong Phung
TL;DR
The paper addresses small-object detection in thermal UAV imagery by proposing a lightweight, YOLOv5-based network augmented with a transformer encoder, a Bi-FPN neck, and sliding-window attention. It uses four prediction heads to improve tiny-object localization, along with GhostConv-based backbones and Bottleneck blocks to reduce the parameter count for embedded deployment. Key contributions include attention blocks placed before the prediction head, sliding-window self-attention to enrich features, and experiments showing that the model outperforms several state-of-the-art baselines on VEDAI and the authors' collected datasets, with real-time performance on Jetson AGX. The work supports practical UAV applications by delivering higher accuracy on small objects while remaining efficient enough for onboard processing.
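To illustrate why an additional prediction head can help with tiny objects, the following sketch computes the grid resolution of each head and how many grid cells a small object spans. The 640-pixel input size, the strides 4, 8, 16, and 32, and the 12-pixel object width are assumptions chosen for illustration (they are common in four-head YOLOv5 variants); the summary above does not state the values used in the paper.

```python
# Illustrative arithmetic only. INPUT_SIZE, STRIDES and the object width
# are hypothetical values, not figures reported by the paper.
INPUT_SIZE = 640
STRIDES = (4, 8, 16, 32)


def grid_size(input_size, stride):
    """Number of grid cells along one side of a head's feature map."""
    return input_size // stride


def cells_spanned(object_px, stride):
    """How many grid cells an object of the given pixel width covers."""
    return object_px / stride


def head_summary(input_size=INPUT_SIZE, strides=STRIDES, object_px=12):
    """Return one dictionary per prediction head describing its grid."""
    rows = []
    for stride in strides:
        g = grid_size(input_size, stride)
        rows.append(
            {
                "stride": stride,
                "grid": (g, g),
                "cells_across_object": cells_spanned(object_px, stride),
            }
        )
    return rows


if __name__ == "__main__":
    for row in head_summary():
        print(row)
```

Under these assumptions, a 12-pixel object spans three cells on the stride-4 grid but less than half a cell on the stride-32 grid, which is one intuition for adding a high-resolution head.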
Abstract
This work presents a neural network model capable of recognizing small and tiny objects in thermal images collected by unmanned aerial vehicles. Our model consists of three parts: the backbone, the neck, and the prediction head. The backbone is based on the structure of YOLOv5, with a transformer encoder added at its end. The neck includes a Bi-FPN block combined with a sliding window and a transformer to increase the information fed into the prediction head. The prediction head carries out the detection by evaluating feature maps with the sigmoid function. The use of transformers with attention and sliding windows increases recognition accuracy while keeping the number of parameters and the computational requirements reasonable for embedded systems. Experiments conducted on the public VEDAI dataset and on our collected datasets show that our model achieves higher accuracy than state-of-the-art methods such as ResNet, Faster R-CNN, ComNet, ViT, YOLOv5, SMPNet, and DPNetV3. Experiments on the Jetson AGX embedded computer show that our model achieves real-time computation speed with a stability rate of over 90%.
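As a rough illustration of how a prediction head can evaluate a feature map with the sigmoid function, the sketch below maps raw per-cell outputs to scores in (0, 1) and keeps the cells above a threshold. It is a minimal sketch, not the authors' implementation: the function names, the single objectness channel, and the 0.5 threshold are assumptions made for this example.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function mapping a real value to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def score_cells(logits, threshold=0.5):
    """Apply the sigmoid to each cell of a 2D grid of raw head outputs.

    `logits` is a list of rows of raw objectness values (hypothetical
    layout). Returns (row, col, score) for every cell whose score is at
    least `threshold`, in row-major order.
    """
    detections = []
    for r, row in enumerate(logits):
        for c, value in enumerate(row):
            score = sigmoid(value)
            if score >= threshold:
                detections.append((r, c, score))
    return detections


if __name__ == "__main__":
    # A toy 2x2 feature map of raw objectness values.
    feature_map = [[-2.0, 0.0], [3.0, -5.0]]
    for r, c, score in score_cells(feature_map):
        print(f"cell ({r}, {c}) score {score:.3f}")
```

In a full detector the same squashing would typically also be applied to class scores and box offsets, followed by non-maximum suppression; those steps are omitted here.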
