Real Time Human Detection by Unmanned Aerial Vehicles

Walid Guettala, Ali Sayah, Laid Kahloul, Ahmed Tibermacine

TL;DR

The paper tackles real-time human detection from UAV-based thermal imagery, addressing small-object visibility and viewpoint variation. It proposes a UAV-perspective dataset and a YOLOv7-based detector that uses architectural refinements (ELAN/E-ELAN) and transfer learning to enlarge receptive fields for tiny objects. The model achieves a final mAP of $72.5\%$ at IoU $=0.5$ with approximately $161$ FPS on Google Colab, surpassing baselines in speed and showing robust cross-angle performance. The work demonstrates practical viability for real-time surveillance and provides a dataset and methods for future extension to other classes and tracking.

Abstract

Object detection, which identifies instances of particular object categories in images, is one of the most important problems in computer vision and remote sensing. Thermal infrared (TIR) remote sensing photos and videos captured by unmanned aerial vehicles (UAVs) across multiple scenarios are two crucial data sources for public security. Object detection on this data remains difficult because of the small scale of the targets, complex scene information, low resolution relative to visible-light videos, and the dearth of publicly available labeled datasets and trained models. This study proposes a UAV TIR object detection framework for pictures and videos. Ground-based TIR photos and videos gathered with Forward-looking Infrared (FLIR) cameras are used to build the ``You Only Look Once'' (YOLO) model, which is based on a CNN architecture. On the validation task, human detection with the state-of-the-art YOLOv7 (YOLO version 7) model \cite{1} reached an average precision of 72.5\% at IoU (Intersection over Union) = 0.5, with a detection speed of around 161 frames per second (FPS). The application demonstrates the usefulness of the YOLO architecture by evaluating the cross-detection performance of a YOLOv7 model on people in UAV TIR videos across the UAVs' various observation angles. This work supports the qualitative and quantitative evaluation of object detection from TIR pictures and videos using deep-learning models.
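
The abstract reports average precision at an IoU threshold of 0.5. As a minimal sketch of what that threshold means, the snippet below computes IoU for two axis-aligned boxes and checks whether a predicted box would count as a match. The `(x1, y1, x2, y2)` corner format and the function names `iou` and `is_match` are assumptions made for illustration; they are not taken from the paper or from the YOLOv7 codebase.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2
    (an assumed format, chosen for illustration).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle, clamped at zero
    # so that disjoint boxes give an empty intersection.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """True if the prediction overlaps the ground truth enough to count
    as a correct detection at the given IoU threshold."""
    return iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    ground_truth = (0, 0, 10, 10)
    prediction = (2, 0, 12, 10)  # shifted 2 px to the right
    print(iou(prediction, ground_truth))
    print(is_match(prediction, ground_truth))
```

Small targets make this criterion strict: for a person only a few pixels wide, a shift of one or two pixels can already move the IoU below 0.5.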

Paper Structure

This paper contains 10 sections, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: YOLO architecture
  • Figure 2: Precision, Recall, and mAP@0.5 on validation dataset during training
  • Figure 3: Comparison between our dataset sample and Model2 dataset sample
  • Figure 4: Sample images from the stream, labeled and predicted images