UAV-Based Human Body Detector Selection and Fusion for Geolocated Saliency Map Generation

Piotr Rudol; Patrick Doherty; Mariusz Wzorek; Chattrakul Sombattheera

TL;DR

Geolocating objects in outdoor search and rescue (SAR) missions with multi-UAV teams is challenging under bandwidth and compute constraints. The paper proposes an end-to-end solution comprising an offline evaluation of vision-based detectors across video codecs, a detector allocation strategy based on integer linear programming (ILP), and a probabilistic fusion framework that generates saliency maps from which 3D salient locations are extracted. Key contributions include a detector evaluation protocol using AP/AR and LRP metrics under limited bitrate, a formal ILP model with variables $x_{vdb}$ and $y_{vdf}$ and objective $acc_{obj}$, and a log-odds grid fusion approach that accounts for detector reliability and geometry. The approach supports timely, reliable SAR operations by distributing detectors optimally across a heterogeneous UAV network and fusing detections into geolocated saliency maps, and it is validated through simulations and real flights.

Abstract

The problem of reliably detecting and geolocating objects of different classes in soft real-time is essential in many application areas, such as Search and Rescue performed using Unmanned Aerial Vehicles (UAVs). This research addresses the complementary problems of system contextual vision-based detector selection, allocation, and execution, in addition to the fusion of detection results from teams of UAVs for the purpose of accurately and reliably geolocating objects of interest in a timely manner. In an offline step, an application-independent evaluation of vision-based detectors from a system perspective is first performed. Based on this evaluation, the most appropriate algorithms for online object detection for each platform are selected automatically before a mission, taking into account a number of practical system considerations, such as the available communication links, video compression used, and the available computational resources. The detection results are fused using a method for building maps of salient locations which takes advantage of a novel sensor model for vision-based detections for both positive and negative observations. A number of simulated and real flight experiments are also presented, validating the proposed method.
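The fusion of positive and negative observations into a map of salient locations can be illustrated with a standard log-odds grid update. The following is a minimal sketch and not the sensor model proposed in the paper: the class name `SaliencyGrid`, the fixed values `p_hit` and `p_miss`, and the uniform prior are all assumptions chosen for illustration. In the paper these probabilities would come from a sensor model that reflects detector reliability and viewing geometry.

```python
import math


def logit(p: float) -> float:
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def to_probability(log_odds: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


class SaliencyGrid:
    """Hypothetical 2D grid storing, per cell, the log-odds that an object is present."""

    def __init__(self, rows: int, cols: int, prior: float = 0.5):
        self.prior_log_odds = logit(prior)
        self.cells = [[self.prior_log_odds] * cols for _ in range(rows)]

    def update(self, row: int, col: int, detected: bool,
               p_hit: float = 0.7, p_miss: float = 0.3) -> None:
        """Fuse one observation of a cell.

        p_hit  (assumed): P(object in cell | detector reported a detection)
        p_miss (assumed): P(object in cell | cell observed, no detection)
        """
        inverse_model = p_hit if detected else p_miss
        # Standard log-odds update: add the evidence, subtract the prior.
        self.cells[row][col] += logit(inverse_model) - self.prior_log_odds

    def probability(self, row: int, col: int) -> float:
        return to_probability(self.cells[row][col])


grid = SaliencyGrid(rows=2, cols=2)
grid.update(0, 0, detected=True)   # positive observation raises the cell
grid.update(0, 0, detected=True)   # a second one raises it further
grid.update(1, 1, detected=False)  # negative observation lowers the cell
```

Working in log-odds turns each fusion step into an addition, so observations from several UAVs can be accumulated in any order. Cells whose probability exceeds a chosen threshold could then be reported as salient locations.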

Paper Structure

This paper contains 28 sections, 27 equations, 19 figures, 8 tables, and 3 algorithms.

Introduction
Background
Contributions
Structure of the paper
Related work
Evaluation of vision-based object detectors
Detector evaluation procedure
Network architectures and configurations
Evaluation dataset
Video compression techniques
Detection performance evaluation
Detector selection
Problem statement objective
Problem statement setup
Integer programming formulation
...and 13 more sections

Figures (19)

Figure 1: Overview of the main components of the proposed system. Video streams from one or more UAV platforms are processed using vision-based detection algorithms (selected optimally) to produce results in the form of bounding boxes and confidence scores. The detections are fused in the form of a map which represents probabilities of locations containing objects of specific classes. Based on this map, a list of salient locations is computed in the form of 3D locations.
Figure 2: Example frames from the evaluation sequence showing degradation of image quality when using different video encodings at a low bitrate of 50 kbps, from the right: H.265, VP9, H.264.
Figure 3: Overview of the system components related to local versus remote processing. Left: computational resources available to a UAV platform. Right: processes involved in local (orange and green) and remote processing.
Figure 4: Network configurations used for evaluation.
Figure 5: Summary of video sequences and example images used for evaluation: four locations (T, M, V, K), different weather conditions and times of year, percentage size composition (small, medium, and large) and total number of objects of class person.
...and 14 more figures
