Psych-Occlusion: Using Visual Psychophysics for Aerial Detection of Occluded Persons during Search and Rescue
Arturo Miguel Russell Bernal, Jane Cleland-Huang, Walter Scheirer
TL;DR
This paper tackles reliable aerial detection of occluded persons in emergency response (ER) by integrating human perceptual data into computer vision (CV). It introduces Psych-ER, a large-scale human behavioral dataset collected on Amazon Mechanical Turk (MTurk) using images from the NOMAD dataset. Psych-ER quantifies how well humans locate occluded targets at varying distances, and these insights are used to derive a psychophysical loss for bounding-box regression. The loss applies a center-focused Gaussian penalty whose spread (standard deviation) $\sigma(d,v)$, for distance $d$ and visibility (occlusion) level $v$, is set from human performance as $\sigma(d,v) = 100 - \mathrm{mAP}@0.00(d,v)$, so conditions where humans perform worse yield a wider Gaussian. The resulting loss is $\mathrm{human\_loss}(d,v) = A \cdot \mathrm{human\_penalty}(d,v) + B \cdot \left(1 - \mathrm{human\_penalty}(d,v)\right) \cdot \mathrm{default\_loss}$, with $\mathrm{human\_penalty}(d,v) = 1 - \exp\!\left(-\frac{(x_{pred}-x_{gt})^2 + (y_{pred}-y_{gt})^2}{2\,\sigma(d,v)^2}\right)$. Evaluated on NOMAD with RetinaNet-R101-FPN, the psychophysical loss improves performance at longer distances and under occlusion without compromising near-distance accuracy; it adds minimal training overhead and no inference cost, since it is used only during training. The work makes two key contributions: the Psych-ER dataset of human search behavior on aerial occluded views, and a human-guided localization loss, representing a first step toward human-informed localization in ER-specific CV. These results have practical implications for deploying more robust onboard CV systems on small Unmanned Aerial Systems (sUAS) in time-critical rescue missions.
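The loss above can be sketched in plain Python for a single predicted box. This is a minimal, non-authoritative illustration, not the authors' implementation (their code is in the repository linked in the abstract). Assumptions: human mAP@0.00 is given in percent, box centers and $\sigma$ share the same units (e.g. pixels), boxes use an `(x1, y1, x2, y2)` format, the weights A and B default to 1.0 as placeholders, and $\sigma$ is clamped to a small positive value to avoid division by zero. The helper names (`sigma_from_human_map`, `box_center`, `human_penalty`, `human_loss`, `MIN_SIGMA`) are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

# Hypothetical lower bound on sigma so a perfect human score (mAP = 100) does not divide by zero.
MIN_SIGMA = 1e-6


def sigma_from_human_map(human_map_pct: float) -> float:
    """sigma(d, v) = 100 - mAP@0.00(d, v); assumes mAP is given in percent."""
    return max(100.0 - human_map_pct, MIN_SIGMA)


def box_center(box: Tuple[float, float, float, float]) -> Point:
    """Center of an (x1, y1, x2, y2) box; this box format is an assumption."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def human_penalty(pred: Point, gt: Point, sigma: float) -> float:
    """1 minus a Gaussian of the squared center distance: 0 on target, approaching 1 far away."""
    sq_dist = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    return 1.0 - math.exp(-sq_dist / (2.0 * sigma ** 2))


def human_loss(pred: Point, gt: Point, sigma: float, default_loss: float,
               a: float = 1.0, b: float = 1.0) -> float:
    """a * penalty + b * (1 - penalty) * default_loss; a and b stand in for A and B (placeholder values)."""
    p = human_penalty(pred, gt, sigma)
    return a * p + b * (1.0 - p) * default_loss


if __name__ == "__main__":
    sigma = sigma_from_human_map(80.0)                # humans at 80% -> sigma = 20
    gt = box_center((90.0, 90.0, 110.0, 110.0))       # center (100, 100)
    near = box_center((92.0, 90.0, 112.0, 110.0))     # center 2 px off
    far = box_center((150.0, 90.0, 170.0, 110.0))     # center 60 px off
    # default_loss=0.5 stands in for the detector's own box-regression loss value.
    print(human_loss(near, gt, sigma, default_loss=0.5))
    print(human_loss(far, gt, sigma, default_loss=0.5))
```

Because $\sigma$ grows as human accuracy drops, the same center offset yields a smaller penalty under conditions humans find hard, so the loss leans more on the default regression term there. In an actual detector this would be computed on batched tensors inside the box-regression loss rather than per box in Python.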
Abstract
The success of Emergency Response (ER) scenarios, such as search and rescue, often depends on promptly locating a lost or injured person. With the increasing use of small Unmanned Aerial Systems (sUAS) as "eyes in the sky" during ER scenarios, efficient detection of persons from aerial views plays a crucial role in achieving a successful mission outcome. Fatigue of human operators during prolonged ER missions, coupled with limited human resources, highlights the need for sUAS equipped with Computer Vision (CV) capabilities to help find the person from aerial views. However, the performance of CV models onboard sUAS degrades substantially under the rigorous real-life conditions of a typical ER scenario, where person search is hampered by occlusion and low target resolution. To address these challenges, we extracted images from the NOMAD dataset and ran a crowdsourcing experiment to collect behavioral measurements when humans were asked to "find the person in the picture". We exemplify the use of our behavioral dataset, Psych-ER, by using its human accuracy data to adapt the loss function of a detection model. We tested our loss adaptation on a RetinaNet model evaluated on NOMAD across increasing distance and occlusion; our psychophysical loss adaptation improves over the baseline at higher distances across different levels of occlusion, without degrading performance at closer distances. To the best of our knowledge, our work is the first human-guided approach to address the localization task of a detection model while tackling real-world challenges of aerial search and rescue. All datasets and code can be found at: https://github.com/ArtRuss/NOMAD.
