Enhancing people localisation in drone imagery for better crowd management by utilising every pixel in high-resolution images

Bartosz Ptak, Marek Kraft

TL;DR

This work tackles the challenge of localising extremely small people in high-resolution drone imagery for crowd management. It introduces Dot localisation, a point-based localisation framework, together with the Pixel Distill (PD) module, which exploits every pixel of a full-resolution image. It also presents UP-COUNT, a new dataset of moving-camera drone footage with altitude metadata. On both DroneCrowd and UP-COUNT, the proposed approach achieves state-of-the-art results on the localisation metrics (L-mAP and L-AP) while reducing model size and increasing inference speed, which supports its applicability in real-world settings. The UP-COUNT dataset and the Dot-PD methodology offer a foundation for accurate crowd counting and tracking in dynamic aerial environments; future work focuses on tracking and altitude-aware scale handling.

Abstract

Accurate people localisation using drones is crucial for effective crowd management, not only during massive events and public gatherings but also for monitoring daily urban crowd flow. Traditional methods for tiny object localisation in high-resolution drone imagery often face limitations in precision and efficiency, primarily due to constraints imposed by image scaling and sliding window techniques. To address these challenges, a novel approach dedicated to point-oriented object localisation is proposed. Along with this approach, the Pixel Distill module is introduced to enhance the processing of high-definition images by extracting spatial information from all individual pixels at once. Additionally, a new dataset named UP-COUNT, tailored to contemporary drone applications, is shared. It addresses a wide range of challenges in drone imagery, such as simultaneous camera and object movement during image acquisition, pushing forward the capabilities of crowd management applications. A comprehensive evaluation of the proposed method on UP-COUNT and on the commonly used DroneCrowd dataset demonstrates the superiority of the approach over existing methods and highlights its efficacy in drone-based crowd object localisation tasks. These improvements markedly increase the algorithm's applicability in real-world scenarios, enabling more reliable localisation and counting of individuals in dynamic environments.

Paper Structure

This paper contains 21 sections, 1 equation, 6 figures, 4 tables.

Figures (6)

  • Figure 1: A challenging example of tiny people localisation from UAV footage. While existing state-of-the-art methods use Gaussian heatmaps to generate predictions, our Dot approach determines precise people locations.
  • Figure 2: Example images with drawn head labels from the UP-COUNT dataset. Top: the lowest altitude; middle: the highest altitude; bottom: the image with the largest crowd.
  • Figure 3: Architecture of the designed Dot localisation approach. The Pixel Distill (PD) module processes a full-resolution image and extracts information from each pixel, downsampling by a factor of two (for DroneCrowd) or four (for UP-COUNT), depending on the PD block used. Since UP-COUNT images have double the resolution of DroneCrowd images, the PD block requires an additional resolution reduction.
  • Figure 4: Visual comparison of results provided by different methods for both datasets.
  • Figure 5: Comparison of the L-AP metric as a function of the correctness threshold. The thresholds span the range covered by the L-mAP metric, from 1 to 25 pixels.
  • ...and 1 more figure
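
The Figure 5 caption refers to L-AP evaluated at a pixel correctness threshold and to L-mAP covering thresholds from 1 to 25 pixels. As a rough illustration of how such point-based metrics can be computed, the sketch below scores a single image under assumptions that are not taken from the paper: predictions are matched greedily to the nearest unmatched ground-truth point in order of confidence, and L-mAP is taken as the plain mean of L-AP over integer thresholds 1 to 25. The function names `localisation_ap` and `localisation_map` are hypothetical, and the evaluation protocol actually used by the authors may differ.

```python
import numpy as np


def localisation_ap(pred_points, pred_scores, gt_points, threshold):
    """Simplified L-AP for one image at one distance threshold (in pixels).

    Assumption: greedy matching, most confident prediction first, each
    ground-truth point matched at most once.
    """
    pred_points = np.asarray(pred_points, dtype=float).reshape(-1, 2)
    gt_points = np.asarray(gt_points, dtype=float).reshape(-1, 2)
    pred_scores = np.asarray(pred_scores, dtype=float)
    if len(pred_points) == 0 or len(gt_points) == 0:
        return 0.0

    order = np.argsort(-pred_scores)              # descending confidence
    matched = np.zeros(len(gt_points), dtype=bool)
    hits = np.zeros(len(order), dtype=bool)

    for rank, idx in enumerate(order):
        dist = np.linalg.norm(gt_points - pred_points[idx], axis=1)
        dist[matched] = np.inf                    # already-used labels are unavailable
        nearest = int(np.argmin(dist))
        if dist[nearest] <= threshold:
            matched[nearest] = True
            hits[rank] = True

    # Precision at each rank, then averaged over true positives and
    # normalised by the number of ground-truth points.
    precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(np.sum(precision[hits]) / len(gt_points))


def localisation_map(pred_points, pred_scores, gt_points, thresholds=range(1, 26)):
    """Mean of the simplified L-AP over thresholds 1..25 pixels (assumed definition)."""
    return float(np.mean([
        localisation_ap(pred_points, pred_scores, gt_points, t) for t in thresholds
    ]))


if __name__ == "__main__":
    gt = [(10, 10), (50, 50)]
    pred = [(10, 12), (50, 60)]   # 2 px and 10 px away from their labels
    scores = [0.9, 0.8]
    print(localisation_ap(pred, scores, gt, threshold=2))
    print(localisation_map(pred, scores, gt))
```

In this toy case the first prediction only counts as correct once the threshold reaches 2 pixels and the second once it reaches 10 pixels, which is why a curve over thresholds (as in Figure 5) is more informative than a single operating point.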