POLO -- Point-based, multi-class animal detection

Giacomo May, Emanuele Dalsasso, Benjamin Kellenberger, Devis Tuia

TL;DR

This work tackles the annotation bottleneck in drone-based wildlife counting by proposing POLO, a multi-class detector trained on point labels that modifies YOLOv8 to predict object centers rather than bounding boxes. By replacing the IoU/DFL losses with point-appropriate alternatives and introducing DoR-based NMS, POLO achieves better counting accuracy than a YOLOv8 baseline trained on pseudo-labels, while requiring only point annotations. Experiments on waterfowl imagery from Izembek Lagoon show that POLO reduces counting error (MAE) across multiple species and tends to produce fewer false positives, although it sometimes yields more conservative counts. The approach offers a practical, scalable solution for wildlife censuses; future work will focus on comparisons against hand-annotated boxes and on robustness to diverse data acquisition conditions.
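To make the counting metric concrete, the following is a minimal sketch of a mean absolute error (MAE) computed over per-image counts for a single class. It assumes MAE is averaged over images; the function name `counting_mae` and the example counts are illustrative and are not taken from the paper.

```python
from typing import Sequence


def counting_mae(true_counts: Sequence[int], pred_counts: Sequence[int]) -> float:
    """Mean absolute error between per-image true and predicted counts.

    Assumption: one count per image for a single class, averaged over images.
    """
    if len(true_counts) != len(pred_counts):
        raise ValueError("true_counts and pred_counts must have the same length")
    if len(true_counts) == 0:
        raise ValueError("at least one image is required")
    total_abs_error = sum(abs(t - p) for t, p in zip(true_counts, pred_counts))
    return total_abs_error / len(true_counts)


# Hypothetical counts for three images (not results from the paper).
example_true = [10, 20, 30]
example_pred = [12, 18, 30]
example_mae = counting_mae(example_true, example_pred)
```

A lower value means the predicted counts are closer to the annotated counts on average. Because over- and under-counts both contribute positively, MAE alone does not reveal whether a detector counts conservatively.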

Abstract

Automated wildlife surveys based on drone imagery and object detection technology are a powerful and increasingly popular tool in conservation biology. Most detectors require training images with annotated bounding boxes, which are tedious, expensive, and not always unambiguous to create. To reduce the annotation load associated with this practice, we develop POLO, a multi-class object detection model that can be trained entirely on point labels. POLO is based on simple yet effective modifications to the YOLOv8 architecture, including alterations to the prediction process, training losses, and post-processing. We test POLO on drone recordings of waterfowl containing up to several thousand individual birds per image and compare it to a regular YOLOv8 model. Our experiments show that, at the same annotation cost, POLO achieves improved accuracy in counting animals in aerial imagery.

Paper Structure

This paper contains 14 sections, 1 equation, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: MAE scores achieved for the Brant Goose class depending on radius and DoR value.
  • Figure 2: True- (columns 1 & 2) and false-positive (columns 3 & 4) detections obtained with YOLOv8 and POLO (magenta = Brant Goose, turquoise = Other, yellow = Gull).