DEAL-YOLO: Drone-based Efficient Animal Localization using YOLO

Aditya Prashant Naidu, Hem Gosalia, Ishaan Gakhar, Shaurya Singh Rathore, Krish Didwania, Ujjwal Verma

TL;DR

This work tackles the problem of small-object wildlife detection in UAV imagery by enhancing a YOLOv8-based detector with center-focused and smoothness losses (Wise IoU and Normalized Wasserstein Distance), efficient feature extraction via Linear Deformable Convolutions, and a multi-scale fusion module (Scaled Sequence Feature Fusion). A two-stage confidence-guided ROI refinement further improves localization for low-confidence detections. Empirical results on BuckTales and WAID show substantial parameter reductions (up to ~87% fewer parameters in some cases) while maintaining or surpassing state-of-the-art accuracy, demonstrating strong potential for real-world wildlife monitoring and conservation tasks.

Abstract

Although advances in deep learning and aerial surveillance technology are improving wildlife conservation efforts, complex and erratic environmental conditions still pose a problem, requiring innovative solutions for cost-effective small animal detection. This work introduces DEAL-YOLO, a novel approach that improves small object detection in Unmanned Aerial Vehicle (UAV) images by using multi-objective loss functions such as Wise IoU (WIoU) and Normalized Wasserstein Distance (NWD), which prioritize pixels near the centre of the bounding box, ensuring smoother localization and reducing abrupt deviations. Additionally, the model is optimized through efficient feature extraction with Linear Deformable (LD) convolutions, enhancing accuracy while maintaining computational efficiency. The Scaled Sequence Feature Fusion (SSFF) module enhances object detection by effectively capturing inter-scale relationships, improving feature representation, and boosting metrics through optimized multiscale fusion. Comparison with baseline models reveals high efficacy with up to 69.5% fewer parameters compared to vanilla YOLOv8-N, highlighting the robustness of the proposed modifications. Through this approach, our paper aims to facilitate the detection of endangered species, animal population analysis, habitat monitoring, biodiversity research, and various other applications that enrich wildlife conservation efforts. DEAL-YOLO employs a two-stage inference paradigm for object detection, refining selected regions to improve localization and confidence. This approach enhances performance, especially for small instances with low objectness scores.
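To illustrate why a distance-based measure such as NWD can give a smoother localization signal for small objects than plain IoU, the following is a minimal sketch and not the authors' implementation. It assumes the commonly used formulation in which each box (cx, cy, w, h) is modeled as a 2D Gaussian and the similarity is exp(-W2 / C). The function names and the constant `c=12.8` are illustrative assumptions; in practice C is a dataset-dependent value.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two boxes given as (cx, cy, w, h).

    Each box is treated as a 2D Gaussian with mean (cx, cy) and
    standard deviations (w/2, h/2). `c` is an assumed normalizing constant.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Squared 2-Wasserstein distance between the two Gaussians.
    w2_sq = (
        (ax - bx) ** 2
        + (ay - by) ** 2
        + ((aw - bw) / 2) ** 2
        + ((ah - bh) / 2) ** 2
    )
    return math.exp(-math.sqrt(w2_sq) / c)


def iou(box_a, box_b):
    """Plain IoU for boxes given as (cx, cy, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap extent along each axis (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw / 2, bx + bw / 2) - max(ax - aw / 2, bx - bw / 2))
    inter_h = max(0.0, min(ay + ah / 2, by + bh / 2) - max(ay - ah / 2, by - bh / 2))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Two 4x4-pixel boxes whose centres are 6 pixels apart: no overlap at all.
pred = (10.0, 10.0, 4.0, 4.0)
target = (16.0, 10.0, 4.0, 4.0)
iou_value = iou(pred, target)  # zero, so IoU carries no information about distance
nwd_value = nwd(pred, target)  # strictly between 0 and 1, decays with centre distance
```

For tiny boxes, a shift of a few pixels can drive IoU to zero, whereas NWD still decreases gradually as the centres move apart, which matches the abstract's description of smoother localization.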

Paper Structure

This paper contains 8 sections, 5 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Schematic overview of the proposed model. Our contributions to the YOLOv8 model are highlighted in cyan. F1, F2, and F3 represent the feature maps with their corresponding dimensions. All other blocks are taken directly from YOLOv8 (Ultralytics).
  • Figure 2: Schematic overview of the structure of LDConv (Zhang et al., 2024). The initial sampled coordinates are assigned to a convolution of arbitrary size, and the sample shape is adjusted using learnable offsets. This process modifies the original sampled shape at each position through resampling.
  • Figure 3: Qualitative results on the WAID and BuckTales datasets. Ground truth annotations are shown in blue, single-stage inference predictions in red, and two-stage inference predictions in green. The left column represents the Ground Truth bounding boxes, the middle column represents DEAL-YOLO with standard inference and the right column represents results of two-stage inference.
  • Figure 4: Comparing the ROI of predicted anchor boxes from a single inference (shown in red) versus a two-step inference (shown in green), highlighting the removal of overlapping boxes and the increase in object confidence scores.