Real-Time Threaded Houbara Detection and Segmentation for Wildlife Conservation using Mobile Platforms

Lyes Saad Saoud, Loic Lesobre, Enrico Sorato, Irfan Hussain

TL;DR

This study presents a mobile-first, real-time framework for detecting and segmenting cryptic wildlife, specifically the Houbara bustard, by parallelizing YOLOv10-based detection and MobileSAM-based segmentation via a Threading Detection Model (TDM). The approach achieves high detection accuracy ($mAP_{50}=0.9627$, $mAP_{75}=0.7731$, $mAP_{95}=0.7178$) and strong segmentation performance ($mIoU=0.7421$, $mPLA=0.9773$) at real-time latency on edge hardware ($43.7$ ms per frame for detection and $0.1075$ s for segmentation). A 40,000-image Houbara dataset supports training and evaluation; it covers diverse environmental conditions and is split carefully to avoid leakage between training and test data. The work shows practical potential for conservation by enabling non-invasive, real-time monitoring on resource-constrained devices, and it provides publicly available code and demos to support adoption and extension to additional species.

Abstract

Real-time animal detection and segmentation in natural environments are vital for wildlife conservation, enabling non-invasive monitoring through remote camera streams. However, these tasks remain challenging due to limited computational resources and the cryptic appearance of many species. We propose a mobile-optimized two-stage deep learning framework that integrates a Threading Detection Model (TDM) to parallelize YOLOv10-based detection and MobileSAM-based segmentation. Unlike prior YOLO+SAM pipelines, our approach improves real-time performance by reducing latency through threading. YOLOv10 handles detection while MobileSAM performs lightweight segmentation, both executed concurrently for efficient resource use. On the cryptic Houbara Bustard, a conservation-priority species, our model achieves mAP50 of 0.9627, mAP75 of 0.7731, mAP95 of 0.7178, and a MobileSAM mIoU of 0.7421. YOLOv10 operates at 43.7 ms per frame, confirming real-time readiness. We introduce a curated Houbara dataset of 40,000 annotated images to support model training and evaluation across diverse conditions. The code and dataset used in this study are publicly available on GitHub at https://github.com/LyesSaadSaoud/mobile-houbara-detseg. For interactive demos and additional resources, visit https://lyessaadsaoud.github.io/LyesSaadSaoud-Threaded-YOLO-SAM-Houbara.
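To put the reported latencies in perspective, the following back-of-the-envelope sketch converts them to frames per second and compares sequential execution with an idealized two-stage pipeline. The latency values are the ones reported above; the assumption that the two stages overlap perfectly (so throughput is bounded by the slower stage) is ours and is not a measurement from the paper.

```python
# Per-frame latencies reported in the TL;DR and abstract.
DETECTION_MS = 43.7      # YOLOv10 detection
SEGMENTATION_MS = 107.5  # MobileSAM segmentation (0.1075 s)


def fps(latency_ms: float) -> float:
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


# Sequential execution: every frame pays both costs back to back.
sequential_ms = DETECTION_MS + SEGMENTATION_MS

# Idealized pipeline (assumption): the stages overlap perfectly, so the
# steady-state time per frame is set by the slower stage alone.
pipelined_ms = max(DETECTION_MS, SEGMENTATION_MS)

print(f"detection only : {fps(DETECTION_MS):.1f} FPS")   # 22.9 FPS
print(f"sequential     : {fps(sequential_ms):.1f} FPS")  # 6.6 FPS
print(f"pipelined      : {fps(pipelined_ms):.1f} FPS")   # 9.3 FPS
```

Under these assumptions, the benefit of threading comes from hiding the detection cost behind the slower segmentation stage. Real gains depend on how well the two models share the device.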

Paper Structure

This paper contains 27 sections, 1 equation, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Framework of the Proposed Two-Stage Deep Learning Approach for Houbara Detection and Segmentation, incorporating the Threading Detection Model (TDM). The figure illustrates the end-to-end process starting from input images, through YOLOv10 for improved Houbara detection, followed by MobileSAM for specialized segmentation. The TDM manages parallel processing, task distribution, and synchronization to enhance real-time performance and efficiency throughout the system.
  • Figure 2: Annotation Workflow: Bounding boxes generated with GroundingDINO (Stage 1) followed by detailed segmentation using SAM2 (Stage 2). Manual validation ensures annotation accuracy.
  • Figure 3: Dataset analysis: (a) Number of instances per image, (b) Image resolution distribution, and (c) Spatial heatmap of Houbara positions. These insights ensure dataset balance and diversity.
  • Figure 4: Proposed MobileSAM for Houbara Bustard segmentation. YOLOv10 detects objects and generates bounding boxes, which are then passed to MobileSAM for precise segmentation.
  • Figure 5: Threaded architecture for real-time object detection and segmentation. The video feed is split into two threads: one for YOLOv10 object detection and the other for MobileSAM segmentation. Results are merged in post-processing for display or storage.
  • ...and 5 more figures
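The threaded architecture described in the Figure 5 caption can be sketched with Python's standard `threading` and `queue` modules. This is a minimal illustration, not the authors' implementation: `detect` and `segment` are hypothetical stand-ins for YOLOv10 and MobileSAM inference, and passing the detector's boxes to the segmentation thread through a queue is an assumption based on the Figure 4 caption.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the frame stream


def detect(frame):
    """Hypothetical stand-in for YOLOv10: returns a list of boxes."""
    return [(0, 0, 10, 10)]


def segment(frame, boxes):
    """Hypothetical stand-in for MobileSAM: one mask label per box prompt."""
    return [f"mask@{box}" for box in boxes]


def detection_worker(frames, detections):
    # Thread 1: pull frames, run detection, hand boxes to the next stage.
    while True:
        item = frames.get()
        if item is SENTINEL:
            detections.put(SENTINEL)
            break
        idx, frame = item
        detections.put((idx, frame, detect(frame)))


def segmentation_worker(detections, results):
    # Thread 2: segment frame N while thread 1 already detects on frame N+1.
    while True:
        item = detections.get()
        if item is SENTINEL:
            results.put(SENTINEL)
            break
        idx, frame, boxes = item
        results.put((idx, boxes, segment(frame, boxes)))


def run_pipeline(video_frames):
    # Bounded input queues apply back-pressure so frames do not pile up.
    frames = queue.Queue(maxsize=4)
    detections = queue.Queue(maxsize=4)
    results = queue.Queue()  # unbounded, so the workers never block on output
    workers = [
        threading.Thread(target=detection_worker, args=(frames, detections)),
        threading.Thread(target=segmentation_worker, args=(detections, results)),
    ]
    for worker in workers:
        worker.start()
    for idx, frame in enumerate(video_frames):
        frames.put((idx, frame))
    frames.put(SENTINEL)

    # Post-processing: collect merged (index, boxes, masks) records in order.
    merged = []
    while True:
        item = results.get()
        if item is SENTINEL:
            break
        merged.append(item)
    for worker in workers:
        worker.join()
    return merged


if __name__ == "__main__":
    for idx, boxes, masks in run_pipeline(["frame0", "frame1", "frame2"]):
        print(idx, boxes, masks)
```

Because each stage has a single worker and the queues are first-in, first-out, results come out in frame order. In CPython, threads like these only run in parallel when the inference calls release the global interpreter lock, which native deep learning backends typically do during computation.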