Domain Adaptable Fine-Tune Distillation Framework For Advancing Farm Surveillance

Raza Imam, Muhammad Huzaifa, Nabil Mansour, Shaher Bano Mirza, Fouad Lamghari

TL;DR

This work tackles real-time, domain-adaptable camel farm surveillance by coupling a Unified Auto-Annotation framework (GroundingDINO combined with SAM) with a Fine-Tune Distillation pipeline that transfers knowledge from a large teacher to a lightweight student (e.g., YOLOv8). The approach enables automatic labeling of surveillance frames and distills powerful generalization into an edge-deployable detector, validated on data from Al-Marmoom Camel Farm. Among tested configurations, YOLOv8s trained for 50 epochs at 1024-pixel resolution achieved the best balance of accuracy (AP ≈ 80.3%), speed, and resource use, making it suitable for real-time monitoring. The framework reduces labeling effort, offers transparency in training, and supports domain adaptation to other farms or livestock tasks, with open-source code available for reproduction and extension.

Abstract

In this study, we propose an automated framework for camel farm monitoring with two key contributions: the Unified Auto-Annotation framework and the Fine-Tune Distillation framework. The Unified Auto-Annotation approach combines two models, GroundingDINO (GD) and Segment-Anything-Model (SAM), to automatically annotate raw datasets extracted from surveillance videos. Building on this foundation, the Fine-Tune Distillation framework fine-tunes student models on the auto-annotated dataset. This process transfers knowledge from a large teacher model to a student model and resembles a variant of Knowledge Distillation. The Fine-Tune Distillation framework is designed to adapt to specific use cases: it transfers knowledge from large models to small ones, which makes it suitable for domain-specific applications. Using our raw dataset collected from Al-Marmoom Camel Farm in Dubai, UAE, and a pre-trained teacher model, GroundingDINO, the Fine-Tune Distillation framework produces a lightweight deployable model, YOLOv8. The framework demonstrates high performance and computational efficiency, enabling real-time object detection. Our code is available at https://github.com/Razaimam45/Fine-Tune-Distillation
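
The hand-off between the two frameworks can be illustrated with a minimal sketch, assuming the teacher's detections are already available as (label, confidence, pixel box) records. The `Detection` container, the `to_yolo_labels` helper, the 0.35 confidence threshold, and the class order are hypothetical and are not taken from the paper or its repository; the four class names are those listed in the Figure 5 caption. The sketch covers only one step: filtering teacher pseudo-labels and writing them in the normalized `class cx cy w h` text format commonly used to train YOLO-style students.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Class names come from the Figure 5 caption; their order here is an assumption.
CLASSES = ["camel", "rope", "mask", "pole"]


@dataclass
class Detection:
    """One teacher prediction (hypothetical container, not the repository's type)."""
    label: str
    confidence: float
    box_xyxy: Tuple[float, float, float, float]  # pixel coords: x1, y1, x2, y2


def to_yolo_labels(dets: List[Detection], img_w: int, img_h: int,
                   min_conf: float = 0.35) -> List[str]:
    """Convert teacher detections into YOLO-format label lines.

    Each line is "class_id cx cy w h" with values normalized to [0, 1].
    Detections below min_conf (an assumed threshold) or with labels outside
    CLASSES are dropped so weak pseudo-labels do not reach the student.
    """
    lines = []
    for d in dets:
        if d.confidence < min_conf or d.label not in CLASSES:
            continue
        x1, y1, x2, y2 = d.box_xyxy
        # Clip the box to the image bounds before normalizing.
        x1, x2 = max(0.0, min(x1, img_w)), max(0.0, min(x2, img_w))
        y1, y2 = max(0.0, min(y1, img_h)), max(0.0, min(y2, img_h))
        w, h = x2 - x1, y2 - y1
        if w <= 0 or h <= 0:
            continue  # skip degenerate boxes
        cx = (x1 + x2) / 2 / img_w
        cy = (y1 + y2) / 2 / img_h
        class_id = CLASSES.index(d.label)
        lines.append(
            f"{class_id} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"
        )
    return lines


if __name__ == "__main__":
    teacher_output = [
        Detection("camel", 0.90, (100, 200, 300, 600)),
        Detection("rope", 0.20, (0, 0, 10, 10)),  # dropped: low confidence
    ]
    for line in to_yolo_labels(teacher_output, img_w=1000, img_h=800):
        print(line)
```

In a setup like this, the lines for each frame would typically be written to a text file paired with the image, and the student would then be fine-tuned on those files with its usual training tooling.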

Paper Structure (14 sections, 5 equations, 12 figures, 6 tables)

This paper contains 14 sections, 5 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: Overview of the Knowledge Transfer in Fine-Tune Distillation
  • Figure 2: An overview of the comprehensive research design implemented in this work
  • Figure 3: Sample images from our dataset after the preprocessing phase
  • Figure 4: Data distribution before and after augmentation stage
  • Figure 5: Zero-shot inference on our dataset images using GroundingDINO with the class prompts "camel", "rope", "mask", and "pole", the four classes of interest in the taming process. Red bounding boxes (BB) denote class "camel", green denote "rope", yellow denote "mask", and blue denote "pole".
  • ...and 7 more figures