Domain Adaptable Fine-Tune Distillation Framework For Advancing Farm Surveillance
Raza Imam, Muhammad Huzaifa, Nabil Mansour, Shaher Bano Mirza, Fouad Lamghari
TL;DR
This work tackles real-time, domain-adaptable camel farm surveillance by coupling a Unified Auto-Annotation framework (GroundingDINO combined with SAM) with a Fine-Tune Distillation pipeline that transfers knowledge from a large teacher model to a lightweight student (e.g., YOLOv8). The approach automatically labels surveillance frames and distills the teacher's generalization ability into an edge-deployable detector. It is validated on data from Al-Marmoom Camel Farm. Among the tested configurations, YOLOv8s trained for 50 epochs at 1024-pixel resolution achieved the best balance of accuracy (AP ≈ 80.3%), speed, and resource use, making it suitable for real-time monitoring. The framework reduces labeling effort, makes the training process transparent, and supports adaptation to other farms or livestock tasks. Open-source code is available for reproduction and extension.
Abstract
In this study, we propose an automated framework for camel farm monitoring with two key contributions: the Unified Auto-Annotation framework and the Fine-Tune Distillation framework. The Unified Auto-Annotation approach combines two models, GroundingDINO (GD) and the Segment Anything Model (SAM), to automatically annotate raw datasets extracted from surveillance videos. Building on this foundation, the Fine-Tune Distillation framework fine-tunes student models on the auto-annotated dataset. This process transfers knowledge from a large teacher model to a small student model and can be viewed as a variant of knowledge distillation. Because the framework is designed to adapt to specific use cases, it is well suited to domain-specific applications. Using our raw dataset collected at Al-Marmoom Camel Farm in Dubai, UAE, and the pre-trained teacher model GroundingDINO, the Fine-Tune Distillation framework produces a lightweight, deployable YOLOv8 model. The resulting detector achieves high detection performance while remaining computationally efficient, enabling real-time object detection. Our code is available at https://github.com/Razaimam45/Fine-Tune-Distillation
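The hand-off from the teacher's auto-annotations to the student's training data can be pictured with a minimal sketch. It assumes the teacher (GroundingDINO) emits, for each frame, boxes in normalized center format (cx, cy, w, h) together with a confidence score and the text phrase each box was grounded to. This matches the output of the public GroundingDINO inference utilities. It also assumes the student is trained on YOLO-format label files, with one `class cx cy w h` line per object. The `Detection` container, the `CLASS_IDS` map, the 0.35 score threshold, and both function names are hypothetical illustrations, not the authors' implementation. The second helper converts a box into absolute pixel corners (x0, y0, x1, y1), which is the form SAM's box prompt expects.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    """One teacher detection (hypothetical container, not a library type)."""
    box_cxcywh: Box  # normalized (cx, cy, w, h), each in [0, 1]
    score: float     # teacher confidence for the matched phrase
    phrase: str      # text phrase the box was grounded to


# Hypothetical mapping from prompt phrases to YOLO class indices.
CLASS_IDS: Dict[str, int] = {"camel": 0}


def to_yolo_lines(dets: List[Detection], min_score: float = 0.35) -> List[str]:
    """Keep confident detections of known classes as YOLO label lines."""
    lines: List[str] = []
    for det in dets:
        # Drop low-confidence boxes and phrases outside the class map.
        if det.score < min_score or det.phrase not in CLASS_IDS:
            continue
        cx, cy, w, h = det.box_cxcywh
        # YOLO labels are already normalized and center-based: "class cx cy w h".
        lines.append(f"{CLASS_IDS[det.phrase]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines


def _clip(value: float, upper: float) -> float:
    """Clamp a coordinate to the range [0, upper]."""
    return min(max(value, 0.0), upper)


def to_pixel_xyxy(box_cxcywh: Box, width: int, height: int) -> Box:
    """Convert a normalized center box to pixel corners, clipped to the image."""
    cx, cy, w, h = box_cxcywh
    x0 = _clip((cx - w / 2) * width, float(width))
    y0 = _clip((cy - h / 2) * height, float(height))
    x1 = _clip((cx + w / 2) * width, float(width))
    y1 = _clip((cy + h / 2) * height, float(height))
    return (x0, y0, x1, y1)


if __name__ == "__main__":
    frame_dets = [Detection((0.5, 0.5, 0.2, 0.4), 0.9, "camel")]
    print(to_yolo_lines(frame_dets))
    print(to_pixel_xyxy(frame_dets[0].box_cxcywh, 1920, 1080))
```

Filtering by score before writing labels is the simplest way to trade pseudo-label recall for precision. Lowering `min_score` yields more labels, but more of the teacher's errors then reach the student.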
