8-Calves Image dataset

Xuyang Fang; Sion Hannuna; Neill Campbell; Edwin Simpson

TL;DR

The paper addresses the lack of realistic, temporally rich benchmarks for multi-animal detection, tracking, and identification in agricultural settings with heavy occlusion. It introduces the 8-Calves dataset: a one-hour video of eight calves with over 537k bounding boxes and temporal identity labels, produced by a semi-automated YOLOv8m + ByteTrack labeling pipeline followed by manual validation. Benchmarks across 28 detectors, 23 identification models, and two trackers show that detection approaches ceiling at the lenient IoU threshold ($\mathrm{mAP}_{50}$), while fine-grained localization and identity preservation under occlusion remain difficult. Small models (e.g., ConvNextV2 Nano) give the best balance of classification accuracy and retrieval performance, BEiT pre-training transfers best, and trackers reach high $\mathrm{MOTA}$ but low $\mathrm{IDF1}$ because of occlusions. The dataset and benchmarks are intended to support precision livestock farming applications such as feeding optimization, welfare assessment, and early disease detection, and to open directions toward self-supervised and multimodal approaches to scalable animal re-identification.
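To illustrate why a detector can look near-perfect at a lenient IoU threshold yet weaker under stricter ones, the following minimal sketch computes the IoU of one predicted box against one ground-truth box and lists which of the ten COCO-style thresholds (0.50 to 0.95 in steps of 0.05) it would pass. The box coordinates and the helper names `iou` and `thresholds_passed` are illustrative assumptions, not taken from the paper or its code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def thresholds_passed(pred_box, gt_box):
    """Return the COCO-style IoU thresholds at which pred_box counts as a match."""
    thresholds = [t / 100 for t in range(50, 100, 5)]  # 0.50, 0.55, ..., 0.95
    score = iou(pred_box, gt_box)
    return [t for t in thresholds if score >= t]


# Hypothetical boxes: the prediction is shifted by 10 px in x and y.
gt_box = (0, 0, 100, 100)
pred_box = (10, 10, 110, 110)

print(iou(pred_box, gt_box))                # about 0.68
print(thresholds_passed(pred_box, gt_box))  # [0.5, 0.55, 0.6, 0.65]
```

A 10-pixel shift on a 100-pixel box still counts as a correct detection at IoU 0.50, but fails six of the ten thresholds that are averaged in the stricter metric, which is the kind of gap the benchmark reports.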

Abstract

Automated livestock monitoring is crucial for precision farming, but robust computer vision models are hindered by a lack of datasets reflecting real-world group challenges. We introduce the 8-Calves dataset, a challenging benchmark for multi-animal detection, tracking, and identification. It features a one-hour video of eight Holstein Friesian calves in a barn, with frequent occlusions, motion blur, and diverse poses. A semi-automated pipeline using a fine-tuned YOLOv8 detector and ByteTrack, followed by manual correction, provides over 537,000 bounding boxes with temporal identity labels. We benchmark 28 object detectors, showing near-perfect performance on a lenient IoU threshold (mAP50: 95.2-98.9%) but significant divergence on stricter metrics (mAP50:95: 56.5-66.4%), highlighting fine-grained localization challenges. Our identification benchmark across 23 models reveals a trade-off: scaling model size improves classification accuracy but compromises retrieval. Smaller architectures like ConvNextV2 Nano achieve the best balance (73.35% accuracy, 50.82% Top-1 KNN). Pre-training focused on semantic learning (e.g., BEiT) yielded superior transferability. For tracking, leading methods achieve high detection accuracy (MOTA > 0.92) but struggle with identity preservation (IDF1 $\approx$ 0.27), underscoring a key challenge in occlusion-heavy scenarios. The 8-Calves dataset bridges a gap by providing temporal richness and realistic challenges, serving as a resource for advancing agricultural vision models. The dataset and code are available at https://huggingface.co/datasets/tonyFang04/8-calves.
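The gap between high MOTA and low IDF1 can be reproduced with a toy case. The sketch below assumes a single ground-truth animal that is detected in every frame with no false positives, and uses the standard definitions $\mathrm{MOTA} = 1 - (\mathrm{FN} + \mathrm{FP} + \mathrm{IDSW}) / \mathrm{GT}$ and $\mathrm{IDF1} = 2\,\mathrm{IDTP} / (2\,\mathrm{IDTP} + \mathrm{IDFP} + \mathrm{IDFN})$. The function name and the frame counts are hypothetical and chosen for illustration; they are not results from the paper.

```python
from collections import Counter


def mota_and_idf1_single_target(pred_ids):
    """Toy MOTA and IDF1 for one ground-truth animal.

    pred_ids[t] is the track ID the tracker assigned to the animal in frame t,
    or None if the animal was missed. False positives are assumed to be zero.
    """
    num_gt = len(pred_ids)
    assigned = [i for i in pred_ids if i is not None]

    fn = num_gt - len(assigned)  # missed frames
    fp = 0                       # simplifying assumption
    # An identity switch is counted whenever the assigned ID changes.
    idsw = sum(1 for a, b in zip(assigned, assigned[1:]) if a != b)
    mota = 1.0 - (fn + fp + idsw) / num_gt

    # With one ground-truth trajectory, the best identity match is simply
    # the predicted ID that covers the most frames.
    idtp = max(Counter(assigned).values()) if assigned else 0
    idfn = num_gt - idtp         # ground-truth frames not covered by that ID
    idfp = len(assigned) - idtp  # predictions carrying any other ID
    denom = 2 * idtp + idfp + idfn
    idf1 = 2 * idtp / denom if denom else 0.0
    return mota, idf1


# Hypothetical 100-frame clip: the animal is always detected, but the tracker
# restarts its ID after each of three occlusions.
pred_ids = [1] * 25 + [2] * 25 + [3] * 25 + [4] * 25
mota, idf1 = mota_and_idf1_single_target(pred_ids)
print(round(mota, 2), round(idf1, 2))  # 0.97 0.25
```

MOTA penalizes each identity switch only once, so three switches in 100 frames barely move it, while IDF1 credits only the frames covered by the single best-matching track. A tracker that fragments identities after occlusions therefore scores well on the former and poorly on the latter.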
