8-Calves Image dataset

Xuyang Fang, Sion Hannuna, Neill Campbell, Edwin Simpson

TL;DR

The paper addresses the lack of realistic, temporally rich benchmarks for multi-animal detection, tracking, and identification in occlusion-heavy agricultural settings. It introduces the 8-Calves dataset: a one-hour video of eight calves with over 537k bounding boxes and temporal identity labels, produced by a semi-automated YOLOv8m + ByteTrack labeling pipeline followed by manual correction. Benchmarks across 28 detectors, 23 identification models, and two trackers show that detection reaches near-ceiling $\mathrm{mAP}_{50}$ under a lenient IoU threshold, while fine-grained localization and identity preservation under occlusion remain difficult. Small models (e.g., ConvNextV2 Nano) give the best balance of classification accuracy and retrieval performance, BEiT pretraining improves transfer, and trackers achieve high $\mathrm{MOTA}$ but low $\mathrm{IDF1}$ because of occlusions. The dataset and benchmarks aim to drive progress in precision livestock farming, with practical applications in feeding optimization, welfare assessment, and early disease detection, and they open directions toward self-supervised and multimodal approaches for scalable animal re-identification.

Abstract

Automated livestock monitoring is crucial for precision farming, but robust computer vision models are hindered by a lack of datasets reflecting real-world group challenges. We introduce the 8-Calves dataset, a challenging benchmark for multi-animal detection, tracking, and identification. It features a one-hour video of eight Holstein Friesian calves in a barn, with frequent occlusions, motion blur, and diverse poses. A semi-automated pipeline using a fine-tuned YOLOv8 detector and ByteTrack, followed by manual correction, provides over 537,000 bounding boxes with temporal identity labels. We benchmark 28 object detectors, showing near-perfect performance on a lenient IoU threshold (mAP50: 95.2-98.9%) but significant divergence on stricter metrics (mAP50:95: 56.5-66.4%), highlighting fine-grained localization challenges. Our identification benchmark across 23 models reveals a trade-off: scaling model size improves classification accuracy but compromises retrieval. Smaller architectures like ConvNextV2 Nano achieve the best balance (73.35% accuracy, 50.82% Top-1 KNN). Pre-training focused on semantic learning (e.g., BEiT) yielded superior transferability. For tracking, leading methods achieve high detection accuracy (MOTA > 0.92) but struggle with identity preservation (IDF1 $\approx$ 0.27), underscoring a key challenge in occlusion-heavy scenarios. The 8-Calves dataset bridges a gap by providing temporal richness and realistic challenges, serving as a resource for advancing agricultural vision models. The dataset and code are available at https://huggingface.co/datasets/tonyFang04/8-calves.
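The gap between the mAP50 and mAP50:95 ranges quoted above comes from how strictly a predicted box must overlap the ground truth. The sketch below is illustrative only and is not taken from the paper's code: it assumes the usual COCO-style convention that mAP50:95 averages over IoU thresholds 0.50, 0.55, ..., 0.95, and the helper names `iou` and `thresholds_passed` are hypothetical.

```python
# Assumption: COCO-style thresholds 0.50, 0.55, ..., 0.95 for mAP50:95.
COCO_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def thresholds_passed(pred, gt):
    """Thresholds at which `pred` would count as a true positive for `gt`."""
    score = iou(pred, gt)
    return [t for t in COCO_THRESHOLDS if score >= t]


# A 100x100 ground-truth box and a prediction shifted by 10 pixels on each axis.
gt_box = (0, 0, 100, 100)
pred_box = (10, 10, 110, 110)
print(round(iou(pred_box, gt_box), 3))       # 0.681
print(thresholds_passed(pred_box, gt_box))   # [0.5, 0.55, 0.6, 0.65]
```

A prediction that is only slightly misaligned counts as correct at IoU 0.50 but is rejected at 0.70 and above, which is why detectors can score near-perfectly on the lenient metric and still diverge on the strict one.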

Paper Structure

This paper contains 14 sections, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Representative frames from the 8-Calves dataset video: (top-left) fully labeled frame; (top-right) frame with motion blur and partial occlusion; (bottom-left) severe occlusion with indistinct calf boundaries; (bottom-right) near-total occlusion as one calf is obscured behind another.
  • Figure 2: Benchmarking results of YOLO series on the 8-Calves dataset.
  • Figure 3: Linear classifier accuracy for calf identification across different vision backbones on the 8-Calves dataset.
  • Figure 4: K-Nearest Neighbors (KNN) Top-1 retrieval accuracy for calf identification on the 8-Calves dataset.
  • Figure 5: K-Nearest Neighbors (KNN) Top-5 retrieval accuracy for calf identification on the 8-Calves dataset.
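The Top-1 and Top-5 KNN metrics in Figures 4 and 5 can be illustrated with a small retrieval sketch. This is not the paper's evaluation protocol: it assumes cosine similarity over embedding vectors and counts a query as correct when any of its k nearest gallery items shares its identity, which is one common definition. The function names and toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_accuracy(query_emb, query_ids, gallery_emb, gallery_ids, k):
    """Fraction of queries whose k nearest gallery items include their identity."""
    if not query_emb:
        return 0.0
    hits = 0
    for q, qid in zip(query_emb, query_ids):
        # Rank gallery indices from most to least similar to the query.
        ranked = sorted(
            range(len(gallery_emb)),
            key=lambda i: cosine(q, gallery_emb[i]),
            reverse=True,
        )
        if qid in {gallery_ids[i] for i in ranked[:k]}:
            hits += 1
    return hits / len(query_emb)


# Toy 2-D embeddings; real backbones produce far higher-dimensional features.
gallery = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
gallery_labels = ["calf_1", "calf_2", "calf_2"]
queries = [(0.9, 0.1), (0.8, 0.3)]
query_labels = ["calf_1", "calf_2"]

print(top_k_accuracy(queries, query_labels, gallery, gallery_labels, k=1))  # 0.5
print(top_k_accuracy(queries, query_labels, gallery, gallery_labels, k=2))  # 1.0
```

The second query sits closest to a gallery item of the wrong calf, so it fails at k=1 but succeeds once the second neighbor is considered, which is the kind of gap that separates Top-1 from Top-5 scores.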