
A Model Generalization Study in Localizing Indoor Cows with COw LOcalization (COLO) dataset

Mautushi Das, Gonzalo Ferreira, C. P. James Chen

TL;DR

The paper addresses robust localization of indoor cows using YOLO-based detectors and introduces the COLO dataset (1,254 images, 11,818 cow instances) to study cross-environment generalization. It systematically evaluates how lighting and camera viewpoint affect detection, and it compares model complexity and fine-tuning strategies across five cross-validation configurations. Key findings: changes in view angle, especially to side views, substantially reduce performance, while lighting variations have a smaller impact; increasing model size does not always improve generalization; and fine-tuning from task-relevant weights benefits larger models more than simpler ones. The work offers practical guidelines for precision livestock farming (PLF) researchers, indicating when a simple model with generic pre-trained weights suffices and when fine-tuning a larger model pays off, and it releases the public COLO dataset to support further research on indoor livestock localization.

Abstract

Precision livestock farming (PLF) increasingly relies on advanced object localization techniques to monitor livestock health and optimize resource management. This study investigates the generalization capabilities of YOLOv8 and YOLOv9 models for cow detection in indoor free-stall barns, focusing on how training data characteristics (view angle and lighting) and model complexity affect performance. Using the newly released public COws LOcalization (COLO) dataset, we examine three hypotheses: (1) model generalization is equally influenced by changes in lighting conditions and camera angles; (2) higher model complexity guarantees better generalization performance; and (3) fine-tuning with custom initial weights trained on relevant tasks always benefits detection. Our findings reveal considerable challenges in detecting cows in images taken from side views and underscore the importance of including diverse camera angles when building a detection model. Furthermore, higher model complexity does not necessarily lead to better performance; the optimal model configuration depends heavily on the specific task and dataset. Lastly, while fine-tuning with custom initial weights trained on relevant tasks benefits detection, simpler models do not benefit to the same degree. For a simple model, it is more efficient to train from generic pre-trained weights than to rely on weights from related tasks, which can be labor-intensive to obtain. Future work should focus on adaptive methods and advanced data augmentation to improve generalization and robustness. This study provides practical guidelines for PLF researchers deploying computer vision models from existing studies, highlights generalization issues, and contributes the COLO dataset, containing 1,254 images and 11,818 cow instances, for further research.

Paper Structure

This paper contains 4 sections, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: Overview of the COLO dataset. 1a. Seven example images from the dataset, with red bounding boxes marking the locations of cows. The columns show the three view angles (top view, side view, and external); the rows show the three lighting conditions (daylight, indoor, and near-infrared). 1b. An example annotated image in YOLO format. $W$ and $H$ are the image width and height, $w_b$ and $h_b$ are the width and height of the bounding box, and $x$ and $y$ are the center coordinates of the bounding box.
  • Figure 2: Cross-validation configurations. The training and testing sets were split into five different configurations: Baseline, Top2Side, Side2Top, Day2Night, and External.
  • Figure 3: Pairwise scatter plots of the evaluation metrics: $\text{mAP@{0.5:0.95}}$, $\text{mAP@{0.5}}$, precision, and recall. Each point represents a different model configuration, with the color indicating the training sample size.
  • Figure 4: The generalization performance of YOLOv9e across data configurations and training sample sizes. Training sample sizes are shown on the horizontal axis on a base-2 logarithmic scale, and data configurations are distinguished by color and marker shape. The upper left and upper right plots show $\text{mAP@{0.5:0.95}}$ and $\text{mAP@{0.5}}$, respectively; the lower left and lower right plots show precision and recall, respectively, each across training sample sizes and data configurations.
  • Figure 5: The performance of YOLOv8 and YOLOv9 models across model sizes and data configurations, evaluated with four metrics: $\text{mAP@{0.5:0.95}}$, $\text{mAP@{0.5}}$, precision, and recall. Each subplot corresponds to a different data configuration, ordered from top left to bottom right: 'Baseline', 'Day2Night', 'Side2Top', 'Top2Side', and 'External'. The horizontal axis of every plot shows the number of model parameters.
  • ...and 2 more figures
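The YOLO annotation format described in Figure 1b can be illustrated with a minimal Python sketch. It assumes a single class with ID 0 for cows and pixel boxes given by their top-left corner and size; the names `PixelBox`, `to_yolo_line`, and `from_yolo_line` are hypothetical helpers, not part of the COLO release. Each label line stores the class ID followed by $x$, $y$, $w_b$, and $h_b$, with horizontal quantities divided by $W$ and vertical ones by $H$ so that all values fall in [0, 1].

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PixelBox:
    """Axis-aligned box in pixels: top-left corner plus width and height."""
    left: float
    top: float
    width: float
    height: float


def to_yolo_line(box: PixelBox, img_w: int, img_h: int, class_id: int = 0) -> str:
    """Format one YOLO label line: class, x, y, w_b, h_b (normalized by W and H).

    class_id=0 for "cow" is an assumption for this single-class sketch.
    """
    x_center = (box.left + box.width / 2) / img_w
    y_center = (box.top + box.height / 2) / img_h
    return (f"{class_id} {x_center:.6f} {y_center:.6f} "
            f"{box.width / img_w:.6f} {box.height / img_h:.6f}")


def from_yolo_line(line: str, img_w: int, img_h: int) -> tuple[int, PixelBox]:
    """Parse a YOLO label line back into a class ID and a pixel box."""
    class_id, x, y, w, h = line.split()
    w_px = float(w) * img_w
    h_px = float(h) * img_h
    left = float(x) * img_w - w_px / 2   # center minus half the width
    top = float(y) * img_h - h_px / 2    # center minus half the height
    return int(class_id), PixelBox(left, top, w_px, h_px)
```

For a 640 × 480 image with a cow box whose top-left corner is at (100, 50) and whose size is 200 × 100 pixels, `to_yolo_line` yields `0 0.312500 0.208333 0.312500 0.208333`. Normalizing by $W$ and $H$ keeps the labels unchanged when an image is resized, which is why the format stores relative rather than pixel coordinates.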
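The cross-validation configurations in Figure 2 can be read as rules that select training and testing images from per-image metadata. The sketch below assumes a hypothetical `ImageRecord` with `view` and `lighting` fields and reconstructs only the three configurations whose names suggest their rules: Top2Side and Side2Top swap top-view and side-view images, and Day2Night trains on daylight images and, as an assumption, tests on all other lighting conditions. Baseline and External are omitted because their exact splits are not described here; `split` and `CONFIGS` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ImageRecord:
    """Hypothetical per-image metadata; field names and values are illustrative."""
    path: str
    view: str      # assumed values: "top", "side", "external"
    lighting: str  # assumed values: "daylight", "indoor", "nir"


Rule = Callable[[ImageRecord], bool]

# Assumed (train rule, test rule) pairs inferred from the configuration names.
CONFIGS: dict[str, tuple[Rule, Rule]] = {
    "Top2Side": (lambda r: r.view == "top", lambda r: r.view == "side"),
    "Side2Top": (lambda r: r.view == "side", lambda r: r.view == "top"),
    # Assumption: "night" covers every non-daylight lighting condition.
    "Day2Night": (lambda r: r.lighting == "daylight",
                  lambda r: r.lighting != "daylight"),
}


def split(records: list[ImageRecord],
          config: str) -> tuple[list[ImageRecord], list[ImageRecord]]:
    """Return (train, test) lists for one configuration; the two never overlap."""
    train_rule, test_rule = CONFIGS[config]
    train = [r for r in records if train_rule(r)]
    test = [r for r in records if test_rule(r) and not train_rule(r)]
    return train, test
```

Expressing each configuration as a pair of predicates makes the held-out condition explicit: every test image comes from a view or lighting condition that the training set does not contain.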
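The evaluation metrics in Figures 3 to 5 follow the usual object-detection definitions. As a hedged single-class (cow) sketch, the code below matches detections to ground-truth boxes by intersection over union (IoU) in descending confidence order, computes average precision (AP) from the precision-recall curve with all-point interpolation, and averages AP over the ten IoU thresholds 0.50, 0.55, and so on up to 0.95 to obtain mAP@0.5:0.95; AP at the single threshold 0.5 corresponds to mAP@0.5. Detector toolkits differ in interpolation details (COCO, for example, samples 101 recall points), so this sketch is not expected to reproduce the paper's numbers exactly, and the function names are illustrative.

```python
from __future__ import annotations

import numpy as np

# Box as (x1, y1, x2, y2) in pixels; a detection is (image_id, confidence, box).
Box = tuple[float, float, float, float]
Detection = tuple[str, float, Box]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: list[Detection], gts: dict[str, list[Box]],
                      thr: float) -> float:
    """Single-class AP at one IoU threshold, using all-point interpolation."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    matched = {img: [False] * len(boxes) for img, boxes in gts.items()}
    is_tp = []
    # Greedy matching: highest-confidence detections claim ground truth first.
    for img, _, box in sorted(dets, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(img, [])):
            overlap = iou(box, gt)
            if not matched[img][j] and overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= thr:
            matched[img][best_j] = True
            is_tp.append(1.0)
        else:
            is_tp.append(0.0)  # false positive: no unmatched box overlaps enough
    if n_gt == 0 or not is_tp:
        return 0.0
    tp = np.cumsum(is_tp)
    fp = np.cumsum(1.0 - np.asarray(is_tp))
    recall = tp / n_gt
    precision = tp / (tp + fp)
    # Pad the curve, then make precision non-increasing from right to left.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    # Sum rectangle areas wherever recall increases.
    steps = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[steps + 1] - mrec[steps]) * mpre[steps + 1]))


def map_50_95(dets: list[Detection], gts: dict[str, list[Box]]) -> float:
    """Mean AP over the ten IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = np.linspace(0.5, 0.95, 10)
    return float(np.mean([average_precision(dets, gts, t) for t in thresholds]))
```

As a worked example, a single detection at (1, 0, 11, 10) against a ground-truth box at (0, 0, 10, 10) has IoU 90/110 ≈ 0.82, so it is a true positive at seven of the ten thresholds: AP at 0.5 is 1.0, but mAP@0.5:0.95 is 0.7. This gap is why mAP@0.5:0.95 rewards tight box placement more strictly than mAP@0.5.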