AgrI Challenge: A Data-Centric AI Competition for Cross-Team Validation in Agricultural Vision

Mohammed Brahimi, Karim Laabassi, Mohamed Seghir Hadj Ameur, Aicha Boutorh, Badia Siab-Farsi, Amin Khouani, Omar Farouk Zouak, Seif Eddine Bouziane, Kheira Lakhdari, Abdelkader Nabil Benghanem

TL;DR

This paper introduces the AgrI Challenge, a data-centric competition framework in which multiple teams independently collect field datasets, producing a heterogeneous multi-source benchmark that reflects realistic variability in acquisition conditions. It also proposes Cross-Team Validation (CTV), an evaluation paradigm that treats each team's dataset as a distinct domain.

Abstract

Machine learning models in agricultural vision often achieve high accuracy on curated datasets but fail to generalize under real field conditions due to distribution shifts between training and deployment environments. Moreover, most machine learning competitions focus primarily on model design while treating datasets as fixed resources, leaving the role of data collection practices in model generalization largely unexplored. We introduce the AgrI Challenge, a data-centric competition framework in which multiple teams independently collect field datasets, producing a heterogeneous multi-source benchmark that reflects realistic variability in acquisition conditions. To systematically evaluate cross-domain generalization across independently collected datasets, we propose Cross-Team Validation (CTV), an evaluation paradigm that treats each team's dataset as a distinct domain. CTV includes two complementary protocols: Train-on-One-Team-Only (TOTO), which measures single-source generalization, and Leave-One-Team-Out (LOTO), which evaluates collaborative multi-source training. Experiments reveal substantial generalization gaps under single-source training: models achieve near-perfect validation accuracy yet exhibit validation-test gaps of up to 16.20% (DenseNet121) and 11.37% (Swin Transformer) when evaluated on datasets collected by other teams. In contrast, collaborative multi-source training dramatically improves robustness, reducing the gap to 2.82% and 1.78%, respectively. The challenge also produced a publicly available dataset of 50,673 field images of six tree species collected by twelve independent teams, providing a diverse benchmark for studying domain shift and data-centric learning in agricultural vision.
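The two CTV protocols can be illustrated by how they partition teams into training and test sources. The following is a minimal sketch, not the authors' implementation: the function names and team identifiers are hypothetical, and it assumes that TOTO tests on every team other than the training team and that LOTO tests on the single held-out team.

```python
from typing import List, Tuple

Split = Tuple[List[str], List[str]]  # (training teams, test teams)


def toto_splits(teams: List[str]) -> List[Split]:
    """Train-on-One-Team-Only: one run per team.

    Each run trains on a single team's data and (by assumption here)
    tests on the data of all remaining teams.
    """
    return [([t], [u for u in teams if u != t]) for t in teams]


def loto_splits(teams: List[str]) -> List[Split]:
    """Leave-One-Team-Out: one run per team.

    Each run trains on the pooled data of all teams except one and
    (by assumption here) tests on the held-out team.
    """
    return [([u for u in teams if u != t], [t]) for t in teams]


if __name__ == "__main__":
    # Hypothetical identifiers for the twelve teams.
    teams = [f"team_{i:02d}" for i in range(1, 13)]
    for train, test in toto_splits(teams)[:2]:
        print("TOTO train:", train, "| test teams:", len(test))
    for train, test in loto_splits(teams)[:2]:
        print("LOTO train teams:", len(train), "| test:", test)
```

With twelve teams, both protocols yield twelve runs. In TOTO the training side is a single domain, while in LOTO it is the union of eleven domains, which is the contrast the reported gaps measure.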

Paper Structure

This paper contains 37 sections, 2 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: AgrI Challenge workflow showing the two-phase competition design and Cross-Team Validation evaluation protocols.
  • Figure 2: Histogram showing the distribution of dataset images by capture device model. The dataset contains images from more than 40 device types, with the largest groups originating from iPhone 11 and Oppo Reno5, and a substantial portion labeled as Unknown due to missing metadata.
  • Figure 3: Cross-team test accuracy for DenseNet121 and Swin Transformer under the TOTO protocol. Horizontal dashed lines indicate means: 81.19% (DenseNet121) and 87.21% (Swin Transformer).
  • Figure 4: Global model performance under the TOTO protocol for DenseNet121 and Swin Transformer (mean ± std across 12 runs). The large validation-test gap highlights the cross-team generalization challenge under single-team training.
  • Figure 5: Aggregate learning curves for the TOTO protocol (n = 12 runs, mean ± std). Left: Swin Transformer. Right: DenseNet121. The gap between validation and test accuracy persists across all epochs.
  • ...and 7 more figures
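
The captions above report per-run quantities aggregated as mean ± std over 12 runs. A minimal sketch of that aggregation follows, under stated assumptions: the helper names are hypothetical, the gap is taken as validation accuracy minus test accuracy in percentage points, and the sample standard deviation is used (the source does not say whether sample or population std was reported). The accuracy values are placeholders, not results from the paper.

```python
from statistics import mean, stdev
from typing import List, Tuple


def generalization_gap(val_acc: float, test_acc: float) -> float:
    """Validation-test gap in percentage points (assumed definition)."""
    return val_acc - test_acc


def summarize(values: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) across runs."""
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Placeholder (validation, test) accuracies for three hypothetical runs.
    runs = [(99.0, 85.0), (98.5, 80.5), (99.5, 88.5)]
    gaps = [generalization_gap(v, t) for v, t in runs]
    avg, sd = summarize(gaps)
    print(f"gap: {avg:.2f} ± {sd:.2f} percentage points")
```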