Vision Foundation Models in Agriculture: Toward Domain-Specific Adaptation for Weed Herbicide Trials Assessment

Leire Benito-Del-Valle, Artzai Picón, Daniel Mugica, Manuel Ramos, Eva Portillo, Javier Romero, Carlos Javier Jimenez, Ramón Navarra-Mestre

TL;DR

This work develops a domain-specific vision foundation model for herbicide trials by applying self-supervised pretraining (DINOv2) on a large, curated agricultural dataset and then fine-tuning with SegFormer decoders for vegetation, species, and damage segmentation. It demonstrates clear performance gains over general-purpose foundation models, especially under domain shifts and with limited labeled data, achieving higher F1 scores for both species identification and damage classification on BASE, REALITY, and DRONE datasets. The results highlight improved generalization to unseen environments and reduced annotation needs, suggesting practical scalability for automated herbicide-trial analysis. The study also discusses computational considerations and future directions (e.g., LoRA) to further enhance efficiency and applicability across broader agricultural tasks.

Abstract

Herbicide field trials require accurate identification of plant species and assessment of herbicide-induced damage across diverse environments. While general-purpose vision foundation models have shown promising results in complex visual domains, their performance can be limited in agriculture, where fine-grained distinctions between species and damage types are critical. In this work, we adapt a general-purpose vision foundation model to herbicide trial characterization. Trained using a self-supervised learning approach on a large, curated agricultural dataset, the model learns rich and transferable representations optimized for herbicide trial images. Our domain-specific model significantly outperforms the best general-purpose foundation model in both species identification (F1 score improvement from 0.91 to 0.94) and damage classification (from 0.26 to 0.33). Under unseen conditions (new locations and different times), it achieves even greater gains (species identification from 0.56 to 0.66; damage classification from 0.17 to 0.27). In domain-shift scenarios, such as drone imagery, it maintains strong performance (species classification from 0.49 to 0.60). Additionally, we show that domain-specific pretraining enhances segmentation accuracy, particularly in low-annotation regimes. An annotation-efficiency analysis reveals that, under unseen conditions, the domain-specific model achieves a 5.4% higher F1 score than the general-purpose model while using 80% fewer labeled samples. These results demonstrate the generalization capabilities of domain-specific foundation models and their potential to significantly reduce manual annotation efforts, offering a scalable and automated solution for herbicide trial analysis.
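To make the reported gains easier to compare across settings, the F1 pairs quoted in the abstract can be turned into absolute and relative improvements. The snippet below is only an illustrative sketch: the dictionary keys and the function names `f1_gains` and `summarize` are hypothetical, and the relative gain is a derived quantity that the abstract itself does not report.

```python
# F1 scores quoted in the abstract, as (general-purpose, domain-specific) pairs.
# The task labels are informal names chosen for this sketch.
REPORTED_F1 = {
    "species, seen conditions": (0.91, 0.94),
    "damage, seen conditions": (0.26, 0.33),
    "species, unseen conditions": (0.56, 0.66),
    "damage, unseen conditions": (0.17, 0.27),
    "species, drone imagery": (0.49, 0.60),
}


def f1_gains(baseline, adapted):
    """Return (absolute gain in F1 points, relative gain in percent)."""
    absolute = round(adapted - baseline, 2)
    relative_pct = round(100 * (adapted - baseline) / baseline, 1)
    return absolute, relative_pct


def summarize(scores):
    """Map each task label to its (absolute, relative %) F1 gain."""
    return {task: f1_gains(*pair) for task, pair in scores.items()}


if __name__ == "__main__":
    for task, (abs_gain, rel_gain) in summarize(REPORTED_F1).items():
        print(f"{task}: +{abs_gain:.2f} F1 ({rel_gain:.1f}% relative)")
```

Viewed this way, the absolute gains are larger in the unseen-condition and drone settings than in the seen-condition setting, which matches the abstract's statement that the gains grow under unseen conditions.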

Paper Structure

This paper contains 20 sections, 15 figures, 6 tables.

Figures (15)

  • Figure 1: Distribution of image tiles based on vegetation coverage.
  • Figure 2: Image samples from the curated herbicides trials dataset.
  • Figure 3: 2019A2 dataset. Example image with species and damage masks. Left: original image. Middle: species annotation. Right: damage annotation (picon_taxonomic_2025).
  • Figure 4: Primitive extraction example showing the original image (left) and the corresponding cropped regions (right) used as individual primitives.
  • Figure 5: Overview of the proposed methodology. Left: self-supervised pretraining stage, where DINOv2 is fine-tuned on the herbicides trials dataset (Section \ref{subsec:herbicides_trials_dataset}). Model performance is monitored and optimal weights are selected using the primitives dataset (Section \ref{subsec:primitives_dataset}). Right: supervised fine-tuning stage, where the segmentation model is trained on the BASE dataset (Section \ref{subsec:base_data}). Model evaluation is performed on the test subset of the BASE dataset (same domain) and on two additional datasets to assess robustness to domain shift: REALITY CHECK (Section \ref{subsec:reality_check_dataset}) and the DRONE dataset (Section \ref{subsec:drones_dataset}).
  • ...and 10 more figures