Table of Contents
Fetching ...

Robust MultiSpecies Agricultural Segmentation Across Devices, Seasons, and Sensors Using Hierarchical DINOv2 Models

Artzai Picon, Itziar Eguskiza, Daniel Mugica, Javier Romero, Carlos Javier Jimenez, Eric White, Gabriel Do-Lago-Junqueira, Christian Klukas, Ramon Navarra-Mestre

TL;DR

A segmentation framework that integrates vision foundation models (DINOv2) with hierarchical taxonomic inference with structured biological hierarchies for scalable, shift-resilient agricultural monitoring is evaluated, illustrating the practical viability of combining foundation models with structured biological hierarchies for scalable, shift-resilient agricultural monitoring.

Abstract

Reliable plant species and damage segmentation for herbicide field research trials requires models that can withstand substantial real-world variation across seasons, geographies, devices, and sensing modalities. Most deep learning approaches trained on controlled datasets fail to generalize under these domain shifts, limiting their suitability for operational phenotyping pipelines. This study evaluates a segmentation framework that integrates vision foundation models (DINOv2) with hierarchical taxonomic inference to improve robustness across heterogeneous agricultural conditions. We train on a large, multi-year dataset collected in Germany and Spain (2018-2020), comprising 14 plant species and 4 herbicide damage classes, and assess generalization under increasingly challenging shifts: temporal and device changes (2023), geographic transfer to the United States, and extreme sensor shift to drone imagery (2024). Results show that the foundation-model backbone consistently outperforms prior baselines, improving species-level F1 from 0.52 to 0.87 on in-distribution data and maintaining significant advantages under moderate (0.77 vs. 0.24) and extreme (0.44 vs. 0.14) shift conditions. Hierarchical inference provides an additional layer of robustness, enabling meaningful predictions even when fine-grained species classification degrades (family F1: 0.68, class F1: 0.88 on aerial imagery). Error analysis reveals that failures under severe shift stem primarily from vegetation-soil confusion, suggesting that taxonomic distinctions remain preserved despite background and viewpoint variability. The system is now deployed within BASF's phenotyping workflow for herbicide research trials across multiple regions, illustrating the practical viability of combining foundation models with structured biological hierarchies for scalable, shift-resilient agricultural monitoring.

Robust MultiSpecies Agricultural Segmentation Across Devices, Seasons, and Sensors Using Hierarchical DINOv2 Models

TL;DR

A segmentation framework that integrates vision foundation models (DINOv2) with hierarchical taxonomic inference with structured biological hierarchies for scalable, shift-resilient agricultural monitoring is evaluated, illustrating the practical viability of combining foundation models with structured biological hierarchies for scalable, shift-resilient agricultural monitoring.

Abstract

Reliable plant species and damage segmentation for herbicide field research trials requires models that can withstand substantial real-world variation across seasons, geographies, devices, and sensing modalities. Most deep learning approaches trained on controlled datasets fail to generalize under these domain shifts, limiting their suitability for operational phenotyping pipelines. This study evaluates a segmentation framework that integrates vision foundation models (DINOv2) with hierarchical taxonomic inference to improve robustness across heterogeneous agricultural conditions. We train on a large, multi-year dataset collected in Germany and Spain (2018-2020), comprising 14 plant species and 4 herbicide damage classes, and assess generalization under increasingly challenging shifts: temporal and device changes (2023), geographic transfer to the United States, and extreme sensor shift to drone imagery (2024). Results show that the foundation-model backbone consistently outperforms prior baselines, improving species-level F1 from 0.52 to 0.87 on in-distribution data and maintaining significant advantages under moderate (0.77 vs. 0.24) and extreme (0.44 vs. 0.14) shift conditions. Hierarchical inference provides an additional layer of robustness, enabling meaningful predictions even when fine-grained species classification degrades (family F1: 0.68, class F1: 0.88 on aerial imagery). Error analysis reveals that failures under severe shift stem primarily from vegetation-soil confusion, suggesting that taxonomic distinctions remain preserved despite background and viewpoint variability. The system is now deployed within BASF's phenotyping workflow for herbicide research trials across multiple regions, illustrating the practical viability of combining foundation models with structured biological hierarchies for scalable, shift-resilient agricultural monitoring.

Paper Structure

This paper contains 29 sections, 1 equation, 26 figures, 17 tables.

Figures (26)

  • Figure 1: 2019A2 dataset. Example image with species and damage masks. Left) Original Image, Middle) Species annotation, Right) Damage annotation.
  • Figure 2: Reality check dataset examples. Top) Germany, Middle) Spain, and Bottom) United States. Each figure shows an image with species and damage masks. Left) Original Image, Middle) Species annotation, Right) Damage annotation.
  • Figure 3: Drone-based reality dataset content summary
  • Figure 5: t-SNE visualization of all dataset collections, illustrating the distribution of embeddings extracted from a DINOv2 model from images across the different datasets. Each point represents a different image, and each color represents a different location. The datasets used for training (Base) and the REALITY dataset and DRONE dataset are shown with different geometrical shapes.
  • Figure 6: Proposed architecture. A common encoding backbone is connected to the three independent decoders. Each decoder handles one of the network tasks (vegetation, damage, and species). K1 represents the number of species classes, and K2 represents the number of damage classes.
  • ...and 21 more figures