AgriBench: A Hierarchical Agriculture Benchmark for Multimodal Large Language Models

Yutong Zhou, Masahiro Ryo

TL;DR

AgriBench addresses the lack of agriculture-specific evaluation for multimodal large language models by introducing a hierarchical, multi-task benchmark and MM-LUCAS, a multimodal EU land-use/land-cover dataset. The framework spans five task levels from basic recognition to human-aligned suggestions, leveraging RGB images, segmentation masks, depth maps, and LandQA prompts to comprehensively assess MM-LLMs. Five MM-LLMs (two open-source and three closed-source) are evaluated, revealing that while general agricultural knowledge is well captured, domain-specific inference such as disease diagnosis still benefits from specialized fine-tuning and richer multimodal cues. The work lays a foundation for robust, domain-aware MM-LLMs in agriculture, with MM-LUCAS enabling richer annotations and future expert-knowledge models for farming decision-support and policy planning.

Abstract

We introduce AgriBench, the first agriculture benchmark designed to evaluate MultiModal Large Language Models (MM-LLMs) for agriculture applications. To further address the limited availability of agriculture knowledge-based datasets, we propose MM-LUCAS, a multimodal agriculture dataset that includes 1,784 landscape images, segmentation masks, depth maps, and detailed annotations (geographical location, country, date, land cover and land use taxonomic details, quality scores, aesthetic scores, etc.). MM-LUCAS is based on the Land Use/Cover Area Frame Survey (LUCAS) dataset, which contains comparable statistics on land use and land cover for the European Union (EU) territory. This work, which is still in progress, presents a groundbreaking perspective on advancing agriculture MM-LLMs and offers valuable insights for future developments and innovations in expert knowledge-based MM-LLMs.
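
As a rough illustration of how one MM-LUCAS sample could be represented, the sketch below bundles the modalities and annotations listed above into a single record. Only `lc1_label` and `lu1_label` are names that appear in this document (in the Figure 3 caption); every other field name, the `LandRecord` class, and the `mean_aesthetic_by_land_cover` helper are assumptions made for illustration and do not reflect the dataset's actual schema or the authors' code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class LandRecord:
    """Hypothetical container for one sample; field names are illustrative."""
    image_path: str         # RGB landscape image
    mask_path: str          # segmentation mask
    depth_path: str         # depth map
    country: str
    latitude: float
    longitude: float
    survey_date: str
    lc1_label: str          # land cover label (name taken from the Figure 3 caption)
    lu1_label: str          # land use label (name taken from the Figure 3 caption)
    quality_score: float
    aesthetic_score: float


def mean_aesthetic_by_land_cover(records: Iterable[LandRecord]) -> Dict[str, float]:
    """Average the aesthetic score of all records sharing a land cover label."""
    totals = defaultdict(lambda: [0.0, 0])  # label -> [running sum, count]
    for record in records:
        totals[record.lc1_label][0] += record.aesthetic_score
        totals[record.lc1_label][1] += 1
    return {label: total / count for label, (total, count) in totals.items()}
```

Grouping by a label in this way is one simple means of producing per-category summaries; a real pipeline would load the fields from the dataset's own annotation files.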

Paper Structure

This paper contains 27 sections, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Five levels of MM task difficulty in the agricultural domain.
  • Figure 2: Left: Overview of the MM-LUCAS dataset. Middle: The illustration of data collection and processing. Right: Data example in the MM-LUCAS dataset. The box colors in the Middle and Right correspond to the colors in the Left.
  • Figure 3: Distribution of the top 10 lc1_label and all lu1_label categories.
  • Figure 4: Average aesthetic score (AS) for different segmentation classes.
  • Figure 5: Spatial distribution of quality score and aesthetic score across the EU.
  • ...and 2 more figures