CMAViT: Integrating Climate, Management, and Remote Sensing Data for Crop Yield Estimation with Multimodal Vision Transformers

Hamid Kamangir, Brent S. Sams, Nick Dokoozlian, Luis Sanchez, J. Mason Earles

TL;DR

A deep learning-based multimodal model for pixel-level vineyard yield prediction that outperforms traditional models such as UNet-ConvLSTM, better capturing spatial variability and predicting yield more accurately, particularly for extreme values.

Abstract

Crop yield prediction is essential for agricultural planning but remains challenging because of the complex interactions among weather, climate, and management practices. To address these challenges, we introduce a deep learning-based multimodal model, the Climate-Management Aware Vision Transformer (CMAViT), designed for pixel-level vineyard yield prediction. CMAViT integrates spatial and temporal data by combining remote sensing imagery with short-term meteorological data, capturing the effects of variation across the growing season. It also incorporates management practices, represented as text, using a cross-attention encoder that models their interaction with the time-series data. Tested on a large dataset spanning 2016 to 2019, covering 2,200 hectares and eight grape cultivars with more than 5 million vines, this multimodal transformer outperforms traditional models such as UNet-ConvLSTM, better capturing spatial variability and predicting yield more accurately, particularly for extreme values. CMAViT achieved an R² of 0.84 and a mean absolute percentage error (MAPE) of 8.22% on an unseen test dataset. Masking individual modalities lowered performance: excluding management practices, climate data, or both reduced R² to 0.73, 0.70, and 0.72, respectively, and raised MAPE to 11.92%, 12.66%, and 12.39%, highlighting the importance of each modality for accurate yield prediction. Code is available at https://github.com/plant-ai-biophysics-lab/CMAViT.
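The two headline metrics can be made concrete with a short sketch. The Python below assumes the standard definitions, R² = 1 − SS_res / SS_tot and MAPE = (100 / n) Σ |y − ŷ| / |y|. The abstract does not say whether CMAViT's metrics are computed per pixel, per block, or after some other aggregation, and skipping zero-yield targets in MAPE is an assumption made here. The function names `r2_score` and `mape` are illustrative and are not taken from the CMAViT repository.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed yields are equal")
    return 1.0 - ss_res / ss_tot


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumption: targets with zero observed yield are skipped, because the
    percentage error is undefined there.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be the same length")
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    if not pairs:
        raise ValueError("no non-zero observed yields")
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in pairs) / len(pairs)


# Hypothetical observed and predicted yields (t/ha) for three pixels.
observed = [20.0, 25.0, 30.0]
predicted = [22.0, 24.0, 27.0]
print(round(r2_score(observed, predicted), 4))  # 0.72
print(round(mape(observed, predicted), 4))  # 8.0
```

With real yield maps, one would presumably flatten each map into a list of pixel values, excluding non-vineyard pixels, before calling these functions.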

Paper Structure

This paper contains 22 sections, 6 equations, 8 figures, 3 tables, and 1 algorithm.

Figures (8)

  • Figure 1: CMAViT: Climate-Management Aware Vision Transformer model. A multimodal analysis integrating remote sensing imagery, climate data, and management practices to produce a time series of yield maps throughout the flowering period and up to pre-harvest.
  • Figure 2: CMAViT: Climate-Management Aware Vision Transformer model. The model uses a spatio-temporal multimodal framework that learns from time-series observations, including Sentinel-1 and Sentinel-2 satellite imagery, together with climate data. A cross-attention module captures interactions between the time-series features learned by the spatio-temporal module and contextual data on management practices. This enables weekly yield map predictions from bud break in early April through veraison in mid-July.
  • Figure 3: STMM module. A multimodal module that integrates meteorological observations with satellite imagery.
  • Figure 4: Four years of ground-truth yield observations in tonnes per hectare (t/ha) from the Central Valley of California, collected by E and J Gallo Winery. Each map covers roughly 7 × 9 km and contains about 2,200 hectares with over 5 million grapevines.
  • Figure 5: Block Holdout Validation Scenario (BHO). Cultivar abbreviations: Malvasia Bianca (MB), Merlot (Me), Cabernet Sauvignon (CS), Syrah (Syr), Chardonnay (Ch), Riesling (Ries), Symphony (Sym), and Muscat of Alexandria (MoA).
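Figure 2 describes a cross-attention module that relates the learned time-series features to contextual data on management practices. As a rough illustration of the mechanism only, the sketch below implements single-head scaled dot-product cross-attention in plain Python. It assumes that each week's spatio-temporal feature vector acts as a query and that the management-practice text has already been embedded into token vectors that serve as keys and values. The captions do not specify the query/key direction, the number of heads, or the learned projection matrices, so those are omitted. The function name `cross_attention` and all example values are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product cross-attention without learned projections.

    queries: T x d, e.g. one row per weekly time-series feature (assumption)
    keys:    M x d, one row per embedded management-practice token (assumption)
    values:  M x d_v, paired row-for-row with keys
    returns: T x d_v, each row a softmax-weighted mix of the value rows
    """
    if not queries or not keys or len(keys) != len(values):
        raise ValueError("need non-empty queries and equally many keys and values")
    d = len(queries[0])
    if any(len(row) != d for row in queries + keys):
        raise ValueError("queries and keys must share the same feature dimension")
    scale = 1.0 / math.sqrt(d)
    d_v = len(values[0])
    context = []
    for q in queries:
        # Scaled similarity of this time step to every management token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Attention-weighted average of the management value vectors.
        context.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)])
    return context


# Hypothetical example: two weekly feature vectors attend over two management tokens.
weekly_features = [[1.0, 0.0], [0.0, 1.0]]
management_tokens = [[1.0, 0.0], [0.0, 1.0]]
management_values = [[10.0, 0.0], [0.0, 10.0]]
for row in cross_attention(weekly_features, management_tokens, management_values):
    print([round(x, 2) for x in row])  # [6.7, 3.3], then [3.3, 6.7]
```

In a full transformer, the queries, keys, and values would normally pass through learned linear projections first, and the attended context would typically be added back to the time-series features through a residual connection. These are standard transformer conventions, not details confirmed by the figure captions.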