Predicting Crop Yield With Machine Learning: An Extensive Analysis Of Input Modalities And Models On a Field and sub-field Level

Deepak Pathak, Miro Miranda, Francisco Mena, Cristhian Sanchez, Patrick Helber, Benjamin Bischke, Peter Habelitz, Hiba Najjar, Jayanth Siddamsetty, Diego Arenas, Michaela Vollmer, Marcela Charfuelan, Marlon Nuske, Andreas Dengel

TL;DR

The paper tackles globally scalable crop yield prediction by integrating multiple data modalities through an early fusion pipeline that concatenates 24 timesteps of Sentinel-2 imagery with weather, soil, and DEM features to predict pixel-level yields at $10\,\mathrm{m}$ resolution. It evaluates on Germany, Argentina, and Uruguay across wheat, rapeseed, and soybean, showing that multimodal inputs outperform Sentinel-2 alone and that modality importance depends on region and crop (e.g., $R^2=0.82$ for Argentina soybean with S2+DEM and $R^2=0.78$ for Germany rapeseed with S2+Soil). The approach uses two models, LGBM and LSTM, evaluated with 10-fold stratified grouped cross-validation, and reports improvements in $R^2$ and robust capture of sub-field variability. The work demonstrates the value of globally scalable, multimodal data fusion for precision agriculture and motivates exploring alternative fusion strategies and additional data sources.
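To make the evaluation protocol concrete, here is a minimal sketch of a stratified grouped fold assignment. It assumes the groups are fields, so that all pixels of one field land in the same fold, and that the stratification label is something like region or crop; the summary does not state either choice. The function name `stratified_group_folds` and the round-robin assignment are illustrative, not the authors' implementation.

```python
from collections import defaultdict


def stratified_group_folds(group_ids, strata, n_folds=10):
    """Return a fold index per sample, keeping each group in a single fold.

    group_ids: one group label per sample (assumed here: the field ID of a pixel).
    strata:    one stratum label per sample (assumed here: e.g. region or crop).
    """
    # Use the first stratum seen for each group as that group's stratum.
    group_stratum = {}
    for group, stratum in zip(group_ids, strata):
        group_stratum.setdefault(group, stratum)

    # Collect the groups belonging to each stratum.
    groups_by_stratum = defaultdict(list)
    for group, stratum in group_stratum.items():
        groups_by_stratum[stratum].append(group)

    # Deal groups to folds round-robin, stratum by stratum, so every fold
    # receives a similar share of each stratum.
    fold_of_group = {}
    counter = 0
    for stratum in sorted(groups_by_stratum):
        for group in sorted(groups_by_stratum[stratum]):
            fold_of_group[group] = counter % n_folds
            counter += 1

    return [fold_of_group[group] for group in group_ids]
```

Keeping whole fields together matters because neighbouring pixels are strongly correlated, and splitting one field across training and validation folds would inflate the scores. In practice, scikit-learn's `StratifiedGroupKFold` provides a maintained implementation of the same idea.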

Abstract

We introduce a simple yet effective early fusion method for crop yield prediction that handles multiple input modalities with different temporal and spatial resolutions. We use high-resolution crop yield maps as ground truth data to train methods at the sub-field level that are agnostic to both the crop type and the machine learning model. We use Sentinel-2 satellite imagery as the primary input modality, complemented by weather, soil, and DEM data. The proposed method uses only input modalities available with global coverage, making the framework globally scalable. We explicitly highlight the importance of input modalities for crop yield prediction and emphasize that the best-performing combination of input modalities depends on region, crop, and chosen model.
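As an illustration of input-level (early) fusion, the sketch below stacks the modalities into one feature vector per pixel and timestep. It assumes every modality has already been resampled to the Sentinel-2 pixel grid and to a common set of timesteps, and that weather is a single time series per field; the array layouts, band counts, and the name `early_fuse` are hypothetical and not taken from the paper.

```python
import numpy as np


def early_fuse(s2, weather, soil, dem):
    """Concatenate modalities along the feature axis, one sample per pixel.

    Assumed (hypothetical) layouts:
      s2:      (T, B_s2, H, W)  Sentinel-2 bands per timestep
      weather: (T, B_w)         one weather series for the whole field
      soil:    (B_soil, H, W)   static soil layers
      dem:     (B_dem, H, W)    static elevation layers
    Returns an array of shape (H * W, T, F), with F the total feature count.
    """
    T, _, H, W = s2.shape

    # Repeat the field-level weather series over every pixel.
    weather_px = np.broadcast_to(
        weather[:, :, None, None], (T, weather.shape[1], H, W)
    )
    # Repeat the static layers over every timestep.
    soil_t = np.broadcast_to(soil[None], (T,) + soil.shape)
    dem_t = np.broadcast_to(dem[None], (T,) + dem.shape)

    # Early fusion: stack all features along the channel axis -> (T, F, H, W).
    fused = np.concatenate([s2, weather_px, soil_t, dem_t], axis=1)

    # Reorder to (H, W, T, F) and flatten pixels into independent samples.
    return fused.transpose(2, 3, 0, 1).reshape(H * W, T, -1)
```

A sequence model such as an LSTM could consume the `(pixels, T, F)` array directly, while a tabular model such as LGBM would need it flattened to `(pixels, T * F)`, for example with `seq.reshape(seq.shape[0], -1)`.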

Paper Structure

This paper contains 8 sections, 1 figure, 3 tables.

Figures (1)

  • Figure 1: (a) Framework for multimodal data fusion for yield prediction. Multiple modalities with different spatial and temporal resolutions are fused at the input level. A machine learning model is then trained pixel-wise to produce yield predictions at $10\,\mathrm{m}$ resolution. (b) Performance plots for visual inspection of a single field. The yield data shown are for soybean in Argentina, harvested in 2021. The model was trained on Sentinel-2 and DEM data. Upper left: ground truth yield map; upper middle: pixel-based yield prediction; upper right: scatterplot comparing predictions with ground truth data; lower left: relative prediction error clipped at 100%; lower middle: relative prediction error in full range; lower right: distribution plot of predictions against the target.