Predicting Crop Yield With Machine Learning: An Extensive Analysis Of Input Modalities And Models On a Field and sub-field Level
Deepak Pathak, Miro Miranda, Francisco Mena, Cristhian Sanchez, Patrick Helber, Benjamin Bischke, Peter Habelitz, Hiba Najjar, Jayanth Siddamsetty, Diego Arenas, Michaela Vollmer, Marcela Charfuelan, Marlon Nuske, Andreas Dengel
TL;DR
The paper tackles crop yield prediction at the field and sub-field level by integrating multiple data modalities through an early fusion pipeline that concatenates 24 timesteps of Sentinel-2 (S2) imagery with weather, soil, and DEM features to predict pixel-level yields at 10 m resolution. It evaluates on Germany, Argentina, and Uruguay across wheat, rapeseed, and soybean, showing that multimodal inputs outperform Sentinel-2 alone and that the most useful modalities depend on region and crop (e.g., $R^2=0.82$ for Argentina soybean with S2 + DEM and $R^2=0.78$ for Germany rapeseed with S2 + Soil). The approach uses two models, LGBM and LSTM, evaluated with 10-fold stratified grouped cross-validation, and reports higher $R^2$ for multimodal inputs along with predictions that capture sub-field yield variability. Because every input modality is available with global coverage, the work demonstrates the value of scalable multimodal data fusion for precision agriculture and motivates exploring alternative fusion strategies and additional data sources.
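The stratified grouped cross-validation mentioned above can be illustrated with a minimal standard-library sketch. The summary does not state what the folds are grouped or stratified by, so this example assumes grouping by field (so that pixels of one field never appear in both train and test) and stratifying by mean field yield. The function names `stratified_group_folds` and `split_indices` are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def stratified_group_folds(groups, y, n_splits=10, seed=0):
    """Assign every group (assumed: one field) to exactly one fold.

    groups: field identifier for each pixel
    y:      yield value for each pixel
    Returns a dict mapping field identifier -> fold index.
    """
    rng = random.Random(seed)

    # Mean yield per field, used here as the stratification variable (assumption).
    sums, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, y):
        sums[g] += v
        counts[g] += 1
    fields = sorted(sums, key=lambda g: sums[g] / counts[g])

    # Walk through the yield-sorted fields in blocks of n_splits and give each
    # field in a block a different fold, so every fold spans the yield range.
    fold_of = {}
    for start in range(0, len(fields), n_splits):
        block = fields[start:start + n_splits]
        fold_ids = list(range(n_splits))
        rng.shuffle(fold_ids)
        for field, fold in zip(block, fold_ids):
            fold_of[field] = fold
    return fold_of


def split_indices(groups, fold_of, fold):
    """Return (train, test) pixel indices for one held-out fold."""
    test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
    train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
    return train, test
```

Holding out whole fields matters because neighbouring pixels of the same field are strongly correlated; splitting them across train and test would likely inflate the reported $R^2$.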
Abstract
We introduce a simple yet effective early fusion method for crop yield prediction that handles multiple input modalities with different temporal and spatial resolutions. We use high-resolution crop yield maps as ground truth data to train methods at the sub-field level that are agnostic to both the crop and the machine learning model. We use Sentinel-2 satellite imagery as the primary input modality, complemented by weather, soil, and DEM data. The proposed method uses only input modalities that are available with global coverage, making the framework globally scalable. We explicitly highlight the importance of input modalities for crop yield prediction and emphasize that the best-performing combination of input modalities depends on region, crop, and chosen model.
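As a rough illustration of early fusion across modalities with different temporal resolutions, the sketch below concatenates per-pixel features along the feature axis. It assumes that weather has already been resampled to the 24 Sentinel-2 timesteps and that the static soil and DEM features are simply repeated at every timestep; the abstract does not specify either step. The function names and feature counts are hypothetical.

```python
import numpy as np


def early_fusion(s2, weather, soil, dem):
    """Concatenate all modalities into one per-pixel time series.

    s2:      (n_pixels, T, n_bands)    Sentinel-2 time series
    weather: (n_pixels, T, n_weather)  assumed already aligned to the T timesteps
    soil:    (n_pixels, n_soil)        static features
    dem:     (n_pixels, n_dem)         static features
    Returns an array of shape (n_pixels, T, n_bands + n_weather + n_soil + n_dem).
    """
    T = s2.shape[1]
    static = np.concatenate([soil, dem], axis=1)
    # Repeat the static features at every timestep (an assumption of this sketch).
    static_rep = np.repeat(static[:, None, :], T, axis=1)
    return np.concatenate([s2, weather, static_rep], axis=2)


def flatten_for_tabular(fused):
    """Flatten (n_pixels, T, F) to (n_pixels, T * F) for a tabular model."""
    return fused.reshape(fused.shape[0], -1)
```

A sequence model such as an LSTM could consume the fused `(n_pixels, T, F)` array directly, while a gradient-boosting model such as LGBM would need the flattened `(n_pixels, T * F)` form.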
