Adaptive Fusion of Multi-view Remote Sensing data for Optimal Sub-field Crop Yield Prediction
Francisco Mena, Deepak Pathak, Hiba Najjar, Cristhian Sanchez, Patrick Helber, Benjamin Bischke, Peter Habelitz, Miro Miranda, Jayanth Siddamsetty, Marlon Nuske, Marcela Charfuelan, Diego Arenas, Michaela Vollmer, Andreas Dengel
TL;DR
This work addresses sub-field crop yield prediction from multi-view remote sensing (RS) data by introducing Multi-View Gated Fusion (MVGF), a two-component model with dedicated view-encoders and a gated fusion unit that adaptively weights heterogeneous inputs per pixel. By fusing Sentinel-2 optical time series with weather, digital elevation model (DEM), and soil features, MVGF outperforms input-level fusion and single-view baselines across soybean, rapeseed, and wheat datasets from Argentina, Uruguay, and Germany, achieving a field-level $R^2$ of around 0.80 and a sub-field $R^2$ ranging from around 0.68 (ARG-S) to around 0.44 (others). The authors provide extensive analyses, including visualizations of yield maps and gating weights, leave-one-year-out (LOYO) cross-validation, and ablation studies. These analyses show that the fusion weights vary with country and crop type, and that the fusion yields data-imputation-like benefits in fields with limited optical coverage. The work demonstrates the value of adaptive, sample-specific fusion in multi-view learning (MVL) for RS-driven crop yield tasks and offers insights into interpretability via fusion weights and view contributions.
Abstract
Accurate crop yield prediction is of utmost importance for informed decision-making in agriculture, aiding farmers and industry stakeholders. However, this task is complex and depends on multiple factors, such as environmental conditions, soil properties, and management practices. Combining heterogeneous data views poses a fusion challenge, such as identifying the view-specific contribution to the predictive task. We present a novel multi-view learning approach to predict crop yield for different crops (soybean, wheat, rapeseed) and regions (Argentina, Uruguay, and Germany). Our multi-view input data includes multi-spectral optical images from Sentinel-2 satellites and weather data as dynamic features during the crop growing season, complemented by static features like soil properties and topographic information. To effectively fuse the data, we introduce a Multi-view Gated Fusion (MVGF) model, comprising dedicated view-encoders and a Gated Unit (GU) module. The view-encoders handle the heterogeneity of data sources with varying temporal resolutions by learning a view-specific representation. These representations are adaptively fused via a weighted sum. The fusion weights are computed for each sample by the GU using a concatenation of the view-representations. The MVGF model is trained at sub-field level with 10 m resolution pixels. Our evaluations show that the MVGF outperforms conventional models on the same task, achieving the best results by incorporating all the data sources, in contrast to the usual fusion results in the literature. For Argentina, the MVGF model achieves an $R^2$ value of 0.68 for sub-field yield prediction, while in field-level evaluation (comparing field averages), it reaches around 0.80 across different countries. The GU module learned different weights based on the country and crop type, aligning with the variable significance of each data source to the prediction task.
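
The gated fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear gate, the softmax normalisation of the fusion weights, the function name `gated_fusion`, and the toy dimensions are all assumptions made for illustration. The abstract only states that the GU computes per-sample weights from the concatenated view-representations and that these weight a sum over views.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(view_reprs, gate_weights, gate_bias):
    """Fuse the view-representations of one sample (pixel) by a weighted sum.

    view_reprs:   list of V vectors of length D, one per view-encoder.
    gate_weights: V rows of length V*D (hypothetical linear gate parameters).
    gate_bias:    list of V floats.
    Returns (fused_vector, fusion_weights).
    """
    # The gate sees the concatenation of all view-representations.
    concat = [x for rep in view_reprs for x in rep]
    scores = [
        sum(w * x for w, x in zip(row, concat)) + b
        for row, b in zip(gate_weights, gate_bias)
    ]
    # Assumption: softmax, so the weights are positive and sum to 1.
    alphas = softmax(scores)
    dim = len(view_reprs[0])
    fused = [
        sum(a * rep[d] for a, rep in zip(alphas, view_reprs))
        for d in range(dim)
    ]
    return fused, alphas


# Toy example: four views with 3-dimensional representations.
# Random values stand in for the outputs of trained view-encoders.
rng = random.Random(0)
views = ["optical", "weather", "dem", "soil"]
dim = 3
reprs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in views]
W = [[rng.gauss(0, 0.5) for _ in range(len(views) * dim)] for _ in views]
b = [0.0] * len(views)

fused, alphas = gated_fusion(reprs, W, b)
for name, a in zip(views, alphas):
    print(f"{name}: {a:.3f}")
```

Because the weights are recomputed from each sample's own representations, two pixels can weight the same view differently, which is the property that makes the per-view weights inspectable by country and crop type.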
