Adaptive Fusion of Multi-view Remote Sensing data for Optimal Sub-field Crop Yield Prediction

Francisco Mena, Deepak Pathak, Hiba Najjar, Cristhian Sanchez, Patrick Helber, Benjamin Bischke, Peter Habelitz, Miro Miranda, Jayanth Siddamsetty, Marlon Nuske, Marcela Charfuelan, Diego Arenas, Michaela Vollmer, Andreas Dengel

TL;DR

This work addresses sub-field crop yield prediction from multi-view remote sensing (RS) data by introducing Multi-View Gated Fusion (MVGF), a two-component model with dedicated view-encoders and a gated fusion unit that adaptively weights heterogeneous inputs per pixel. By fusing Sentinel-2 optical time series with weather, digital elevation model (DEM), and soil features, MVGF outperforms input-level fusion and single-view baselines across soybean, rapeseed, and wheat datasets from Argentina, Uruguay, and Germany, achieving a field-level $R^2$ of around 0.80 and a sub-field $R^2$ ranging from around 0.68 (ARG-S) down to around 0.44 (other datasets). The authors provide extensive analyses, including visualizations of yield maps and gating weights, leave-one-year-out (LOYO) cross-validation, and ablation studies. These show that the fusion weights vary with country and crop type and that the fusion offers data-imputation-like benefits in fields with limited optical coverage. The work demonstrates the value of adaptive, sample-specific fusion in multi-view learning (MVL) for RS-driven crop yield tasks and offers insights into interpretability via fusion weights and view contributions.

Abstract

Accurate crop yield prediction is of utmost importance for informed decision-making in agriculture, aiding farmers and industry stakeholders. However, this task is complex and depends on multiple factors, such as environmental conditions, soil properties, and management practices. Combining heterogeneous data views poses a fusion challenge, such as identifying the view-specific contribution to the predictive task. We present a novel multi-view learning approach to predict crop yield for different crops (soybean, wheat, rapeseed) and regions (Argentina, Uruguay, and Germany). Our multi-view input data includes multi-spectral optical images from Sentinel-2 satellites and weather data as dynamic features during the crop growing season, complemented by static features like soil properties and topographic information. To effectively fuse the data, we introduce a Multi-view Gated Fusion (MVGF) model, comprising dedicated view-encoders and a Gated Unit (GU) module. The view-encoders handle the heterogeneity of data sources with varying temporal resolutions by learning a view-specific representation. These representations are adaptively fused via a weighted sum. The fusion weights are computed for each sample by the GU using a concatenation of the view-representations. The MVGF model is trained at sub-field level with 10 m resolution pixels. Our evaluations show that the MVGF outperforms conventional models on the same task, achieving the best results by incorporating all the data sources, unlike the usual fusion results in the literature. For Argentina, the MVGF model achieves an $R^2$ value of 0.68 at sub-field yield prediction, while at field-level evaluation (comparing field averages), it reaches around 0.80 across different countries. The GU module learned different weights based on the country and crop type, aligning with the variable significance of each data source to the prediction task.
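
The difference between the sub-field and field-level scores quoted in the abstract can be illustrated with a small sketch. It assumes pixel-level predictions that each carry a field identifier; the function names (`r2_score`, `field_level_r2`) and the toy arrays are hypothetical and are not taken from the paper.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def field_level_r2(y_true, y_pred, field_ids):
    """Average the pixels of each field, then score the field means."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    field_ids = np.asarray(field_ids)
    fields = np.unique(field_ids)
    true_means = np.array([y_true[field_ids == f].mean() for f in fields])
    pred_means = np.array([y_pred[field_ids == f].mean() for f in fields])
    return r2_score(true_means, pred_means)

# Toy data (made up): three fields with two pixels each.
y_true = [1.0, 3.0, 4.0, 6.0, 7.0, 9.0]
y_pred = [3.0, 1.0, 6.0, 4.0, 9.0, 7.0]
field_ids = [0, 0, 1, 1, 2, 2]

sub_r2 = r2_score(y_true, y_pred)                     # pixel (sub-field) level
field_r2 = field_level_r2(y_true, y_pred, field_ids)  # field-average level
print(f"sub-field R^2 = {sub_r2:.3f}, field-level R^2 = {field_r2:.3f}")
```

In this toy case the pixel errors cancel within each field, so the field-level score is higher than the sub-field score. This is one reason the two levels are reported separately.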

Paper Structure (48 sections, 10 equations, 20 figures, 14 tables)

Figures (20)

  • Figure 1: Crop yield distribution per pixel (at 10 m resolution) in the four datasets considered in this study.
  • Figure 2: Field spatial coverage (labels 4 and 5 in S2-based SCL) across the growing season on different fields. Different sub-figures show different datasets used in this study. Each point in a boxplot represents the spatial coverage of a field in the corresponding month. The spatial coverage of the fields is grouped by month for display. The month index goes from January in the seeding year (1) to December of the following year (24).
  • Figure 3: Illustration of spatial alignment applied to the four input views for a specific field. After this process, all views have a spatial resolution of 10 m/px.
  • Figure 4: Illustration of the gating mechanism (GU) used. The four view-representations are merged with $\mathsf{M}$ and linearly projected to a four-dimensional vector. A softmax function is then applied to obtain normalized fusion weights.
  • Figure 5: Illustration of the proposed Multi-View Gated Fusion (MVGF) model with the four views used. "S" represents the vector stacking operation, and "P" the dot product. The forward pass is shown with a black arrow, while the dotted arrow shows the additional connections for the GU. The model is learned end-to-end by comparing the prediction with the ground truth. The red dotted arrows illustrate the backward pass of the loss function through the model components.
  • ...and 15 more figures
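
The gating mechanism described in the captions of Figures 4 and 5 can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation. It assumes that all view-representations share the same dimension `d`, that the merge $\mathsf{M}$ is a concatenation (as stated in the abstract), and that random arrays stand in for the encoder outputs. The names `gated_fusion`, `W`, and `b` are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gated_fusion(view_reps, W, b):
    """Fuse per-view representations with sample-specific weights.

    view_reps: (n_samples, n_views, d) outputs of the view-encoders
    W:         (n_views * d, n_views) linear projection of the gate
    b:         (n_views,) bias of the gate
    Returns the fused representation (n_samples, d) and the
    fusion weights (n_samples, n_views).
    """
    n, v, d = view_reps.shape
    merged = view_reps.reshape(n, v * d)        # concatenate the views
    weights = softmax(merged @ W + b, axis=-1)  # one weight per view
    fused = np.einsum("nv,nvd->nd", weights, view_reps)  # weighted sum
    return fused, weights

# Stand-in data: 5 pixels, 4 views (optical, weather, DEM, soil), d = 8.
rng = np.random.default_rng(0)
view_reps = rng.normal(size=(5, 4, 8))
W = rng.normal(scale=0.1, size=(4 * 8, 4))
b = np.zeros(4)

fused, weights = gated_fusion(view_reps, W, b)
print(fused.shape, weights.shape)  # (5, 8) (5, 4)
print(weights.sum(axis=1))         # each row sums to 1
```

Because the weights are computed per sample, they can be inspected pixel by pixel, which matches the kind of interpretability analysis the TL;DR mentions. In a trained model, `W` and `b` would be learned end-to-end together with the encoders.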