VITA: Variational Pretraining of Transformers for Climate-Robust Crop Yield Forecasting

Adib Hasan, Mardavij Roozbehani, Munther Dahleh

TL;DR

VITA tackles climate-robust crop yield forecasting under data asymmetry by pretraining a Transformer encoder on rich satellite-based weather data and transferring it to ground-based settings where only limited weather data are available. It uses a decoder-free variational objective with a seasonality-aware sinusoidal prior to learn latent atmospheric representations, then fine-tunes with limited weather statistics and past yields. Across 763 US Corn Belt counties, VITA achieves state-of-the-art performance, especially in extreme years, with strong cross-regional transfer and better data efficiency than larger foundation models. The approach offers practical, scalable deployment using public data, enabling improved risk management and resilience in climate-impacted agriculture.
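One piece of a variational objective with a seasonality-aware prior is the KL regularizer. It pulls a diagonal Gaussian posterior over per-day latents toward a prior whose mean follows an annual sine wave. The sketch below is minimal and rests on assumptions not stated here. The prior form (a single sinusoid with a 365-day period and unit variance, with hypothetical `amplitude` and `phase` parameters) and all function names are illustrative. VITA's actual prior and likelihood terms may differ.

```python
import math

def sinusoidal_prior_mean(day_of_year: float, amplitude: float = 1.0,
                          phase: float = 0.0, period: float = 365.0) -> float:
    """Hypothetical prior mean: one annual sinusoid (assumed form, not VITA's)."""
    return amplitude * math.sin(2.0 * math.pi * day_of_year / period + phase)

def kl_diag_gaussian(mu_q, var_q, mu_p, var_p) -> float:
    """KL(N(mu_q, var_q) || N(mu_p, var_p)), summed over independent dimensions."""
    total = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        total += 0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
    return total

def seasonal_kl(mu_q, var_q, days, prior_var: float = 1.0) -> float:
    """KL between per-day posteriors and the assumed sinusoidal prior."""
    mu_p = [sinusoidal_prior_mean(d) for d in days]
    var_p = [prior_var] * len(days)
    return kl_diag_gaussian(mu_q, var_q, mu_p, var_p)
```

A posterior that matches the prior exactly has zero KL. Shifting every posterior mean by one unit, with unit variances on both sides, adds 0.5 nats per day.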

Abstract

Accurate crop yield forecasting is essential for global food security. However, current AI models systematically underperform when yields deviate from historical trends. We attribute this to the lack of rich, physically grounded datasets directly linking atmospheric states to yields. To address this, we introduce VITA (Variational Inference Transformer for Asymmetric Data), a variational pretraining framework that learns representations from large satellite-based weather datasets and transfers to the limited ground-based measurements available for yield prediction. VITA is trained using detailed meteorological variables as proxy targets during pretraining and learns to predict latent atmospheric states under a seasonality-aware sinusoidal prior. This allows the model to be fine-tuned using limited weather statistics during deployment. Applied to 763 counties in the US Corn Belt, VITA achieves state-of-the-art performance in predicting corn and soybean yields across all evaluation scenarios, particularly during extreme years, with statistically significant improvements (paired t-test, p < 0.0001). Importantly, VITA outperforms prior frameworks such as GNN-RNN even without using soil data, and outperforms larger foundation models (e.g., Chronos-Bolt) while using less compute, making it practical for real-world use, especially in data-scarce regions. This work highlights how domain-aware AI design can overcome data limitations and support resilient agricultural forecasting in a changing climate.
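A paired t-test compares two models on the same evaluation units through the differences of their errors. Here the pairing unit is assumed to be something like per-county errors, since the unit is not specified above. The minimal stdlib sketch below computes only the t statistic and its degrees of freedom; the function name is hypothetical. Turning t into a p-value needs a Student t CDF. SciPy's `scipy.stats.ttest_rel` performs the full test.

```python
import math
import statistics

def paired_t_statistic(errors_a, errors_b):
    """Return (t, degrees of freedom) for a paired t-test on matched samples."""
    if len(errors_a) != len(errors_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_d == 0:
        raise ValueError("differences have zero variance")
    t = mean_d / (sd_d / math.sqrt(n))  # mean difference over its standard error
    return t, n - 1
```

If `errors_a` holds a baseline's errors and `errors_b` holds the candidate model's errors on the same units, a large positive t indicates that the candidate's errors are consistently lower.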

Paper Structure

This paper contains 69 sections, 18 equations, 7 figures, 13 tables.

Figures (7)

  • Figure 1: Two-stage variational training framework for asymmetric weather features. (a) A transformer encoder is pretrained on 31-variable weather time series by randomly masking $10 \leq k \leq 25$ features and predicting them from the remaining context. The model learns a variational posterior $q_\phi(z_i \mid x_i)$ over weather representations by directly maximizing the variational likelihood. (b) During fine-tuning, only 6 weather features are available. The pretrained transformer encodes these into a latent distribution $q_\phi(z_j \mid x_j)$, from which $z_j \sim \mathcal{N}(\mu_\phi(x_j), \sigma_\phi^2(x_j))$ is sampled. The sampled latents are aggregated with learnable attention across the time dimension and concatenated with the historical yield $y_{\text{past}}$ for final yield prediction.
  • Figure 2: Graphical model showing the data structure of pretraining and prediction phases in VITA.
  • Figure 3: Mean crop yield in bushels per acre (bu/ac) across 763 US Corn Belt counties, showing extreme-weather years as sharp deviations from historical patterns.
  • Figure 4: VITA-Sinusoidal shows consistent improvement over the baselines.
  • Figure 5: PCA visualization of the latent weather representations for two extreme years (2004: record-breaking yield, 2012: extreme drought) under different modeling choices. (a) T-BERT (non-variational) shows limited separation and explains 84.0% of variance in 2D, reflecting a narrow, collapsed latent space. (b) VITA with a standard normal prior yields more separated clusters and explains 34.7% of variance. (c) VITA with the sinusoidal prior induces visually tighter clusters but explains only 15.7% of variance in 2D, indicating that variance is spread more evenly into higher-order components.
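The two stages in Figure 1 rely on three small operations. The first is choosing a random subset of 10 to 25 of the 31 weather features to mask. The second is drawing $z \sim \mathcal{N}(\mu, \sigma^2)$ with the reparameterization trick. The third is pooling the per-timestep latents with attention weights before concatenating the past yield. The stdlib sketch below illustrates each step on plain lists. Several details are assumptions rather than VITA's implementation: the function names, the single hypothetical `query` vector that stands in for learnable attention parameters, and masking whole features across all timesteps.

```python
import math
import random

NUM_FEATURES = 31  # pretraining weather variables, per the Figure 1 caption

def sample_feature_mask(rng, k_min=10, k_max=25, num_features=NUM_FEATURES):
    """Return sorted indices of k masked features, with k uniform in [k_min, k_max]."""
    k = rng.randint(k_min, k_max)  # randint is inclusive on both ends
    return sorted(rng.sample(range(num_features), k))  # k distinct feature indices

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), independently per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(latents, query):
    """Weight each timestep's latent by softmax(<query, z_t>) and sum over time."""
    scores = [sum(q * z for q, z in zip(query, z_t)) for z_t in latents]
    weights = softmax(scores)
    dim = len(latents[0])
    return [sum(w * z_t[d] for w, z_t in zip(weights, latents)) for d in range(dim)]

def prediction_features(latents, query, y_past):
    """Concatenate the pooled weather latent with past yields (hypothetical layout)."""
    return attention_pool(latents, query) + list(y_past)
```

For example, `sample_feature_mask(random.Random(0))` returns between 10 and 25 distinct indices from `range(31)`. With an all-zero `query`, the attention weights are uniform, so the pooled vector reduces to the time average of the latents.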