Winter Wheat Crop Yield Prediction on Multiple Heterogeneous Datasets using Machine Learning
Yogesh Bansal, David Lillis, Mohand Tahar Kechadi
TL;DR
This work addresses winter wheat yield prediction in the UK by applying ML models to heterogeneous zone-based soil and weather data, comparing soil-only inputs with integrated soil and weather inputs. The study finds that adding weather data improves prediction and that non-linear ensemble methods, particularly Gradient Boosting, achieve the lowest mean absolute error (e.g., $MAE\approx1.63\,t/ha$ for soil-only and $MAE\approx1.48\,t/ha$ for integrated data), though the gains are sensitive to data quality and representativeness. The results underscore the value of data fusion and rigorous data-quality assessment for real-world agricultural decision support, while also highlighting the limits imposed by data noise and sample size. Future work points to time-series modeling and richer datasets to better isolate weather effects on yield.
Abstract
Winter wheat is one of the most important crops in the United Kingdom, and crop yield prediction is essential for the nation's food security. Several studies have employed machine learning (ML) techniques to predict crop yield at the county or farm level. The main objective of this study is to predict winter wheat yield at the zone level by applying ML models to multiple heterogeneous datasets, namely soil and weather data. Experimental results demonstrate the impact of each dataset when used alone and when the two are combined. In addition, we compare several ML algorithms to highlight the significance of data quality in any machine-learning strategy.
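
As an illustration of the two ingredients named above, fusing zone-level soil and weather records and scoring predictions by mean absolute error, a minimal Python sketch might look as follows. It is not the study's pipeline: the field names (`zone_id`, `ph`, `organic_matter`, `rain_mm`, `mean_temp_c`), the helper functions, and all numeric values are hypothetical and chosen only to show the mechanics.

```python
from statistics import fmean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted yield."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def fuse_by_zone(soil_rows, weather_rows):
    """Inner-join soil and weather records on a shared zone identifier."""
    weather_by_zone = {row["zone_id"]: row for row in weather_rows}
    fused = []
    for soil_row in soil_rows:
        weather_row = weather_by_zone.get(soil_row["zone_id"])
        if weather_row is None:
            continue  # drop zones that have no weather coverage
        fused.append({**soil_row, **weather_row})
    return fused


# Synthetic, illustrative values only (not taken from the study).
soil = [
    {"zone_id": "A", "ph": 6.5, "organic_matter": 3.1},
    {"zone_id": "B", "ph": 7.0, "organic_matter": 2.4},
    {"zone_id": "C", "ph": 6.8, "organic_matter": 2.9},
]
weather = [
    {"zone_id": "A", "rain_mm": 610.0, "mean_temp_c": 9.8},
    {"zone_id": "B", "rain_mm": 580.0, "mean_temp_c": 10.1},
]

fused = fuse_by_zone(soil, weather)  # zone "C" is dropped: no weather record

# Hypothetical observed and predicted yields (t/ha) for the two fused zones.
observed = [9.0, 8.0]
predicted = [8.5, 9.0]
mae = mean_absolute_error(observed, predicted)
print(f"fused zones: {len(fused)}, MAE: {mae:.2f} t/ha")
```

The inner join makes one data-quality trade-off explicit: zones lacking a weather record are discarded, so the integrated dataset can be smaller than the soil-only one, and a comparison between the two is fairest when both are evaluated on the same zones.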
