Sparsity, Regularization and Causality in Agricultural Yield: The Case of Paddy Rice in Peru
Rita Rocio Guzman-Lopez, Luis Huamanchumo, Kevin Fernandez, Oscar Cutipa-Luque, Yhon Tiahuallpa, Helder Rojas
TL;DR
The paper addresses forecasting paddy rice yield in Peru by fusing agricultural census data with satellite-derived time series. It employs Elastic-Net sparse regression to uncover sparse Granger-causal relationships between yield and remotely sensed indicators (NDVI, precipitation PREC, and temperature TEMP), augmenting the raw series with velocity and acceleration transformations to capture dynamic effects. Three modeling approaches are compared: Elastic-Net, a generalized additive model (GAM), and XGBoost. The velocity and acceleration features improve predictive performance and reveal causal links that are absent in the raw variables. The findings support integrating geospatial and climatic predictors with census data to produce more accurate, actionable yield forecasts for strategic agricultural management.
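The summary does not spell out how the velocity and acceleration transformations are computed. A natural reading is the first and second finite differences of each series over its sampling interval. The sketch below assumes exactly that. The helper names `velocity_acceleration` and `dynamic_features` are hypothetical, and the trimming rule that aligns level, velocity, and acceleration columns is an illustrative choice, not taken from the paper.

```python
import numpy as np


def velocity_acceleration(x, dt=1.0):
    """First and second finite differences of a 1-D series sampled every `dt` time units.

    Assumed definitions (not confirmed by the paper):
      velocity[t]     = (x[t] - x[t-1]) / dt   -> length len(x) - 1
      acceleration[t] = (v[t] - v[t-1]) / dt   -> length len(x) - 2
    """
    x = np.asarray(x, dtype=float)
    velocity = np.diff(x) / dt
    acceleration = np.diff(velocity) / dt
    return velocity, acceleration


def dynamic_features(series, dt=1.0):
    """Level, velocity and acceleration for each named series, aligned to a common time index.

    Row i of every output column refers to time t = i + 2, because the
    acceleration needs two earlier samples.
    """
    features = {}
    for name, x in series.items():
        x = np.asarray(x, dtype=float)
        v, a = velocity_acceleration(x, dt)
        features[name] = x[2:]          # drop the first two levels
        features[f"{name}_vel"] = v[1:]  # drop the first velocity
        features[f"{name}_acc"] = a
    return features


# Sanity check on x(t) = t^2: velocities are the odd numbers, acceleration is constant.
v, a = velocity_acceleration([0, 1, 4, 9, 16])
print(v, a)
```

In practice the same call would be applied to the NDVI, PREC, and TEMP series of each region before they enter a regression. A constant second difference, as in the quadratic check, is the discrete analogue of constant acceleration.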
Abstract
This study introduces a novel approach that integrates agricultural census data with remotely sensed time series to develop precise predictive models for paddy rice yield across various regions of Peru. Using sparse regression with Elastic-Net regularization, the study identifies causal relationships between agricultural yield and key remotely sensed variables such as NDVI, precipitation, and temperature. To further improve prediction accuracy, first- and second-order dynamic transformations (velocity and acceleration) of these variables are applied, capturing non-linear patterns and delayed effects on yield. The findings show improved predictive performance when regularization techniques are combined with climatic and geospatial variables, enabling more precise forecasts of yield variability. The results confirm the existence of causal relationships in the Granger sense, underscoring the value of this methodology for strategic agricultural management and contributing to more efficient and sustainable paddy rice production.
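The abstract pairs Elastic-Net regularization with Granger causality. One common way to combine them is to regress yield on lagged values of itself and of each candidate predictor. A predictor then counts as Granger-relevant if any of its lagged coefficients survives the penalty. The sketch below illustrates that idea under these assumptions, which are not necessarily the authors' procedure:

* a single time series;
* a fixed maximum lag;
* standardized columns;
* the selection rule "any nonzero lag".

The lag depth, penalty values, synthetic data, and helper names (`elastic_net`, `lagged_design`, `sparse_granger`) are all hypothetical. The Elastic-Net is solved by coordinate descent in plain NumPy to keep the example self-contained. It minimizes the same objective documented for scikit-learn's `ElasticNet`. A formal Granger test would instead compare restricted and unrestricted models, for example with an F-test.

```python
import numpy as np


def soft_threshold(z, t):
    """Scalar soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return float(np.sign(z)) * max(abs(z) - t, 0.0)


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=1000, tol=1e-8):
    """Cyclic coordinate descent for
        (1/2n)||y - Xb||^2 + alpha*l1_ratio*||b||_1 + 0.5*alpha*(1 - l1_ratio)*||b||^2.
    Assumes the columns of X and the vector y are already centered (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.asarray(y, dtype=float).copy()  # residual y - Xb, with b = 0 initially
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            old = b[j]
            # Correlation of column j with the partial residual that excludes column j.
            rho = X[:, j] @ r / n + col_sq[j] * old
            new = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1.0 - l1_ratio))
            if new != old:
                r -= X[:, j] * (new - old)  # keep the residual in sync with b
                b[j] = new
                max_step = max(max_step, abs(new - old))
        if max_step < tol:
            break
    return b


def lagged_design(y, predictors, max_lag):
    """Stack lags 1..max_lag of y and of each predictor; row i is time t = max_lag + i."""
    T = len(y)
    cols, labels = [], []
    for source, series in [("yield", y)] + list(predictors.items()):
        for k in range(1, max_lag + 1):
            cols.append(series[max_lag - k:T - k])  # value at t - k for every row
            labels.append((source, k))
    return np.column_stack(cols), y[max_lag:], labels


def sparse_granger(y, predictors, max_lag=2, alpha=0.3, l1_ratio=0.9):
    """Flag a predictor as Granger-relevant if any of its lags keeps a nonzero coefficient."""
    y = np.asarray(y, dtype=float)
    predictors = {name: np.asarray(x, dtype=float) for name, x in predictors.items()}
    X, target, labels = lagged_design(y, predictors, max_lag)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize so one penalty suits all columns
    target = target - target.mean()
    b = elastic_net(X, target, alpha=alpha, l1_ratio=l1_ratio)
    selected = {
        name: any(b[i] != 0.0 for i, (src, _) in enumerate(labels) if src == name)
        for name in predictors
    }
    return selected, dict(zip(labels, b))


# Synthetic example: yield responds to NDVI one step earlier and ignores `noise`.
rng = np.random.default_rng(0)
T = 200
ndvi = rng.standard_normal(T)   # stand-in for a standardized NDVI series
noise = rng.standard_normal(T)  # series with no effect on yield
y = np.zeros(T)
y[1:] = ndvi[:-1] + 0.5 * rng.standard_normal(T - 1)
selected, coefs = sparse_granger(y, {"ndvi": ndvi, "noise": noise})
print(selected)
```

On this synthetic series, the lag-1 NDVI coefficient should survive the penalty, while the irrelevant lags are shrunk exactly to zero. That exact-zero behavior is what makes the L1 part of the Elastic-Net usable as a selection device. The L2 part stabilizes the fit when lagged columns are strongly correlated, as lags of smooth vegetation indices usually are.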
