Sparsity, Regularization and Causality in Agricultural Yield: The Case of Paddy Rice in Peru

Rita Rocio Guzman-Lopez, Luis Huamanchumo, Kevin Fernandez, Oscar Cutipa-Luque, Yhon Tiahuallpa, Helder Rojas

TL;DR

The paper forecasts paddy rice yield in Peru by fusing agricultural census data with satellite-derived time series. It uses Elastic-Net regression to uncover sparse, Granger-causal relationships between remote-sensing indicators ($NDVI$, $PREC$, $TEMP$) and yield, and augments these indicators with velocity and acceleration transformations to capture dynamic effects. Three modeling approaches (Elastic-Net, GAM, and XGBoost) are compared; the velocity and acceleration features improve predictive performance and reveal causal links that the raw variables do not show. The findings support integrating geospatial and climatic predictors with census data to produce more accurate, actionable yield forecasts for strategic agricultural management.

Abstract

This study introduces a novel approach that integrates agricultural census data with remotely sensed time series to develop precise predictive models for paddy rice yield across various regions of Peru. Using sparse regression and Elastic-Net regularization, the study identifies causal relationships between key remotely sensed variables (such as NDVI, precipitation, and temperature) and agricultural yield. To further enhance prediction accuracy, first- and second-order dynamic transformations (velocity and acceleration) of these variables are applied, capturing non-linear patterns and delayed effects on yield. The findings highlight the improved predictive performance obtained by combining regularization techniques with climatic and geospatial variables, enabling more precise forecasts of yield variability. The results confirm the existence of causal relationships in the Granger sense, emphasizing the value of this methodology for strategic agricultural management and contributing to more efficient and sustainable paddy rice production.
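
The velocity and acceleration transformations mentioned in the abstract can be illustrated with a minimal sketch. It assumes they are plain first- and second-order backward differences of an equally spaced series; the paper's exact definitions are not given in this excerpt. The function names `diff` and `dynamic_features` and the sample NDVI values are hypothetical, introduced only for illustration.

```python
def diff(series):
    """First-order backward difference: d[i] = x[i+1] - x[i]."""
    return [b - a for a, b in zip(series, series[1:])]


def dynamic_features(series):
    """Return (level, velocity, acceleration) tuples aligned at each time t >= 2.

    Assumption: unit time step, so velocity is x[t] - x[t-1] and
    acceleration is x[t] - 2*x[t-1] + x[t-2].
    """
    velocity = diff(series)        # length n - 1
    acceleration = diff(velocity)  # length n - 2
    rows = []
    for t in range(2, len(series)):
        rows.append((series[t], velocity[t - 1], acceleration[t - 2]))
    return rows


# Illustrative (made-up) NDVI readings at consecutive 16-day composites.
ndvi = [0.30, 0.35, 0.45, 0.50, 0.48]
for level, vel, acc in dynamic_features(ndvi):
    print(f"level={level:.2f} velocity={vel:+.2f} acceleration={acc:+.2f}")
```

In a pipeline of the kind the abstract describes, such columns (and their lags) would be stacked next to the raw indicators before fitting a regularized regression, so that the penalty can select whichever form of each indicator carries predictive signal.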

Paper Structure (14 sections, 14 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: The graphic illustrates the process of extracting remotely sensed time series using Google Earth Engine. On the right, three images are shown, each color-coded to represent a different type of series: NDVI, precipitation, and temperature.
  • Figure 2: The graph presents a time series of the Normalized Difference Vegetation Index (NDVI) with a 16-day frequency, obtained from the MOD13Q1 product of the MODIS sensor, with a spatial resolution of 250 meters.
  • Figure 3: The graph presents a time series of monthly precipitation, measured in millimeters, using the CHIRPS product from 2014 to 2023.
  • Figure 4: The graph presents a daily time series of land surface temperature (LST), measured in degrees Celsius, obtained from the MODIS MOD11A1 product and covering the period from June 2014 to September 2023.
  • Figure 5: Graph to determine the optimal value of $\lambda$. Vertical axis: the MSE calculated through cross-validation. Horizontal axis: values of $\lambda$ on a logarithmic scale. The red line represents the average value of the MSE, and the gray bands represent their respective confidence intervals.
  • ...and 3 more figures
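
The cross-validation criterion plotted in Figure 5 can be written out as a hedged sketch. It assumes the common glmnet-style parameterization of the Elastic-Net; the paper's exact form is not shown in this excerpt. The mixing weight $\alpha$, the number of folds $K$, the fold index sets $F_k$, and the held-out fit $\hat{\beta}^{(-k)}$ are notation introduced here.

```latex
\hat{\beta}(\lambda) = \arg\min_{\beta}\;
  \frac{1}{2n}\lVert y - X\beta \rVert_2^2
  + \lambda\left[\alpha\lVert\beta\rVert_1
  + \frac{1-\alpha}{2}\lVert\beta\rVert_2^2\right]

\mathrm{CV}(\lambda) = \frac{1}{K}\sum_{k=1}^{K}\frac{1}{|F_k|}
  \sum_{i \in F_k}\left(y_i - x_i^{\top}\hat{\beta}^{(-k)}(\lambda)\right)^2,
\qquad
\hat{\lambda} = \arg\min_{\lambda}\,\mathrm{CV}(\lambda)
```

Under this reading, the red line in Figure 5 traces $\mathrm{CV}(\lambda)$ against $\log\lambda$, and the gray bands reflect the spread of the per-fold MSE values around that average.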