A Machine Learning Approach to Forecasting Honey Production with Tree-Based Methods

Alessio Brini, Elisa Giovannini, Elia Smaniotto

TL;DR

This study forecasts hive weight variation (honey production) for Italian apiaries under climate variability. It systematically compares linear models (OLS, Ridge, LASSO, Elastic Net), tree-based ensembles (Random Forest, XGBoost, LightGBM), and a simple feedforward neural network, all trained on weather and hive-weight features lagged by up to four days. Elastic Net and Random Forest consistently deliver strong out-of-sample performance, and ensemble stacking (especially Elastic Net stacking) improves on them further. Interpretability analyses based on model coefficients and on SHAP and permutation methods identify autoregressive hive-weight terms as the primary drivers, supplemented by surface solar radiation and precipitation signals. The findings have practical implications for beekeeping risk management and climate-risk insurance design: they indicate which predictors matter most and how to combine models for robust forecasts.
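To make the lagged-feature setup concrete, the following is a minimal sketch of how such a design matrix could be built for a single hive, assuming daily series and lags of one to four days. The function name `make_lagged_rows`, the series names (`delta_w`, `ssr`, `precip`), and all numeric values are hypothetical and chosen only for illustration; this is not the authors' preprocessing code.

```python
from typing import Dict, List, Sequence, Tuple

MAX_LAG = 4  # the summary mentions lags of up to four days


def make_lagged_rows(
    series: Dict[str, Sequence[float]],
    target: str,
    max_lag: int = MAX_LAG,
) -> Tuple[List[List[float]], List[float], List[str]]:
    """Build (X, y, names) where each row of X holds every series
    lagged by 1..max_lag days and y holds the same-day target value."""
    n = len(series[target])
    names = [f"{name}_lag{k}" for name in series for k in range(1, max_lag + 1)]
    X: List[List[float]] = []
    y: List[float] = []
    # The first max_lag days have no complete lag history, so they are skipped.
    for t in range(max_lag, n):
        X.append([series[name][t - k] for name in series for k in range(1, max_lag + 1)])
        y.append(series[target][t])
    return X, y, names


# Made-up daily values for one hive (illustrative only).
hive = {
    "delta_w": [0.1, 0.3, -0.2, 0.5, 0.4, 0.0],       # daily weight variation
    "ssr": [18.0, 21.5, 9.0, 23.0, 22.0, 12.5],       # surface solar radiation
    "precip": [0.0, 0.0, 6.2, 0.0, 0.0, 3.1],         # precipitation
}
X, y, names = make_lagged_rows(hive, target="delta_w")
```

With six days of data and four lags, this yields two rows of twelve features each. The autoregressive columns (`delta_w_lag1` to `delta_w_lag4`) correspond to the hive-weight terms that the interpretability analyses single out.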

Abstract

The beekeeping sector has experienced significant production fluctuations in recent years, largely due to increasingly frequent adverse weather events linked to climate change. These events can severely affect the environment, reducing its suitability for bee activity. We conduct a forecasting analysis of honey production across Italy using a range of machine learning models, with a particular focus on weather-related variables as key predictors. Our analysis relies on a dataset collected in 2022, which combines hive-level observations with detailed weather data. We train and compare several linear and nonlinear models, evaluating both their predictive accuracy and interpretability. By examining model explanations, we identify the main drivers of honey production. We also ensemble models from different families to assess whether combining predictions improves forecast accuracy. These insights support beekeepers in managing production risks and may inform the development of insurance products against unexpected losses due to poor harvests.
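One model-agnostic way to examine which predictors drive a forecast is permutation importance: shuffle one feature column at a time and measure how much the prediction error grows. The sketch below is a generic pure-Python version under stated assumptions (mean squared error as the metric, a hypothetical `predict` callable, made-up toy data); it is not the authors' implementation and does not reproduce their results.

```python
import random
from typing import Callable, List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Sequence[float]], float],
    X: List[List[float]],
    y: Sequence[float],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances: List[float] = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by a shuffled value.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy example: the target depends only on feature 0, so feature 1 is irrelevant.
X_toy = [[float(i), float((i * 7) % 5)] for i in range(6)]
y_toy = [2.0 * row[0] for row in X_toy]


def toy_model(row: Sequence[float]) -> float:
    """Hypothetical fitted model that ignores feature 1."""
    return 2.0 * row[0]


importances = permutation_importance(toy_model, X_toy, y_toy)
```

In this toy setting, shuffling the irrelevant feature leaves the error unchanged, while shuffling the relevant one increases it. The same logic underlies a ranking of lagged hive-weight and weather features by importance.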

Paper Structure

This paper contains 17 sections, 8 equations, 26 figures, and 7 tables.

Figures (26)

  • Figure 1: Geographical distribution of the hives over the Italian territory.
  • Figure 2: Daily weight variation $\Delta w$ for a random selection of four hives.
  • Figure 3: Empirical densities of the features outlined in Table \ref{Tab:descriptive_stats}.
  • Figure 4: Heatmap of the percentage of variables that exhibit autocorrelation at different lags. Each entry of the matrix represents the percentage of hive IDs, measured over the cross-section, that exhibit such linear autocorrelation.
  • Figure 5: Prediction error distributions on the test set for Elastic Net across the four evaluation metrics.
  • ...and 21 more figures