Forecasting trends in food security with real time data

Joschka Herteux, Christoph Räth, Giulia Martini, Amine Baha, Kyriacos Koupparis, Ilaria Lauzana, Duccio Piovani

TL;DR

The methodology introduced here lays the groundwork for a global, data-driven early warning system that anticipates and detects food insecurity, using machine learning to combine publicly available ecological, socio-economic, and conflict-related data.

Abstract

Early warning systems are an essential tool for effective humanitarian action. Advance warnings of impending disasters facilitate timely and targeted responses that help save lives and livelihoods. In this work we present a quantitative methodology to forecast levels of food consumption for 60 consecutive days, at the sub-national level, in four countries: Mali, Nigeria, Syria, and Yemen. The methodology is built on publicly available data from the World Food Programme's global hunger monitoring system, which collects, processes, and displays daily updates on key food security metrics, conflict, weather events, and other drivers of food insecurity. In this study we assessed the performance of various models, including Autoregressive Integrated Moving Average (ARIMA), Extreme Gradient Boosting (XGBoost), Long Short Term Memory (LSTM) Network, Convolutional Neural Network (CNN), and Reservoir Computing (RC), by comparing their Root Mean Squared Error (RMSE). Our findings highlight Reservoir Computing as a particularly well-suited model in the field of food security, given both its notable resistance to over-fitting on limited data samples and its efficient training. The methodology we introduce establishes the groundwork for a global, data-driven early warning system designed to anticipate and detect food insecurity.

Paper Structure (13 sections, 3 equations, 7 figures, 5 tables)

Figures (7)

  • Figure 1: Input Data: the figure shows the time series of the data used in constructing the forecasting methodology. The target variable, highlighted by the blue dashed curve in (a), is the regional prevalence of insufficient food consumption extracted from the Food Consumption Score (FCS). In addition to the historical values of the target, the methodology incorporates predictors coming from another food security indicator, (b) climate, (c) conflict, and (e) economic data. External datasets with known future values, including crop calendars and Ramadan days, are also considered (d). While the figures are based on data from Mali, the framework remains consistent across all countries. A detailed breakdown of the data for each country can be found in Table \ref{tab: input data}, with comprehensive information on data sources available in the methods section. (f) The map displays the first administrative level boundaries in the four tested countries, where grey dashed polygons indicate regions that were excluded due to data unavailability.
  • Figure 2: Forecasts and Data: (a) Examples of 60-day forecasts generated using the RC, CNN, LSTM, and ARIMA models for four specific sub-national regions in Yemen, Syria, Mali, and Nigeria. The blue curve represents the actual data, while each of the other curves depicts the prediction of one of the models. (b) The observed distribution of the variation of the prevalence of insufficient food consumption over the 60-day windows used to train and test the algorithms. The dataset is visibly biased towards curves with small variation.
  • Figure 3: Forecasting Aggregated Performances: Performance of the RC, LSTM, CNN and ARIMA models measured in median RMSE, aggregated in three different ways: (a) by forecasting time step, (b) by variation of the target variable, and (c) by country. Panel (a) shows that the RC model outperforms the other methodologies after the 15th forecast step, with an aggregated RMSE at the end of the 60-day window of 4.9 percentage points. Panel (b) shows that RC is consistently the best performer when curves are aggregated according to the variation of their target variable $\Delta = \textbf{fcs}_{60}-\textbf{fcs}_{0}$. This is more evident for high values of $\Delta$, which indicate a sharp increase in levels of insufficient food consumption. The bar plot in panel (c) aggregates the error per country. Although RC is among the top performers for every country, there is no clearly preferred methodology. This is because regional time series belonging to the same country can behave very differently and show no clear national characteristic.
  • Figure 4: Classification Performance: This figure compares the performance of the ARIMA, CNN, LSTM and RC models on their ability to distinguish different classes of behaviour: No Change, Improvement and Deterioration, as defined in (a). The same criterion is applied to actual curves and predicted curves to label the behaviour. (b) Performance metrics on the classification task. Accuracy on single classes is computed by treating the problem as a binary classification. (c) Confusion matrices of all tested methodologies.
  • Figure 5: Feature Selection: Feature groupings that were considered in the grid search for the RC, CNN and LSTM models to implement a dynamic feature selection. (a) Schema of how the features were classified into 5 different feature groupings. (b-d) Frequency with which each feature grouping was selected for (b) the RC model, (c) the LSTM model and (d) the CNN model.
  • ...and 2 more figures
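
The aggregation by target variation described in the Figure 3 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the array layout (column 0 is the value at the forecast origin, columns 1 to 60 are the forecast days), the function names, the bin edges, and the toy data are all hypothetical, and the RMSE is assumed to be computed per 60-day window before taking the median within each $\Delta$ bin.

```python
import numpy as np

def delta(fcs):
    """Variation of the target over the window: Delta = fcs_60 - fcs_0.

    Assumed layout: fcs has shape (n_windows, 61); column 0 is the value at
    the forecast origin and columns 1..60 are the 60 days being forecast.
    """
    return fcs[:, 60] - fcs[:, 0]

def rmse_per_window(fcs, pred):
    """RMSE over the 60 forecast days, one value per window.

    pred has shape (n_windows, 60) and holds the forecasts for days 1..60.
    """
    return np.sqrt(np.mean((pred - fcs[:, 1:]) ** 2, axis=1))

def median_rmse_by_delta(fcs, pred, bin_edges):
    """Median per-window RMSE within each Delta bin (hypothetical bin edges)."""
    d = delta(fcs)
    err = rmse_per_window(fcs, pred)
    bins = np.digitize(d, bin_edges)  # bin index of each window
    return {int(b): float(np.median(err[bins == b])) for b in np.unique(bins)}

# Toy data (invented for illustration): prevalence in percent.
fcs = np.array([
    np.linspace(20.0, 20.0, 61),  # flat curve, Delta = 0
    np.linspace(20.0, 30.0, 61),  # deterioration, Delta = +10
    np.linspace(30.0, 25.0, 61),  # improvement, Delta = -5
])
# Forecasts that miss the truth by a constant offset in each window.
pred = fcs[:, 1:] + np.array([[1.0], [2.0], [-3.0]])

result = median_rmse_by_delta(fcs, pred, bin_edges=[-2.5, 2.5])
print(result)
```

With the toy data, each window falls in its own bin, so the median in each bin is just that window's RMSE, which equals the size of the constant offset. Since the target is a prevalence in percent, the errors are in percentage points, as in the caption.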