Forecasting emergency department visits in the reference hospital of the Balearic Islands: the role of tourist and weather data

Paride Crisafulli, Angel del Río Mangada, Juan José Segura Sampedro, Claudio R. Mirasso, Raúl Toral, Tobias Galla

TL;DR

This study forecasts emergency department visits at the reference hospital of the Balearic Islands using exogenous variables (calendar effects and resident and tourist population dynamics) and evaluates non-time-series ML models against traditional time-series approaches. The random forest using calendar and population inputs without weather (RF-No W) emerges as the most robust non-time-series predictor across shifts and patient risk cohorts, matching or outperforming time-series baselines at long prediction horizons, while weather data provides no meaningful gain. Diebold–Mariano tests confirm that certain variable choices make a statistically significant difference, though the practical impact on daily staffing (<2 patients per shift) remains modest. The findings argue for carefully chosen exogenous inputs and show that long-horizon forecasts can be produced effectively with simpler models, although retraining is required after major disruptions such as the COVID-19 pandemic.
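To make the Diebold–Mariano comparison mentioned above more concrete, the following is a minimal sketch of the basic form of the test for two sets of forecast errors. It assumes one-step-ahead forecasts, a squared-error loss, and no autocorrelation correction; the paper's exact setup (loss function, horizon handling, bootstrap procedure) is not given in this summary, and the function name `diebold_mariano` is hypothetical.

```python
import math


def diebold_mariano(errors_a, errors_b):
    """Basic Diebold-Mariano statistic with a two-sided normal p-value.

    Assumptions (not taken from the paper): squared-error loss,
    one-step-ahead forecasts, no autocorrelation correction.
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two error series of equal length >= 2")
    n = len(errors_a)
    # Loss differential d_t = e_a(t)^2 - e_b(t)^2
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    if var_d == 0:
        raise ValueError("loss differential has zero variance")
    stat = mean_d / math.sqrt(var_d / n)
    # Two-sided p-value under the standard normal approximation
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_value


# Illustrative (made-up) forecast errors for two models
stat, p_value = diebold_mariano([3, 4, 3, 5], [1, 1, 2, 1])
print(stat, p_value)
```

A positive statistic indicates that the first model has the larger average loss. A large p-value means the null hypothesis of equal predictive accuracy cannot be rejected.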

Abstract

Accurate forecasting of patient arrivals at emergency departments (EDs) is vital for efficient resource allocation and high-quality patient care. In this study we investigate the relevance of exogenous variables, namely tourism, weather, calendar and demographic variables, in forecasting ED visits in the reference hospital in Palma de Mallorca, a city with significant seasonal population fluctuations due to tourism. Using a machine learning approach, we develop a model that predicts ED visits based solely on these exogenous variables. We test different machine learning algorithms (random forests, support vector machines, and feedforward neural networks) with different combinations of input variables and compare their symmetric mean absolute percentage errors (SMAPEs). Our findings reveal that calendar information and resident and tourist population data are statistically significant for the accuracy of the predictions, while the addition of weather data does not provide any further improvement. Comparison of non-time-series with time-series prediction models reveals that the latter provide better accuracy for short prediction horizons (e.g. shorter than a week). For long prediction horizons (e.g. a fortnight or a month), however, time-series models are no more accurate than models relying only on exogenous variables. Our study highlights the importance of carefully selecting predictive variables to ensure robust and reliable forecasts in both the short and the long term. It demonstrates that, despite their lower complexity, non-time-series models with well-chosen input variables can be as effective as time-series models when predicting for long time horizons.
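As an illustration of the error metric named in the abstract, here is a minimal sketch of one common SMAPE definition, in which each absolute error is divided by the mean of the absolute actual and predicted values. Several SMAPE variants exist and the paper's exact formula is not reproduced in this summary, so this definition, the function name `smape`, and the sample numbers are assumptions.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (0 to 200).

    Assumed definition: mean of |a - p| / ((|a| + |p|) / 2) * 100.
    Pairs where both values are zero contribute zero error.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("need two non-empty series of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        if denom == 0:
            continue  # both values are zero: no error for this pair
        total += abs(a - p) / denom
    return 100.0 * total / len(actual)


# Illustrative (made-up) patient counts for three shifts
print(smape([60, 72, 65], [58, 75, 65]))
```

Because the denominator uses both the actual and the predicted value, the metric is bounded and treats over- and under-prediction more evenly than the plain mean absolute percentage error.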

Paper Structure (21 sections, 2 equations, 11 figures, 4 tables)

Figures (11)

  • Figure 1: Number of incoming patients (NIP) to the ED of Son Espases. Each point corresponds to the NIP for a specific day and shift. Each subfigure shows a different shift (morning in green, afternoon in red, night in blue). The purple points between March 1, 2020 and December 31, 2021 are the values registered during the assumed pandemic period; they are excluded from our analysis.
  • Figure 2: Hospitalization risk $h$ as a function of the patient's age cohort. $h$ is the fraction of patients of a specific cohort who were hospitalized immediately after attending the ED. This is used to group patients in three different risk groups: low-risk group ($h<0.1$, green line), medium-risk group ($0.1<h<0.2$, orange line), and high-risk group ($h>0.2$, red line).
  • Figure 3: Average NIP in the different shifts for given weekdays and months. The panels respectively show morning, afternoon, and night shifts. Markers show the average of all NIPs for a specific shift, month, and weekday. For example, the lower plot shows that the average NIP during a night shift on Wednesdays in July is 65 (light green triangle point). As in Figure 1, the seasonal behavior of NIP is more pronounced for the night shift and almost absent for the morning shift.
  • Figure 5: This figure highlights which combinations of non-time-series models and input variables have equal predictive accuracy according to the DM test (shift-based predictions in panel (a) and risk-group-based predictions in panel (b)). Each white disk corresponds to a different combination, as indicated on the axes. Since the test is symmetric, we only show each comparison once and hence the lower-right part of each diagram is empty. The colored circles inside the white disks indicate for which shifts or risk groups the two models represented by the disk have equal predictive accuracy for at least 75% of bootstrap samples.
  • Figure 6: The figures illustrate how predictions of the optimal model (RF-No W) align with true data in two distinct time windows (May-June in panel (a) and November-December in panel (b)). The points represent the actual NIP, while the colored bars indicate the RMSE around the predicted value.
  • ...and 6 more figures
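
The risk grouping described in the caption of Figure 2 can be sketched as follows. The thresholds 0.1 and 0.2 come from the caption. The handling of the boundary values (a cohort with exactly $h=0.1$ or $h=0.2$), the function names, and the sample counts are assumptions made for illustration only.

```python
def hospitalization_risk(hospitalized, attended):
    """Fraction h of a cohort's ED visits that ended in hospitalization."""
    if attended <= 0:
        raise ValueError("cohort must have at least one ED visit")
    return hospitalized / attended


def risk_group(h):
    """Map hospitalization risk h to a risk group.

    Thresholds follow the Figure 2 caption. Assumption: boundary values
    go to the higher group (h == 0.1 -> medium, h == 0.2 -> high).
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("h must be a fraction between 0 and 1")
    if h < 0.1:
        return "low"
    if h < 0.2:
        return "medium"
    return "high"


# Illustrative (made-up) cohort: 15 hospitalizations out of 100 ED visits
print(risk_group(hospitalization_risk(15, 100)))
```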