Forecasting emergency department visits in the reference hospital of the Balearic Islands: the role of tourist and weather data
Paride Crisafulli, Angel del Río Mangada, Juan José Segura Sampedro, Claudio R. Mirasso, Raúl Toral, Tobias Galla
TL;DR
This study addresses forecasting emergency department visits at the reference hospital of the Balearic Islands by leveraging exogenous variables (calendar effects and resident and tourist population dynamics) and evaluating non-time-series ML models against traditional time-series approaches. The random forest using calendar and population inputs without weather (RF-No W) emerges as the most robust non-time-series predictor across shifts and patient risk cohorts, matching or outperforming time-series baselines at long horizons, while weather data provides no meaningful gain. Diebold–Mariano tests confirm the statistical relevance of certain variable choices, though the practical impact on daily staffing (fewer than 2 patients per shift) remains modest. The findings advocate for carefully chosen exogenous inputs and show that long-horizon forecasts can be produced effectively with simpler models, although retraining is required after major disruptions such as the COVID-19 pandemic.
Abstract
Accurate forecasting of patient arrivals at emergency departments (EDs) is vital for efficient resource allocation and high-quality patient care. In this study we investigate the relevance of exogenous variables, namely tourism, weather, calendar and demographic variables, in forecasting ED visits at the reference hospital in Palma de Mallorca, a city with significant seasonal population fluctuations due to tourism. Using a machine learning approach, we develop a model that predicts ED visits based solely on these exogenous variables. We test different machine learning algorithms (random forests, support vector machines, and feedforward neural networks) with different combinations of input variables and compare their symmetric mean absolute percentage errors (SMAPEs). Our findings reveal that calendar information and resident and tourist population data contribute statistically significantly to prediction accuracy, while adding weather data provides no further improvement. A comparison of non-time-series and time-series prediction models reveals that the latter are more accurate for short prediction horizons (e.g. shorter than a week). For long prediction horizons (e.g. a fortnight or a month), however, time-series models are at best as accurate as models relying only on exogenous variables. Our study highlights the importance of carefully selecting predictive variables to ensure robust and reliable short- and long-term forecasts. It also demonstrates that, despite their lower complexity, non-time-series models with well-chosen input variables can be as effective as time-series models at long prediction horizons.
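To make the comparison metric concrete, the following is a minimal Python sketch of a SMAPE computation. The abstract does not state which SMAPE variant is used, so the formula here (absolute error divided by the mean of the absolute observed and predicted values, averaged and expressed in percent) is an assumption, as is the convention that a term with both values equal to zero contributes no error. The function name `smape` and the visit counts are hypothetical and chosen for illustration only.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent (range 0 to 200).

    Assumed variant: mean of |F - A| / ((|A| + |F|) / 2), times 100.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if len(actual) == 0:
        raise ValueError("inputs must be non-empty")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        # Assumed convention: if both values are zero, the term adds no error.
        if denom > 0:
            total += abs(f - a) / denom
    return 100.0 * total / len(actual)


# Hypothetical visit counts per shift, invented for illustration only.
observed = [100, 120, 80]
predicted = [110, 120, 72]
print(round(smape(observed, predicted), 2))
```

Because the denominator involves both the observed and the predicted value, this variant is bounded between 0 and 200 percent, which keeps errors comparable across shifts with very different visit volumes.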
