
Leveraging Synthetic and Genetic Data to Improve Epidemic Forecasting

Dave Osthus, Alexander C. Murph, Emma E. Goldberg, Lauren J. Beesley, William M. Fischer, Nidhi K. Parikh, Lauren A. Castro

Abstract

Forecasting infectious disease outbreaks is hard. Forecasting emerging infectious diseases with limited historical data is even harder. In this paper, we investigate ways to improve emerging infectious disease forecasting under operational constraints. Specifically, we explore two options likely to be available near the start of an emerging disease outbreak: synthetic data and genetic information. For this investigation, we conducted an experiment where we trained deep learning models on different combinations of real and synthetic data, both with and without genetic information, to explore how these models compare when forecasting COVID-19 cases for US states. All models are developed with an eye towards forecasting the next pandemic. We find that models trained with synthetic data have better forecast accuracy than models trained on real data alone, and that models using genetic variant information have better forecast accuracy than those that do not. All models outperformed a baseline persistence model (a feat accomplished by only 7 of 22 real-time COVID-19 case forecasting models, as reported in [38]), and multiple models outperformed the COVIDHub-4_week_ensemble. This paper demonstrates the value of these underutilized sources of information and provides a blueprint for forecasting future pandemics.

Paper Structure

This paper contains 43 sections, 19 equations, 19 figures, and 5 tables.

Figures (19)

  • Figure 1: COVID-19 data for Alabama and California. (a) Weekly total cases (TCs). (b) Proportion of sampled viral genomes assigned to each variant. (c) Variant-attributable cases (VACs), computed as TCs times the proportion of genomes assigned to each variant. VACs summed over all variants equal the TCs. Note the square-root scale on the y-axis for better visibility of low-count VACs.
  • Figure 2: Examples of non-COVID-19, real respiratory data. Over 2,000 time series are available for training, amounting to over 2 million observations.
  • Figure 3: MutAntiGen example runs. MutAntiGen outputs both the total number of cases (top row, TC) and the time series of cases attributed to each variant (bottom row, VAC; each line and color represents a different variant). For each time point, the sum of all variant-attributable cases (bottom row) equals the total cases (top row).
  • Figure 4: 10 of the 20 realizations from the observation model corresponding to a single MutAntiGen output. Realizations were generated by subjecting the "clean" MutAntiGen output either to scaling (random-magnitude compression of the x-axis) with possible addition of outliers (top row), or to scaling plus added noise, again with possible outliers (bottom row).
  • Figure 5: Selected 1-week-ahead through 4-week-ahead forecasts for New Mexico for all models. Black line: total cases time series. Colored points: median forecast cases. Ribbons mark the 50%, 80%, and 95% forecast intervals. Note the square-root scale on the y-axis for better visibility of low-case-count forecasts.
  • ...and 14 more figures
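
The variant-attributable cases (VACs) in Figure 1 are defined as weekly total cases multiplied by the proportion of sampled genomes assigned to each variant, so that VACs summed over variants recover the totals. A minimal sketch of that computation, using made-up numbers (the array values are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical weekly data for one state: total cases (TC) and the
# proportion of sampled viral genomes assigned to each variant per week.
total_cases = np.array([1200.0, 1500.0, 1800.0])   # TC, one value per week
variant_props = np.array([                          # rows: weeks, cols: variants
    [0.7, 0.3],
    [0.5, 0.5],
    [0.2, 0.8],
])

# Variant-attributable cases: VAC = TC * variant proportion, per week.
vac = total_cases[:, None] * variant_props

# By construction, summing VACs over variants recovers the weekly totals.
assert np.allclose(vac.sum(axis=1), total_cases)
print(vac)
```

Note that because the proportions in each week sum to one, the row sums of `vac` equal `total_cases` exactly; this is the identity stated in the Figure 1 caption.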