Combining digital data streams and epidemic networks for real time outbreak detection

Ruiqi Lyu, Alistair Turcan, Bryan Wilder

TL;DR

This work applies LRTrend to 2 years of COVID-19 data in 305 hospital referral regions and frequently detects regional Delta and Omicron waves within 2 weeks of the outbreak's start, when case counts are a small fraction of the wave's resulting peak.

Abstract

Responding to disease outbreaks requires close surveillance of their trajectories, but outbreak detection is hindered by the high noise in epidemic time series. Aggregating information across data sources has shown great denoising ability in other fields, but remains underexplored in epidemiology. Here, we present LRTrend, an interpretable machine learning framework to identify outbreaks in real time. LRTrend effectively aggregates diverse health and behavioral data streams within one region and learns disease-specific epidemic networks to aggregate information across regions. We reveal diverse epidemic clusters and connections across the United States that are not well explained by commonly used human mobility networks and may be informative for future public health coordination. We apply LRTrend to 2 years of COVID-19 data in 305 hospital referral regions and frequently detect regional Delta and Omicron waves within 2 weeks of the outbreak's start, when case counts are a small fraction of the wave's resulting peak.

Paper Structure

This paper contains 48 sections, 14 equations, and 12 figures.

Figures (12)

  • Figure 1: Overview of LRTrend. (A) Univariate local regression pipeline. Example real-time case data (blue line), ground truth disease prevalence (dashed blue line), and increasing trends (shaded red) are shown first. LRTrend operates on the recent window (black box) to output $p$-values and growth rates for that window. (B) Multivariate local regression pipeline. Four example data streams are shown for a given window, each with its own local regression results. (C) Geographic aggregation pipeline. Historical growth rates are defined from previous windows for a given data stream. Pittsburgh is the focal region; the others are learned neighboring regions.
  • Figure 2: Assessing individual stream performance. (A) Raw COVID-19 data streams: CPR admissions, JHU cases, and Change Healthcare Claims. LRTrend's retrospective ground truth is indicated with a solid gray/black line, colored differently for different penalty values. Consensus outbreak regions are shaded in red. Alarms raised by applying LRTrend to each data stream are annotated. (B) LRTrend's power in detecting outbreaks with each data stream versus the window size used for detection. (C) LRTrend's delay in detecting outbreaks with each data stream versus the window size used for detection. In (B) and (C), GT streams are colored red, Medium streams blue, and Weak streams purple.
  • Figure 3: Multi-stream aggregation. (A–C) Power for LRTrend and Stolerman using 3 GT, 5 Medium, and 4 Weak stream sets, respectively, compared to using LRTrend on each group's strongest individual stream. (D) Power for LRTrend and Stolerman’s optimal stream combination versus using all streams. (E–F) Average window size per state for 80% power when combining GT and Medium streams versus individual streams (dashed line = maximum window size). (G–H) Window size needed for 80% power across HRRs, mapped for doctor visits (G) and combined Medium streams (H).
  • Figure 4: Geographic aggregation. (A–C) Change in power relative to no aggregation, averaged across streams, with each aggregation method for GT, Medium, and Weak streams, respectively. (D) Average number of in-state neighbors, averaged across states with 95% confidence intervals. (E) Absolute Spearman correlation in learned epidemic distances between each network. (F) 3-NN epidemic neighbor graph learned using LRTrend with CPR admissions. Regions are colored by cluster; lines represent 3-nearest-neighbor connections.
  • Figure 5: Flowchart of multivariate smoothing. Multiple data sources are taken as input, scaled, weekday-corrected, and jointly smoothed to produce a single measure of latent disease burden.
  • ...and 7 more figures
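
The Figure 1(A) caption describes a local regression over a recent window that outputs a growth rate and a $p$-value. The sketch below is one generic way such a step could look, not the authors' implementation: it assumes a log-linear ordinary least squares fit over the window and approximates the slope test with a normal distribution rather than a $t$ distribution. The function name `local_trend`, the `+1` offset on counts, and the example numbers are all hypothetical.

```python
import math
from statistics import NormalDist


def local_trend(window):
    """Fit log(count + 1) = a + r * t by OLS over one window.

    Returns (r, p), where r is the estimated growth rate per time step and
    p is a one-sided p-value for the null hypothesis r <= 0.
    Assumption: the p-value uses a normal approximation to the slope's
    t statistic, which is only rough for short windows.
    """
    n = len(window)
    if n < 3:
        raise ValueError("need at least 3 observations in the window")

    t = list(range(n))
    y = [math.log(v + 1.0) for v in window]  # +1 guards against zero counts

    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))

    slope = sxy / sxx
    intercept = y_mean - slope * t_mean

    # Residual variance and standard error of the slope.
    rss = sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, y))
    se = math.sqrt(rss / (n - 2) / sxx)

    if se == 0.0:
        # Degenerate perfect fit: the sign of the slope decides the outcome.
        return slope, (0.0 if slope > 0 else 1.0)

    z = slope / se
    return slope, 1.0 - NormalDist().cdf(z)


# Hypothetical 14-day window of rising case counts.
rising = [10, 12, 15, 18, 22, 27, 33, 40, 48, 58, 70, 84, 101, 121]
rate, p_value = local_trend(rising)
alarm = p_value < 0.05  # illustrative threshold, not taken from the paper
```

In a real-time setting this function would be re-evaluated as the window slides forward, with an alarm raised when the $p$-value falls below a chosen threshold. How LRTrend combines such per-stream results across streams and regions is described in the paper itself and is not reproduced here.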