Forecasting West Nile Virus with Graph Neural Networks: Harnessing Spatial Dependence in Irregularly Sampled Geospatial Data

Adam Tonks, Trevor Harris, Bo Li, William Brown, Rebecca Smith

TL;DR

This work forecasts West Nile virus (WNV) presence from irregularly sampled geospatial trap data using a graph neural network (GNN) that explicitly models spatial dependence. A four-layer GraphSAGE model operates on daily $k$-nearest-neighbor graphs, with node features that include past trap positivity and weather variables, and predicts weekly WNV positivity for each trap. Across lead times of 1–5 weeks, the GNN outperforms logistic regression, XGBoost, and fully connected networks in AUC and Brier score. Analyses of graph connectivity and data types show that the two data sources are informative at different horizons: trap data drive short-horizon performance, whereas weather data contribute more at longer horizons. The results demonstrate the practicality of GNN-based spatiotemporal forecasting for vector-borne disease surveillance and motivate real-time deployment and extension to other regions, diseases, and graph constructions.
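The daily $k$-nearest-neighbor graphs mentioned above can be illustrated with a minimal sketch. It assumes Euclidean distance on already-projected coordinates and directed edges from each trap to its neighbors; the function name `knn_edges` and the sample coordinates are hypothetical, and the paper's actual distance metric and edge handling may differ.

```python
import math

def knn_edges(coords, k):
    """Return directed edges (i, j) linking each point i to its k nearest neighbours j."""
    edges = []
    for i, (xi, yi) in enumerate(coords):
        # Distance from point i to every other point, skipping itself.
        dists = [(math.hypot(xi - xj, yi - yj), j)
                 for j, (xj, yj) in enumerate(coords) if j != i]
        dists.sort()  # closest first; ties broken by node index
        edges.extend((i, j) for _, j in dists[:k])
    return edges

# Hypothetical projected coordinates (km) of the traps tested on one day.
traps = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
edges = knn_edges(traps, k=2)
print(edges)
```

Because the set of traps tested changes from day to day, a graph of this kind would be rebuilt for each day's set of tested traps rather than fixed once.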

Abstract

Machine learning methods have seen increased application to geospatial environmental problems, such as precipitation nowcasting, haze forecasting, and crop yield prediction. However, many of the machine learning methods applied to mosquito population and disease forecasting do not inherently take into account the underlying spatial structure of the given data. In our work, we apply a spatially aware graph neural network model consisting of GraphSAGE layers to forecast the presence of West Nile virus in Illinois, to aid mosquito surveillance and abatement efforts within the state. More generally, we show that graph neural networks applied to irregularly sampled geospatial data can exceed the performance of a range of baseline methods including logistic regression, XGBoost, and fully-connected neural networks.
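To make the idea of a GraphSAGE layer concrete, here is a minimal sketch of a single update step using a mean aggregator: each node's features are concatenated with the average of its neighbors' features, passed through a linear map, and then a ReLU. The mean aggregator, the function name `sage_mean_layer`, and the toy weights are assumptions for illustration; the abstract does not specify the aggregator or layer sizes used in the paper.

```python
def sage_mean_layer(features, neighbours, weights, bias):
    """One GraphSAGE-style update: ReLU(W . [h_v ; mean of neighbour features] + b)."""
    out = []
    for v, h_v in enumerate(features):
        nbrs = neighbours.get(v, [])
        dim = len(h_v)
        if nbrs:
            # Element-wise mean of the neighbours' feature vectors.
            mean = [sum(features[u][d] for u in nbrs) / len(nbrs) for d in range(dim)]
        else:
            # Isolated node: fall back to a zero neighbourhood vector.
            mean = [0.0] * dim
        z = h_v + mean  # list concatenation: [h_v ; mean]
        out.append([max(0.0, sum(w * x for w, x in zip(row, z)) + b)
                    for row, b in zip(weights, bias)])
    return out

# Toy example: 3 nodes with 2 features each, one output unit (hypothetical values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbours = {0: [1, 2], 1: [0], 2: []}
weights = [[1.0, 1.0, 1.0, 1.0]]  # shape: 1 output x (2 own + 2 aggregated) inputs
bias = [0.0]
hidden = sage_mean_layer(features, neighbours, weights, bias)
print(hidden)
```

Stacking several such layers lets information propagate across multiple hops of the trap graph, which is how a model of this kind can use spatial structure that per-trap baselines ignore.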

Paper Structure (20 sections, 1 equation, 6 figures, 1 table)

Figures (6)

  • Figure 1: Location of traps tested in Illinois during the week of Monday, July 15, 2019 to Sunday, July 21, 2019, with the Chicago metropolitan area inset (negative tests shown as crosses, positive tests as circles)
  • Figure 2: GNN input graphs created for (a) Tuesday, July 16, 2019, (b) July 17, (c) July 18, and (d) July 19 using $k$-nearest neighbors with $k=5$, zoomed into the Chicago metropolitan area
  • Figure 3: AUC values for GNN ($k=5$) and baseline models at increasing lead times
  • Figure 4: AUC values for GNN models using various input graphs at increasing lead times
  • Figure 5: ROC curve for GNN ($k=5$) model
  • ...and 1 more figure
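
The AUC values plotted in Figures 3 and 4, and the Brier score named in the TL;DR, are standard metrics for probabilistic binary forecasts. The sketch below shows how each is defined, using hypothetical predictions and labels rather than any data from the paper: the Brier score is the mean squared difference between predicted probability and outcome (lower is better), and AUC is the fraction of positive/negative pairs in which the positive case receives the higher score (higher is better).

```python
def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def auc(probs, labels):
    """Probability that a random positive is scored above a random negative (ties count half)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted positivity probabilities and observed trap results.
probs = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
bs = brier_score(probs, labels)
roc_auc = auc(probs, labels)
print(bs, roc_auc)
```

The pairwise AUC computation is quadratic in the number of samples, which is fine for illustration; library implementations use a sort-based approach instead.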