Genomic-Informed Heterogeneous Graph Learning for Spatiotemporal Avian Influenza Outbreak Forecasting
Jing Du, Haley Stone, Yang Yang, Ashna Desai, Hao Xue, Andreas Züfle, Chandini Raina MacIntyre, Flora D. Salim
TL;DR
This work tackles avian influenza outbreak forecasting by integrating genomic and environmental context into a multi-layer graph framework. BLUE constructs a bi-layer heterogeneous graph with cross-layer smoothing, fuses it into a spectral-aligned Fusion Graph, and uses an autoregressive encoder–decoder to generate multi-step forecasts. On the Avian-US dataset, the approach delivers strong predictive performance and robust outbreak detection, supported by theoretical guarantees on spectral preservation and a data-driven fusion mechanism. The release of Avian-US supports further research in genomics-informed epidemiology and multi-view graph learning for disease surveillance.
Abstract
Accurate forecasting of Avian Influenza Virus (AIV) outbreaks within wild bird populations requires models that account for complex, multi-scale transmission patterns driven by diverse factors. Conventional spatiotemporal epidemic models are robust for human-centric diseases, but they rely on spatial homophily and diffusive transmission between geographic regions. This simplification is incomplete for AIV because it neglects genomic information that is critical for capturing case-level dynamics such as high-frequency reassortment and lineage turnover (e.g., genetic descent across regions), which are essential for understanding AIV spread. To address these limitations, we systematically formulate the AIV forecasting problem and propose BLUE, a Bi-Layer genomic-aware heterogeneous graph fusion pipeline that integrates genetic, spatial, and ecological data for outbreak forecasting. BLUE 1) defines a heterogeneous graph with two layers (case and location) that incorporates information from diverse sources, 2) applies cross-relation smoothing to regularize information flow across edge types, 3) performs graph fusion that preserves critical structural patterns, backed by theoretical spectral guarantees, and 4) forecasts future outbreaks using an autoregressive graph sequence model that captures transmission dynamics. To support research, we release the Avian-US dataset, which provides comprehensive genetic, spatial, and ecological data on US avian influenza outbreaks. BLUE outperforms existing baselines, highlighting the efficacy of integrating multi-layer information for infectious disease forecasting. The code is available at: https://github.com/jingdu-cs/BLUE.
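
To make the fusion and autoregressive forecasting stages listed in the abstract more concrete, the following toy sketch fuses two relation-specific adjacency matrices over the same set of locations and then rolls a one-step predictor forward several steps. It is not the BLUE implementation: the function names (`fuse`, `propagate`, `rollout`), the fixed fusion weights, the blending factor `alpha`, and the three-location example are all assumptions chosen for illustration, and the simple weighted average stands in for the paper's learned, spectrally constrained fusion.

```python
# Toy sketch only: names, weights, and the update rule are illustrative
# assumptions, not the authors' method.

def fuse(adjs, weights):
    """Convex combination of same-sized adjacency matrices (one per relation)."""
    n = len(adjs[0])
    total = sum(weights)
    return [[sum(w * a[i][j] for a, w in zip(adjs, weights)) / total
             for j in range(n)] for i in range(n)]

def row_normalize(adj):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    out = []
    for row in adj:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out

def propagate(adj, x, alpha=0.5):
    """One step: blend each node's value with its neighbours' weighted mean."""
    p = row_normalize(adj)
    n = len(x)
    return [(1 - alpha) * x[i] + alpha * sum(p[i][j] * x[j] for j in range(n))
            for i in range(n)]

def rollout(adj, x0, steps):
    """Autoregressive multi-step forecast: each prediction feeds the next step."""
    preds, x = [], x0
    for _ in range(steps):
        x = propagate(adj, x)
        preds.append(x)
    return preds

# Three hypothetical locations connected by two relation types.
spatial = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # geographic adjacency
genetic = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]  # shared viral lineage
fused = fuse([spatial, genetic], weights=[0.7, 0.3])
forecasts = rollout(fused, x0=[10.0, 0.0, 0.0], steps=3)
```

In this toy setting, the genetic edge lets case counts at the first location influence the third location in a single step even though the two are not spatially adjacent, which is the kind of non-diffusive link that a purely geographic model would miss.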
