Genomic-Informed Heterogeneous Graph Learning for Spatiotemporal Avian Influenza Outbreak Forecasting

Jing Du, Haley Stone, Yang Yang, Ashna Desai, Hao Xue, Andreas Züfle, Chandini Raina MacIntyre, Flora D. Salim

TL;DR

This work tackles the problem of forecasting avian influenza outbreaks by integrating genomic and environmental context into a multi-layer graph framework. The proposed pipeline, BLUE, constructs a bi-layer heterogeneous graph with cross-layer smoothing, fuses it into a spectrally aligned fusion graph, and uses an autoregressive encoder–decoder to generate multi-step forecasts. The approach delivers strong predictive performance and robust outbreak detection on the Avian-US dataset, aided by theoretical guarantees on spectral preservation and a data-driven fusion mechanism. The release of Avian-US supports further research in genomics-informed epidemiology and multi-view graph learning for disease surveillance.

Abstract

Accurate forecasting of Avian Influenza Virus (AIV) outbreaks within wild bird populations requires models that account for complex, multi-scale transmission patterns driven by diverse factors. Conventional spatiotemporal epidemic models are robust for human-centric diseases, but they rely on spatial homophily and diffusive transmission between geographic regions. This simplification is incomplete for AIV because it neglects genomic information that is critical for capturing case-level dynamics such as high-frequency reassortment and lineage turnover (e.g., genetic descent across regions), which are essential for understanding AIV spread. To address these limitations, we systematically formulate the AIV forecasting problem and propose BLUE, a Bi-Layer genomic-aware heterogeneous graph fusion pipeline that integrates genetic, spatial, and ecological data for outbreak forecasting. It 1) defines a multi-layered graph structure incorporating information from diverse sources and multiple layers (case and location), 2) applies cross-relation smoothing to keep information flow coherent across edge types, 3) performs graph fusion that preserves critical structural patterns, backed by theoretical spectral guarantees, and 4) forecasts future outbreaks using an autoregressive graph sequence model to capture transmission dynamics. To support research, we release the Avian-US dataset, which provides comprehensive genetic, spatial, and ecological data on US avian influenza outbreaks. BLUE outperforms existing baselines, highlighting the efficacy of integrating multi-layer information for infectious disease forecasting. The code is available at: https://github.com/jingdu-cs/BLUE.
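
To make the bi-layer structure concrete, the following is a minimal sketch of a graph with two node types (location and case) and three edge types (spatial, genetic, and assignment), as named in the Figure 1 caption. Everything else is an assumption made for illustration: the class name `BiLayerGraph`, the function `smooth_locations`, the node ids, the feature values, and the blending weight `alpha` are hypothetical, and the smoothing step is plain neighbor averaging used as a stand-in, not the paper's MRF-inspired operator.

```python
from dataclasses import dataclass, field


@dataclass
class BiLayerGraph:
    """Toy container: two node types, three edge types (names are hypothetical)."""
    location_feats: dict                                  # location id -> feature list
    case_feats: dict                                      # case id -> feature list
    spatial_edges: list = field(default_factory=list)     # (location, location)
    genetic_edges: list = field(default_factory=list)     # (case, case)
    assignment_edges: list = field(default_factory=list)  # (case, location)


def smooth_locations(graph, alpha=0.5):
    """Blend each location's features with the mean of its assigned cases.

    Plain neighbor averaging over assignment edges, used as a stand-in;
    this is not the MRF-inspired operator from the paper.
    """
    assigned = {loc: [] for loc in graph.location_feats}
    for case, loc in graph.assignment_edges:
        assigned[loc].append(graph.case_feats[case])

    smoothed = {}
    for loc, feats in graph.location_feats.items():
        cases = assigned[loc]
        if not cases:
            smoothed[loc] = list(feats)  # no assigned cases: leave unchanged
            continue
        mean = [sum(col) / len(cases) for col in zip(*cases)]
        smoothed[loc] = [(1 - alpha) * f + alpha * m for f, m in zip(feats, mean)]
    return smoothed


# Hypothetical toy data: three locations, three cases.
graph = BiLayerGraph(
    location_feats={"loc_a": [0.0, 0.0], "loc_b": [1.0, 1.0], "loc_c": [5.0, 5.0]},
    case_feats={"case_1": [2.0, 4.0], "case_2": [4.0, 2.0], "case_3": [1.0, 3.0]},
    spatial_edges=[("loc_a", "loc_b"), ("loc_b", "loc_c")],
    genetic_edges=[("case_1", "case_3")],
    assignment_edges=[("case_1", "loc_a"), ("case_2", "loc_a"), ("case_3", "loc_b")],
)
smoothed = smooth_locations(graph)
print(smoothed)
```

In this toy version only the assignment edges drive the smoothing; the spatial and genetic edges are stored but unused, which keeps the example short while still showing how case-level information can reach the location layer.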

Paper Structure

This paper contains 32 sections, 1 theorem, 15 equations, 3 figures, 5 tables.

Key Result

Theorem 3.1

Assuming the spectral approximation error is bounded by $\| \mathbf{L}_{\text{hetero}} - \tilde{\mathbf{L}}_f \| \leq \varepsilon$, for any polynomial filter $p(\cdot)$ and feature matrix $\mathbf{H}$, the difference between applying the heterogeneous and projected fusion operators is bounded:
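
The displayed inequality of Theorem 3.1 is not reproduced in this text, and the sketch below does not attempt to reconstruct it. Instead, it numerically checks a standard telescoping bound of the same flavor, under assumptions that may differ from the paper's: the Frobenius norm is used throughout, the filter is $p(x) = \sum_k a_k x^k$, and the constant is $\sum_{k \geq 1} |a_k| \, k \, \rho^{k-1}$ with $\rho$ the larger of the two Laplacian norms. The matrices, coefficients, and function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b, scale=1.0):
    """Return a + scale * b, elementwise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fro(a):
    """Frobenius norm (submultiplicative, so the telescoping argument applies)."""
    return math.sqrt(sum(x * x for row in a for x in row))


def poly_filter(coeffs, lap):
    """Evaluate p(L) = sum_k coeffs[k] * L^k."""
    n = len(lap)
    result = [[0.0] * n for _ in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for c in coeffs:
        result = add(result, power, c)
        power = matmul(power, lap)
    return result


def telescoping_bound(coeffs, lap_a, lap_b):
    """Upper bound on ||p(A) - p(B)||_F from ||A^k - B^k|| <= k rho^(k-1) ||A - B||."""
    rho = max(fro(lap_a), fro(lap_b))
    eps = fro(add(lap_a, lap_b, -1.0))
    const = sum(abs(c) * k * rho ** (k - 1) for k, c in enumerate(coeffs) if k >= 1)
    return const * eps


random.seed(0)
n = 4
# Laplacian of a 4-node path graph, standing in for L_hetero.
lap_hetero = [
    [1.0, -1.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, -1.0, 1.0],
]
# Small symmetric perturbation, standing in for the projected fusion Laplacian.
noise = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        noise[i][j] = noise[j][i] = random.uniform(-0.01, 0.01)
lap_fused = add(lap_hetero, noise)

feats = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n)]  # feature matrix H
coeffs = [0.5, -0.3, 0.1]  # hypothetical filter coefficients a_0, a_1, a_2

diff = add(poly_filter(coeffs, lap_hetero), poly_filter(coeffs, lap_fused), -1.0)
lhs = fro(matmul(diff, feats))
rhs = telescoping_bound(coeffs, lap_hetero, lap_fused) * fro(feats)
print(lhs <= rhs)
```

The point of the check is qualitative: when the two Laplacians are close, any fixed polynomial filter produces close outputs on the same features, with a gap that scales linearly in the approximation error.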

Figures (3)

  • Figure 1: BLUE consists of 4 components: Bi-layer Heterogeneous Graph Construction models AIV spread using a bi-layer heterogeneous graph with two types of nodes (location and case) and three types of edges (spatial, genetic, and assignment). Then, the MRF-inspired Cross-layer Smoothing block aggregates neighbor information to create coherent representations for heterogeneous nodes and their connections. Graphs are then fused into Information-Preserving Fusion Graphs that preserve the original transmission structure using a spectral regularizer. Finally, Autoregressive Encoder–Decoder Forecasting encodes node interactions over time to generate multi-step forecasts.
  • Figure 2: Per-step performance on Avian-US with $H=4$ (top) and $H=8$ (bottom).
  • Figure 3: Impact of spectral alignment weight $\lambda_1$.
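
The per-step results in Figure 2 come from multi-step forecasting over a horizon $H$. As a rough illustration of what an autoregressive rollout means, the sketch below feeds each prediction back into the input window to produce $H$ future values and then computes a per-step absolute error. It is a toy under stated assumptions: `rollout`, `last_two_mean`, and the numbers are hypothetical, and the one-step predictor is a simple average standing in for the learned encoder–decoder.

```python
def rollout(one_step, history, horizon):
    """Produce `horizon` forecasts by feeding each prediction back as input."""
    window = list(history)
    preds = []
    for _ in range(horizon):
        nxt = one_step(window)
        preds.append(nxt)
        window = window[1:] + [nxt]  # slide the window forward by one step
    return preds


def last_two_mean(window):
    """Hypothetical one-step predictor: mean of the two most recent values."""
    return (window[-1] + window[-2]) / 2


history = [2.0, 4.0]              # toy outbreak counts for one location
actual = [3.0, 4.0, 3.0, 3.0]     # toy ground truth for the next 4 steps

forecast = rollout(last_two_mean, history, horizon=4)
per_step_error = [abs(p - a) for p, a in zip(forecast, actual)]
print(forecast)        # [3.0, 3.5, 3.25, 3.375]
print(per_step_error)  # [0.0, 0.5, 0.25, 0.375]
```

Because later steps are conditioned on earlier predictions rather than observations, errors can compound along the horizon, which is why per-step reporting for different values of $H$ is informative.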

Theorems & Definitions (1)

  • Theorem 3.1