
Learning relationships in epidemiological data using graph neural networks

Anthony J Wood, Aeron R Sanchez, Rowland R Kao

Abstract

When designing control strategies for an infectious disease, it is critical to identify the key pathways of transmission. Data on infected hosts - when they were born, where they lived and with whom they interacted - can help infer sources of infection and transmission clusters. However, such data are generally not powerful enough to identify infector-infectee pairs with any certainty. Whole-genome sequencing data of the underlying pathogen, on the other hand, can serve as a powerful adjunct to these data, as they can be used to estimate the time to the most recent common ancestor between two infected hosts, and in turn their relative proximity in the transmission tree. A statistical model that explains the genetic distance between different host pathogens and associated risk factors can therefore inform key risk factors for transmission itself. We show how graph neural networks (GNNs) are a powerful and natural modelling architecture for such a problem. By treating the epidemiological dataset as a graph where infected hosts are nodes and edges are weighted by the genetic distance between different host pairs, we show how a GNN can be fit to predict the genetic distance between known hosts and new, unsequenced hosts. Comparisons with other established approaches show that GNNs have useful performance advantages, albeit with greater computational cost.
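The graph construction described above - hosts as nodes with their own attributes, pairs as edges weighted by known genetic distances, and a scalar prediction in [0, 1] for an unsequenced pair - can be sketched in a minimal numpy example. This is an illustrative sketch only, not the authors' architecture: the layer sizes, the `message_pass` and `pair_score` functions, and the untrained random weights are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

H, F = 6, 4                      # toy sizes: 6 hosts, 4 node attributes each
N = rng.normal(size=(H, F))      # node attributes, one row per host
A = np.zeros((H, H))             # adjacency weighted by known genetic distances
A[0, 1] = A[1, 0] = 1.0          # hosts 0 and 1 closely related
A[2, 3] = A[3, 2] = 1.0          # hosts 2 and 3 closely related

def message_pass(N, A, W):
    """One round of mean-neighbour aggregation followed by a linear map."""
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
    agg = (A @ N) / deg          # average of each host's neighbour features
    return np.tanh(np.concatenate([N, agg], axis=1) @ W)

def pair_score(h_i, h_j, w):
    """Symmetric scalar prediction in (0, 1) for the host pair (i, j)."""
    z = np.concatenate([h_i * h_j, np.abs(h_i - h_j)]) @ w
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid keeps output between 0 and 1

W = rng.normal(size=(2 * F, F)) * 0.1    # untrained weights, shapes only
w = rng.normal(size=2 * F) * 0.1

Hemb = message_pass(N, A, W)             # embedded representations per host
d_pred = pair_score(Hemb[0], Hemb[4], w) # prediction for an unsequenced pair
```

In practice each module would be a trained multi-layer network and the aggregation would run over the genetic-distance-weighted edges, but the data flow - node attributes and edge structure in, embedded representations, then a bounded pairwise score out - matches the description above.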

Paper Structure

This paper contains 14 sections, 14 equations, 12 figures, and 7 tables.

Figures (12)

  • Figure 1: Graph neural network architecture. This model evaluates the probability that a pair of hosts $i$, $j$ are closely related, when one of the hosts does not have a known pathogen sequence. The input data (light pink) are the node attributes of those individual hosts $\underline{n}_i$, $\underline{n}_j$, the edge attributes of that host pair $\underline{e}_{ij}$, the node attributes of all hosts in the dataset $\mathbf{N}$, and the edge attributes of all hosts in the dataset $\mathbf{E}$ (including known genetic distances). These data feed in through neural network modules (grey), with the intermediate outputs termed embedded representations (dark pink). The final output $d^{\mathrm{pred}}_{ij}$ (blue) is a scalar between 0 and 1.
  • Figure 2: Model performance over the synthetic datasets ($H=2\,000$ hosts each). Left: Classification of test host pairs $(i,j)$ for each model, separated by whether they are truly closely related $(d_{ij} = 1)$ or distant $(d_{ij} = 0)$. Right: mean prediction entropy (MPE, where lower MPE indicates more confident predictions), balanced accuracy (BA) and area under the receiver operating characteristic curve (ROC-AUC).
  • Figure 3: Variable importance for the synthetic models, quantified as the loss in balanced accuracy on the test host pairs when a given variable in the dataset is randomly permuted (effectively removing it). The $\texttt{Genetic\_Distance}$ attribute is only populated for edges in the train dataset for the GNN model. Other variables with a value for the GNN only are node-level attributes.
  • Figure 4: Model performance over the Woodchester dataset ($H=241$ hosts). Left: Classification of test host pairs $(i,j)$ for each model, separated by whether they are truly closely related $(d_{ij} = 1)$ or distant $(d_{ij} = 0)$. Right: mean prediction entropy (MPE, where lower MPE indicates more confident predictions), balanced accuracy (BA) and area under the receiver operating characteristic curve (ROC-AUC).
  • Figure 5: Variable importance for the Woodchester model, quantified as the loss in balanced accuracy on the test host pairs when a given variable in the dataset is randomly permuted (effectively removing it). The $\texttt{Genetic\_Distance}$ attribute is only populated for edges in the train dataset for the GNN model. Other variables with a value for the GNN only are node-level attributes.
  • ...and 7 more figures
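The quantities used throughout the figure captions - balanced accuracy, mean prediction entropy, and permutation-based variable importance - can be sketched in a few lines of numpy. This is a generic illustration, not the authors' evaluation code: the function names are made up, the MPE is assumed to use the natural logarithm, and the `predict` function is a toy stand-in for a fitted classifier.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls; robust when closely related pairs are rare."""
    return float(np.mean([(y_pred[y_true == c] == c).mean() for c in (0, 1)]))

def mean_prediction_entropy(p, eps=1e-12):
    """Average binary entropy of predicted probabilities (natural log assumed);
    lower values indicate more confident predictions."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))))

def permutation_importance(predict, X, y, col, rng):
    """Loss in balanced accuracy after randomly permuting one variable."""
    base = balanced_accuracy(y, predict(X))
    Xp = X.copy()
    Xp[:, col] = X[rng.permutation(len(X)), col]   # shuffle that column only
    return base - balanced_accuracy(y, predict(Xp))

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)                    # only column 0 is informative
predict = lambda Z: (Z[:, 0] > 0).astype(int)    # stand-in for a fitted model

imp_informative = permutation_importance(predict, X, y, 0, rng)
imp_noise = permutation_importance(predict, X, y, 2, rng)
```

Permuting the informative column costs the toy model roughly half its balanced accuracy, while permuting an unused column costs nothing - the same logic by which Figures 3 and 5 rank the epidemiological variables.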