A Machine Learning Framework for Constructing Heterogeneous Contact Networks: Implications for Epidemic Modelling

Luke Murray Kearney, Emma L Davis, Matt J Keeling

Abstract

Capturing the structured mixing within a population is key to the reliable projection of infectious disease dynamics and hence informed control. Both heterogeneity in the number of contacts and age-structured mixing have been repeatedly demonstrated as fundamental, yet are rarely combined. Networks provide a powerful and intuitive method to realise population structure and simulate infection dynamics. However, the explicit measurement of contact networks is not scalable to larger populations. Here, using data from social contact surveys, we develop a generalisable and robust algorithm utilising machine learning to generate a surrogate population-scale network that preserves both age-structured mixing and heterogeneity of contacts. We simulate the spread of infection across different populations, considering how the epidemic size varies over basic reproduction number ($R_0$) scenarios, mirroring the process of determining public health impact from early epidemic growth. Our approach shows that both age structure and degree heterogeneity substantially reduce the epidemic size. We also demonstrate that these simulations more accurately capture the heterogeneity in secondary cases observed for COVID-19 when transmission is scaled by contact duration, dampening the effect of highly connected "super-spreaders". By using survey data collected during 2020-2022, these network models also inform about the impacts of control and the targeting of public health interventions: quantifying the non-linear reduction in transmission opportunities that occurred during lockdowns, and the ages and contact types most responsible for onward transmission. Our robust methodology therefore allows the full wealth of data commonly collected by surveys, but frequently overlooked, to be incorporated into more realistic transmission models of infectious diseases.

Paper Structure (3 sections, 36 equations, 17 figures, 3 tables)

Figures (17)

  • Figure 1: Steps in the network construction: (a) Obtain data from egocentric snapshots, together with participant and contact characteristics. (b) Fit high-dimensional Gaussian mixtures, where each dimension represents the number of contacts with an age group for a certain duration. Find the optimal number of Gaussians that minimises the error on the predefined test set. (c) Create an unconnected population structure matching census data, then sample stubs for each individual, each with a desired contact age group and duration. Randomly connect stubs using a stratified configuration approach. (d) Use the synthetic networks for outbreak simulation.
  • Figure 2: Contact matrices representing the mixing between age-groups and highlighting the heterogeneities in the data (grey), the stochastic block model (blue) and the GMM model (green). For the mixing between each pair of age-groups, we sample 100 ego-networks (associated with a respondent of the correct age) and calculate the number of contacts to individuals in the other age-group. The results are then plotted as a $10\times10$ subgrid to highlight the variability; points are colour-coded on a logarithmic scale (from 0 to 100) due to the extreme heterogeneities that are present.
  • Figure 3: Relationships between $R_0$ and final size for different survey data and different methodologies. The first four panels show final size for each data set against $R_0$ using the SBM and the GMM network with and without duration scaling. The last panel compares the $R_0$ values from CoMix data sets for the GMM model with duration scaling; as expected, lockdown networks generate lower $R_0$ values for the same transmission rate $\tau$.
  • Figure 4: Contributions of different age groups and link durations to outbreaks with $R_0=1.5,\ 2.5,\ 3.5$ in each data set. Duration contributions are taken as the proportion of cases from generation two caused by each link type. Age-structured contributions are calculated from the normalised leading eigenvector of the age matrix of cases from generation two.
  • Figure 5: Average degree distributions for each age group. The bar triplets refer to the data set, the Gaussian mixture model and the stochastic block model respectively.
  • ...and 12 more figures
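
Step (c) of Figure 1 describes connecting sampled stubs with a stratified configuration approach. The sketch below is one possible reading of that step, not the authors' implementation: the function name `stratified_configuration` and its arguments are hypothetical, contact duration is ignored for brevity, and self-loops, repeated edges and unmatched stubs are simply discarded. Each stub is held by a node in age group `a` and seeks a partner in age group `b`; it is only ever matched with a stub going the opposite way, so the age-structured mixing encoded in the stubs is preserved.

```python
import random
from collections import defaultdict


def stratified_configuration(ages, stubs, seed=0):
    """Pair stubs at random within each (age group, age group) stratum.

    ages[i]  : age group label of node i (labels must be mutually comparable)
    stubs[i] : dict mapping a target age group to the number of contacts
               node i should have with that group (hypothetical input format)
    Returns a set of undirected edges (u, v) with u < v.
    """
    rng = random.Random(seed)

    # pool[(a, b)] holds one entry per stub owned by a node in group a
    # that is looking for a partner in group b.
    pool = defaultdict(list)
    for node, wanted in enumerate(stubs):
        for target, count in wanted.items():
            pool[(ages[node], target)].extend([node] * count)

    edges = set()
    for a, b in list(pool):
        if a > b:
            continue  # each unordered pair of groups is handled once
        left = pool[(a, b)]
        rng.shuffle(left)
        if a == b:
            # Within-group stratum: split the shuffled stubs into two halves.
            half = len(left) // 2
            left, right = left[:half], left[half:2 * half]
        else:
            # Between-group stratum: match against stubs pointing back.
            right = pool.get((b, a), [])
            rng.shuffle(right)
        # zip stops at the shorter list, so surplus stubs stay unmatched.
        for u, v in zip(left, right):
            if u != v:  # assumption: self-loops are dropped
                edges.add((min(u, v), max(u, v)))  # the set removes repeats
    return edges
```

For example, calling `stratified_configuration([0, 0, 1, 1], [{0: 1, 1: 1}] * 4)` builds a four-node network in which every node has one within-group and one between-group contact. In a fuller version the stratum key would also carry the contact duration, so that stubs are matched only with stubs of the same duration class.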