Data-Driven Construction of Age-Structured Contact Networks

Luke Murray Kearney, Emma L. Davis, Matt J. Keeling

TL;DR

A generic and robust algorithm is developed to generate an extrapolated network that preserves both age-structured mixing and heterogeneity in the number of contacts. Simulations on these networks show that, for COVID-like parameters, including both heterogeneity and age structure reduces both peak height and epidemic size compared with models that ignore heterogeneity.

Abstract

Capturing the structure of a population and characterising contacts within the population are key to reliable projections of infectious disease. Two main elements of population structure, contact heterogeneity and age, have repeatedly been shown to be key in infection dynamics, yet are rarely combined. Regarding individuals as nodes and contacts as edges within a network provides a powerful and intuitive way to fully realise this population structure. While there are a few key examples of contact networks being measured explicitly, in general we need to construct the appropriate networks from individual-level data. Here, using data from social contact surveys, we develop a generic and robust algorithm to generate an extrapolated network that preserves both age-structured mixing and heterogeneity in the number of contacts. We then use these networks to simulate the spread of infection through the population, constrained to have a given basic reproduction number ($R_0$) and hence a given early growth rate. Because highly connected nodes ('superspreaders') would otherwise play an over-dominant role in the early dynamics, we scale transmission by the average duration of contacts, which provides a better match to surveillance data on numbers of secondary cases. This network-based model shows that, for COVID-like parameters, including both heterogeneity and age structure reduces both peak height and epidemic size compared with models that ignore heterogeneity. Our robust methodology therefore allows the full wealth of data commonly collected by surveys, but frequently overlooked, to be incorporated into more realistic transmission models of infectious diseases.
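The network-construction step can be illustrated with a minimal stub-matching sketch. This is not the authors' algorithm: it assumes, hypothetically, that each node has already been assigned an age group and a target number of contacts (for example, resampled from the survey ego-network of a respondent in the same age group), and that a mixing table gives the probability that a contact made by someone in group a is with someone in group b. The function name `build_age_structured_network` and its parameters are illustrative only.

```python
import random
from collections import defaultdict


def build_age_structured_network(ages, degrees, mixing, seed=None):
    """Hypothetical sketch: pair contact 'stubs' so that each node's contacts
    are split across age groups according to mixing[a][b].

    ages:    age-group label per node (labels must be sortable)
    degrees: target number of contacts per node
    mixing:  dict a -> dict b -> probability a contact from group a is in group b
    Returns a set of undirected edges (u, v) with u < v.
    """
    rng = random.Random(seed)
    # stubs[(a, b)] holds one entry per contact a group-a node wants in group b.
    stubs = defaultdict(list)
    for node, (a, k) in enumerate(zip(ages, degrees)):
        groups = list(mixing[a])
        weights = [mixing[a][b] for b in groups]
        for b in rng.choices(groups, weights=weights, k=k):
            stubs[(a, b)].append(node)

    edges = set()
    for (a, b), left in stubs.items():
        if a > b:
            continue  # each unordered pair of age groups is handled once
        if a == b:
            # Within-group contacts: shuffle and pair consecutive stubs.
            rng.shuffle(left)
            pairs = zip(left[0::2], left[1::2])
        else:
            # Between-group contacts: match a->b stubs with b->a stubs.
            right = stubs.get((b, a), [])
            rng.shuffle(left)
            rng.shuffle(right)
            pairs = zip(left, right)  # surplus stubs on either side are dropped
        for u, v in pairs:
            if u != v:  # drop self-loops; repeated pairs collapse in the set
                edges.add((min(u, v), max(u, v)))
    return edges
```

Because stubs are matched separately for each pair of age groups, any imbalance between the a-to-b and b-to-a stub counts (the reciprocity constraint on contact matrices) is resolved here by simply discarding the surplus, as are self-loops and repeated edges. A method intended to reproduce survey data closely would need to handle these cases more carefully.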

Paper Structure

This paper contains 6 sections, 2 equations and 6 figures.

Table of Contents

  1. Introduction

Figures (6)

  • Figure 1: Example participant ego-networks from CoMix gimma2022changes, showing individual heterogeneity. (a) School student, male 12-17, in the lockdown-easing period (schools open). (b) School student, male 12-17, during lockdown (schools closed). (c) Nurse, male 20-29. (d) Mathematician, female 30-39. (e) General manager, female 50-59; the 'work' and 'other' square nodes represent 245 and 62 short, infrequent contacts, respectively. (f) Retired, female 70+. The participant (ego) is the central orange hexagonal node; connected circles represent individual contacts, and squares represent group contacts sharing a common location, duration and frequency. Node size represents contact duration. Edge length represents the frequency of social interaction, with shorter edges corresponding to more frequent contacts. Colours represent the social setting of each encounter (green, home; blue, work; yellow, other).
  • Figure 2: Contact matrices representing the mixing between age groups and highlighting the heterogeneities in the data (grey), the stochastic block model (blue) and the double Pareto log-normal model (green). For the mixing between each pair of age groups, we sample 100 ego-networks (each associated with a respondent of the correct age) and calculate the number of contacts to individuals in the other age group. The results are then plotted as a $10\times10$ subgrid to highlight the variability; points are colour-coded on a logarithmic scale (from 0 to 100) because of the extreme heterogeneities present.
  • Figure 3: Mean earth mover's distance (EMD) error per individual for each network construction method and each data set, with (small) error bars of three standard deviations. Each method generates a 100,000-node network 100 times; a representative sample of the same size as the data set is then compared with the data using the EMD, giving an average error per person. The horizontal dashed line represents the variability between reconstructions of the same data, calculated as the EMD between two networks constructed using our model.
  • Figure 4: For all respondents with a given number of contacts in the CoMix data sets, the average number of hours spent with each contact is plotted. A line of best fit is added for the chosen functional form $\overline{D}(k)$ given in the duration equation.
  • Figure 5: (Row 1) The mean final size of outbreaks against the $R_0$ of each simulation, for the SBM, unscaled dPlN, scaled dPlN, and scaled dPlN without age-structured mixing. 95% credible intervals are included for each model, and the black line represents the theoretical final size of a deterministic ODE model. (Row 2) The peak height of the same simulations, with 95% credible intervals; here the black line represents the corresponding theoretical peak height.
  • ...and 1 more figure
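Figure 2 contrasts a stochastic block model with a double Pareto log-normal (dPlN) model for the number of contacts between age groups. The dPlN distribution combines a log-normal body with power-law upper and lower tails, which is what allows it to capture extreme heterogeneity. The sketch below draws dPlN variates using the standard construction of Reed and Jorgensen, in which the logarithm of the variate is a normal variable plus an asymmetric Laplace variable. The parameter values and the helper name `sample_dpln` are hypothetical; how the paper fits parameters for each pair of age groups is not described in this section.

```python
import math
import random


def sample_dpln(n, alpha, beta, nu, tau, seed=None):
    """Draw n values from a double Pareto log-normal distribution.

    Construction: log X = Z + E1/alpha - E2/beta, where Z ~ Normal(nu, tau^2)
    and E1, E2 are independent Exp(1) variables. alpha sets the power-law
    exponent of the upper tail and beta that of the lower tail.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        # rng.expovariate(alpha) has mean 1/alpha, i.e. the law of E1/alpha.
        y = rng.gauss(nu, tau) + rng.expovariate(alpha) - rng.expovariate(beta)
        out.append(math.exp(y))
    return out
```

Draws are continuous and positive; turning them into integer contact counts (for example by rounding) would be an additional modelling choice, assumed here rather than taken from the paper.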
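Figure 3 scores each construction method by the earth mover's distance (EMD) between a sample from the constructed network and the survey data, reported per person. Assuming the compared quantity is a one-dimensional distribution such as the number of contacts per individual (the caption does not state exactly which quantity is used), the EMD between two samples of equal size reduces to pairing sorted values, as in this sketch. The helper name `emd_per_person` is hypothetical; `scipy.stats.wasserstein_distance` computes the same one-dimensional distance for general samples.

```python
def emd_per_person(xs, ys):
    """Earth mover's distance between two equal-size 1-D samples.

    In one dimension the optimal transport plan matches the i-th smallest
    value of xs with the i-th smallest value of ys, so the cost per
    individual is the mean absolute difference of the sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    if not xs:
        return 0.0
    total = sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys)))
    return total / len(xs)
```

Sorting makes the comparison independent of which individual is which, so only the shape of the two distributions matters; this is why the caption can compare a random sample from a 100,000-node network with a survey data set of the same size.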
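Figure 4 fits a functional form $\overline{D}(k)$ for the mean hours per contact as a function of the number of contacts $k$; that form is given by an equation not reproduced in this section. Purely for illustration, the sketch below assumes a decreasing power law $\overline{D}(k) = a k^{-b}$ and fits it by ordinary least squares on log-transformed data. Both the functional form and the helper names are hypothetical.

```python
import math


def fit_power_law(ks, mean_hours):
    """Fit log D = log a - b log k by least squares (assumed form D = a k^-b).

    ks and mean_hours must be positive; returns (a, b).
    """
    xs = [math.log(k) for k in ks]
    ys = [math.log(d) for d in mean_hours]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct contact counts")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(my - slope * mx), -slope


def predicted_duration(k, a, b):
    """Evaluate the fitted curve: expected hours per contact for k contacts."""
    return a * k ** -b
```

Under the abstract's duration scaling, a fitted curve of this kind could be used to weight each contact's transmission rate by the expected duration for an individual with $k$ contacts, damping the influence of highly connected nodes; the exact scaling used in the paper is not shown here.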
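The black reference lines in Figure 5 come from a deterministic ODE model. Assuming that model is a standard SIR model with homogeneous mixing and an almost entirely susceptible initial population (the caption does not specify the model), the final size $z$ solves $z = 1 - e^{-R_0 z}$ and the peak prevalence is $1 - (1 + \ln R_0)/R_0$. The helper names below are hypothetical.

```python
import math


def final_size(r0, tol=1e-12, max_iter=100_000):
    """Fraction ever infected in a homogeneous SIR epidemic (assumed model).

    Solves z = 1 - exp(-r0 * z) by fixed-point iteration from z = 1; the
    iterates decrease monotonically to the non-trivial root when r0 > 1.
    """
    if r0 <= 1:
        return 0.0  # no major outbreak in the deterministic limit
    z = 1.0
    for _ in range(max_iter):
        z_new = 1.0 - math.exp(-r0 * z)
        if abs(z_new - z) < tol:
            return z_new
        z = z_new
    return z


def peak_prevalence(r0):
    """Maximum fraction infected at one time in the same SIR model."""
    if r0 <= 1:
        return 0.0
    return 1.0 - (1.0 + math.log(r0)) / r0
```

For example, $R_0 = 2$ gives a final size of about 0.797 and a peak prevalence of about 0.153. The final-size relation also holds for an SEIR model, but the peak formula is specific to SIR.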