Tensor product algorithms for inference of contact network from epidemiological data

Sergey Dolgov, Dmitry Savostyanov

TL;DR

This work tackles inferring a contact network from time-resolved epidemic data by casting network discovery as a black-box Bayesian optimization over the network set ${\mathbb{G}}$. The forward model is the ε-SIS dynamics on a graph, governed by a chemical master equation on the state space ${\mathbb{X}}^N$, which is solved efficiently using tensor-train (TT) representations and a CP form of the CME operator to circumvent the curse of dimensionality. The authors introduce a data-driven initialization, Fiedler-vector–based node ordering to reduce TT ranks, and tempered Metropolis–Hastings schemes (MCMC-R and MCMC-noR) to robustly identify the most probable network, achieving accurate reconstruction on several networks (linear chain, Austria road, Florentine families, and small-world). The approach demonstrates that TT-based CME solvers can recover rare-event likelihoods essential for reliable network inference, with practical implications for analyzing epidemiological data and reconstructing contact structures at nontrivial scales.

Abstract

We consider the problem of inferring a contact network from nodal states observed during an epidemiological process. In a black-box Bayesian optimisation framework, this problem reduces to a discrete likelihood optimisation over the set of possible networks. The cardinality of this set grows combinatorially with the number of network nodes, which makes the optimisation computationally challenging. For each network, the likelihood is the probability that the observed data appear during the evolution of the epidemiological process on that network. This probability can be very small, particularly if the network differs significantly from the ground truth network that actually generated the observed data. The commonly used stochastic simulation algorithm struggles to recover rare events and hence to estimate small probabilities and likelihoods. In this paper we replace the stochastic simulation with a solution of the chemical master equation for the probabilities of all network states. Since this equation also suffers from the curse of dimensionality, we apply tensor train approximations to overcome it and enable fast and accurate computations. Numerical simulations demonstrate efficient black-box Bayesian inference of the network.

Paper Structure (18 sections, 21 equations, 8 figures, 1 algorithm)


Figures (8)

  • Figure 1: Network inference workflow. An MCMC algorithm samples proposal network configurations, $\mathcal{G}$. The CME is solved on each time interval $[t_{k-1},t_k]$ in the observed data, starting from the state observation $X(t_{k-1})=\mathrm{x}_{k-1}$ and obtaining the TT approximation of the probability of observing the state $X(t_{k})=\mathrm{x}_{k}$ for the given network $\mathcal{G}$. The CME solver is applied in place of the more commonly used SSA method, which struggles to recover rare events. The probabilities for all data are multiplied to form the likelihood $\mathsf{L}(\mathcal{G})$, on the basis of which the proposal is accepted or rejected in the MCMC. Finally, the network with the maximum likelihood among the MCMC samples is inferred.
  • Figure 2: Markov transitions between network states: (a) ${\varepsilon}$-SIS epidemic on a chain of $N=3$ people; (b) ${\varepsilon}$-SIS epidemic in a fully connected network of $N=3$ people. On the graph, green arrows denote the recovery process with rate $\gamma$, and red arrows with a circled number $k$ denote the infection process with rate $k\beta+\varepsilon$.
  • Figure 3: The set of all possible networks with $N=3$ nodes is a binary hypercube in dimension $\tfrac{1}{2} N(N-1)=3$.
  • Figure 4: The tensor product structure of recovery transitions $[ p^\text{(rec)}_{\mathrm{x}\to(\mathrm{x}-\mathrm{e}_n)} ]_{\mathrm{x}\in\mathbb{X}^N}$ is illustrated for a population of $N=3$ people. Recovery takes place on individual nodes and hence does not depend on the contact network. In each panel, highlighted states $\mathrm{x}$ are where $p^\text{(rec)}_{\mathrm{x}\to(\mathrm{x}-\mathrm{e}_n)} = \gamma$, indicating that person $n$ is infected and can recover; this is also shown by green arrows. Non-highlighted states correspond to $p^\text{(rec)}_{\mathrm{x}\to(\mathrm{x}-\mathrm{e}_n)} = 0$. (a) $n=1$, (b) $n=2$, (c) $n=3$.
  • Figure 5: Inferring a linear chain network with $N=9$ people from an $\varepsilon$-SIS epidemic process with $\beta=1$, $\gamma=0.5$ and $\varepsilon=0.01$: (a) the ground truth network $\mathcal{G}_\star$ in its initial state; (b) the contrast $\log_{10}\mathsf{L}(\mathcal{G})-\log_{10}\mathsf{L}(\mathcal{G}_\star)$ averaged over $N_s=42$ datasets, shown for grids $\mathcal{G}$ that differ from $\mathcal{G}_\star$ by a single link $(m,n)$; axes $\times$ show links in $\mathcal{G}_\star$; (c) the distribution of probabilities for the transitions observed in data for the initial guess network $\mathcal{G}_0$; (d) the distribution of probabilities for the transitions observed in data for the ground truth network $\mathcal{G}_\star$; (e) convergence of likelihood $\mathsf{L}(\mathcal{G})$ towards $\mathsf{L}(\mathcal{G}_\star)$ in the optimisation algorithm MCMC-noR; average (solid lines) $\pm$ one standard deviation (shaded areas) over the $N_s=42$ datasets; shown for temperatures $\tau=1, 10, 100$; (f) convergence of network $\mathcal{G}$ towards $\mathcal{G}_\star$.
  • ...and 3 more figures
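
The captions of Figures 1 to 3 can be made concrete with a toy computation for $N=3$. The sketch below is not the authors' tensor-train solver: it assumes a dense CME generator (feasible only for very small $N$), swaps the TT solver for a plain uniformization series, and replaces MCMC with brute-force enumeration of the $2^3=8$ networks. The function names, the observation times and the observed states are hypothetical; only the rates ($\gamma$ for recovery, $k\beta+\varepsilon$ for infection) and the parameter values $\beta=1$, $\gamma=0.5$, $\varepsilon=0.01$ are taken from the captions.

```python
import itertools
import numpy as np

def sis_generator(adj, beta, gamma, eps):
    """Dense CME generator Q for eps-SIS; Q[j, i] is the rate from state i to state j."""
    n = adj.shape[0]
    states = list(itertools.product((0, 1), repeat=n))  # 0 = susceptible, 1 = infected
    index = {s: i for i, s in enumerate(states)}
    Q = np.zeros((len(states), len(states)))
    for s in states:
        i = index[s]
        x = np.array(s)
        for node in range(n):
            if s[node] == 1:
                rate = gamma                      # recovery, independent of the network
            else:
                k = int(adj[node] @ x)            # number of infected neighbours
                rate = k * beta + eps             # infection rate k*beta + eps
            target = list(s)
            target[node] = 1 - s[node]            # flip the state of this node
            j = index[tuple(target)]
            Q[j, i] += rate
            Q[i, i] -= rate
    return states, index, Q

def transition_probs(Q, p0, t, tol=1e-12):
    """Solve dp/dt = Q p on [0, t] by uniformization (dense, small systems only)."""
    lam = -Q.diagonal().min()                     # largest exit rate
    P = np.eye(Q.shape[0]) + Q / lam              # column-stochastic jump matrix
    weight = np.exp(-lam * t)                     # Poisson weight for k = 0
    term = p0.copy()
    result = weight * term
    acc = weight
    for k in range(1, 100000):
        if 1.0 - acc <= tol:
            break
        term = P @ term
        weight *= lam * t / k
        result += weight * term
        acc += weight
    return result

def log_likelihood(adj, data, beta=1.0, gamma=0.5, eps=0.01):
    """Sum of log transition probabilities over consecutive observations."""
    states, index, Q = sis_generator(adj, beta, gamma, eps)
    total = 0.0
    for (t0, x0), (t1, x1) in zip(data, data[1:]):
        p0 = np.zeros(len(states))
        p0[index[x0]] = 1.0                       # start from the observed state
        p = transition_probs(Q, p0, t1 - t0)
        total += np.log(p[index[x1]])
    return total

def all_networks(n):
    """Enumerate all undirected networks on n nodes: a hypercube of dimension n(n-1)/2."""
    edges = list(itertools.combinations(range(n), 2))
    for mask in itertools.product((0, 1), repeat=len(edges)):
        adj = np.zeros((n, n), dtype=int)
        for bit, (a, b) in zip(mask, edges):
            adj[a, b] = adj[b, a] = bit
        yield mask, adj

# Hypothetical observations (time, state) for N = 3 people.
data = [(0.0, (1, 0, 0)), (0.5, (1, 1, 0)), (1.0, (1, 1, 1)), (1.5, (0, 1, 1))]
chain = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])

networks = list(all_networks(3))
scores = {mask: log_likelihood(adj, data) for mask, adj in networks}
best = max(scores, key=scores.get)
print("log-likelihood of the chain:", log_likelihood(chain, data))
print("most likely edge mask for edges (0,1), (0,2), (1,2):", best)
```

The dense generator has $2^N \times 2^N$ entries and the network set has $2^{N(N-1)/2}$ elements, so this brute-force version stops being practical well before $N=9$; that growth is the curse of dimensionality the tensor-train representation and the MCMC search are meant to address.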