DeepTrace: Learning to Optimize Contact Tracing in Epidemic Networks with Graph Neural Networks

Chee Wei Tan, Pei-Duo Yu, Siya Chen, H. Vincent Poor

TL;DR

This work tackles the challenge of identifying contagion sources in rapidly spreading epidemic networks, where forward and backward tracing must be jointly optimized. It reframes tracing as online graph exploration and solves the resulting maximum-likelihood problem with a Graph Neural Network framework called DeepTrace, which learns to optimize the ML estimator as data accrues. The method employs a two-phase training regime (pre-training on synthetic networks with approximate labels and fine-tuning on exact likelihoods) and uses BFS/DFS-driven subgraphs to guide learning, leveraging GraphSAGE with an LSTM aggregator and node features such as Infected_proportion $\hat{r}(v)$ and Boundary_distance_ratio $\check{r}(v)$. Across synthetic networks and real COVID-19 data from Taiwan and Hong Kong, DeepTrace outperforms state-of-the-art baselines in identifying superspreaders and scales to larger graphs, offering a practical, data-driven approach for scalable digital contact tracing in epidemic surveillance.

Abstract

Digital contact tracing aims to curb epidemics by using technology to identify and mitigate public health emergencies. Backward contact tracing, which tracks the sources of infection, proved crucial in places like Japan for identifying COVID-19 infections from superspreading events. This paper presents a novel perspective of digital contact tracing as online graph exploration and formulates the forward and backward contact tracing problem as a maximum-likelihood (ML) estimation problem using iterative epidemic network data sampling. The challenge lies in the combinatorial complexity of the problem and the rapid spread of infections. We introduce DeepTrace, an algorithm based on a Graph Neural Network (GNN) that iteratively updates its estimates as new contact tracing data is collected, learning to optimize the ML estimation by using topological features to accelerate learning and improve convergence. The contact tracing process uses either breadth-first search (BFS) or depth-first search (DFS) to expand the network and trace the infection source. Additionally, the GNN model is trained through a two-phase approach: pre-training with synthetic networks to approximate likelihood probabilities and fine-tuning with high-quality data to refine the model. Using COVID-19 variant data, we illustrate that DeepTrace surpasses current methods in identifying superspreaders, providing a robust basis for a scalable digital contact tracing strategy.
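The BFS or DFS expansion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `expand_tracing_network`, the adjacency-dictionary input, the `budget` parameter, and the toy network are all assumptions made for illustration. The sketch only grows the contact tracing network from the index case through infected contacts; it does not estimate the source.

```python
from collections import deque


def expand_tracing_network(contacts, index_case, infected, budget, strategy="bfs"):
    """Return infected nodes in the order they are traced (hypothetical helper).

    contacts   -- dict mapping each person to a list of their contacts
    index_case -- node where tracing starts
    infected   -- set of infected people (only these are traced further)
    budget     -- maximum number of nodes to trace
    strategy   -- "bfs" (queue) or "dfs" (stack)
    """
    frontier = deque([index_case])
    visited = set()
    traced = []
    while frontier and len(traced) < budget:
        # BFS takes the oldest frontier entry, DFS takes the newest one.
        node = frontier.popleft() if strategy == "bfs" else frontier.pop()
        if node in visited:
            continue
        visited.add(node)
        traced.append(node)
        for neighbor in contacts.get(node, []):
            if neighbor in infected and neighbor not in visited:
                frontier.append(neighbor)
    return traced


# Toy epidemic network (assumed data): a small tree in which everyone is infected.
contacts = {
    "a": ["b", "c"],
    "b": ["a", "d", "e"],
    "c": ["a", "f"],
    "d": ["b"],
    "e": ["b"],
    "f": ["c"],
}
infected = set(contacts)

bfs_order = expand_tracing_network(contacts, "a", infected, budget=4, strategy="bfs")
dfs_order = expand_tracing_network(contacts, "a", infected, budget=4, strategy="dfs")
```

With the same budget, BFS stays close to the index case while DFS follows one chain of contacts outward, so the two strategies hand different subgraphs to the estimator.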

Paper Structure

This paper contains 34 sections, 4 theorems, 20 equations, 11 figures, 5 tables, and 1 algorithm.

Key Result

Theorem 1

If $\mathbb{G}$ is a $d$-regular tree, and there exists a node $\overline{v}$ such that for any two leaf nodes $v_{\textup{leaf}}$ and $u_{\textup{leaf}}$ in $\mathbb{G}_N$, then the trajectory of $v_{n}^*$ is exactly the shortest path from the index case $v^*_1$ to $v^*_N$ in $\mathbb{G}_N$.

Figures (11)

  • Figure 1: Illustration of an epidemic network $\mathbb{G}_9$ with nine infections (shaded nodes), numbered in order of infection starting from the ground truth, i.e., the real superspreader. The contact tracing network $G_4$ (within the dotted circle) is built by forward contact tracing from the index case node $v_6$ (blue arrows show the tracing directions). Backward contact tracing aims to find the node in $\mathbb{G}_9$ that is most likely to be the superspreader.
  • Figure 2: As the contact tracing network grows from the index case $v_a$ by BFS traversal, in the alphabetical order $\{a,b,c,\ldots ,i\}$, the most likely superspreader given by \ref{eq:est1} moves closer (in number of hops) to the most likely superspreader $v_d$ in the epidemic network $\mathbb{G}_N$ (indicated by red arrows).
  • Figure 3: As the contact tracing network grows from the index case $v_a$ by DFS traversal, in the alphabetical order $\{a,b,c, \ldots ,f\}$, the most likely superspreader given by \ref{eq:est1} moves closer (in number of hops) to the most likely superspreader $v_e$ in the epidemic network $\mathbb{G}_N$ (indicated by red arrows).
  • Figure 4: The overall architecture of the DeepTrace algorithm, which employs a GNN that takes as input several small-scale networks obtained through BFS and DFS. Each node in these networks has structural features and a training label representing the permitted permutation probability. Semi-supervised learning is conducted using GraphSAGE with LSTM aggregators.
  • Figure 5: Illustration of the boundary distance for each node on $G_n$.
  • ...and 6 more figures
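
The "permitted permutation probability" used as a training label in Figure 4 can be made concrete with a small sketch. It rests on two stated assumptions, and the paper's exact labels may differ: the network is a tree, and every permitted infection ordering is equally likely. It uses the standard counting identity for trees, in which the number of permitted orderings rooted at $v$ equals $n!$ divided by the product of all subtree sizes when the tree is rooted at $v$. The function name `permitted_permutation_counts` and the example path graph are hypothetical.

```python
from math import factorial


def permitted_permutation_counts(tree):
    """Count infection orderings starting at each node of an undirected tree.

    tree -- dict mapping each node to a list of its neighbors (must be a tree)
    An ordering is permitted if every node after the first has an
    already-infected neighbor earlier in the ordering.
    """
    n = len(tree)
    counts = {}
    for root in tree:
        # Iterative traversal that records each node's parent and visit order.
        parent = {root: None}
        order = []
        stack = [root]
        while stack:
            node = stack.pop()
            order.append(node)
            for neighbor in tree[node]:
                if neighbor != parent[node]:
                    parent[neighbor] = node
                    stack.append(neighbor)
        # Accumulate subtree sizes from the leaves back up to the root.
        size = {node: 1 for node in tree}
        for node in reversed(order):
            if parent[node] is not None:
                size[parent[node]] += size[node]
        product = 1
        for node in tree:
            product *= size[node]
        counts[root] = factorial(n) // product
    return counts


# Assumed example: the path a - b - c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
counts = permitted_permutation_counts(path)
total = sum(counts.values())
probabilities = {node: count / total for node, count in counts.items()}
```

On the path, the middle node admits two orderings (b, a, c and b, c, a) while each end admits one, so the normalized values favor the center. This is the kind of per-node score that a GNN could be trained to approximate.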

Theorems & Definitions (4)

  • Theorem 1
  • Lemma 1
  • Theorem 2
  • Theorem 3