DeepTrace: Learning to Optimize Contact Tracing in Epidemic Networks with Graph Neural Networks
Chee Wei Tan, Pei-Duo Yu, Siya Chen, H. Vincent Poor
TL;DR
This work tackles the challenge of identifying contagion sources in rapidly spreading epidemic networks, where forward and backward tracing must be jointly optimized. It reframes tracing as online graph exploration and solves the resulting maximum-likelihood problem with a Graph Neural Network framework called DeepTrace, which learns to optimize the ML estimator as data accrues. The method employs a two-phase training regime (pre-training on synthetic networks with approximate labels and fine-tuning on exact likelihoods) and uses BFS/DFS-driven subgraphs to guide learning, leveraging GraphSAGE with an LSTM aggregator and node features such as Infected_proportion $\hat{r}(v)$ and Boundary_distance_ratio $\check{r}(v)$. Across synthetic networks and real COVID-19 data from Taiwan and Hong Kong, DeepTrace outperforms state-of-the-art baselines in identifying superspreaders and scales to larger graphs, offering a practical, data-driven approach for scalable digital contact tracing in epidemic surveillance.
Abstract
Digital contact tracing aims to curb epidemics by using technology to identify and mitigate public health emergencies. Backward contact tracing, which traces infections back to their sources, proved crucial in places like Japan for identifying COVID-19 infections arising from superspreading events. This paper presents a novel perspective on digital contact tracing as online graph exploration and formulates the forward and backward contact tracing problem as a maximum-likelihood (ML) estimation problem using iterative sampling of epidemic network data. The challenge lies in the combinatorial complexity and rapid spread of infections. We introduce DeepTrace, an algorithm based on a Graph Neural Network (GNN) that iteratively updates its estimates as new contact tracing data is collected, learning to optimize the maximum-likelihood estimation by utilizing topological features to accelerate learning and improve convergence. The contact tracing process uses either breadth-first search (BFS) or depth-first search (DFS) to expand the network and trace the infection source, ensuring comprehensive and efficient exploration. Additionally, the GNN model is trained through a two-phase approach: pre-training with synthetic networks to approximate likelihood probabilities and fine-tuning with high-quality data to refine the model. Using COVID-19 variant data, we illustrate that DeepTrace surpasses current methods in identifying superspreaders, providing a robust basis for a scalable digital contact tracing strategy.
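To make the BFS/DFS exploration step concrete, the following is a minimal sketch of how a tracer might expand the observed epidemic network from an index case. It is not the authors' implementation: the function name `trace_contacts`, the adjacency-dictionary representation, the `budget` parameter, and the rule that tracing expands only through infected contacts are all assumptions made for illustration. The GNN-based source estimation that DeepTrace performs on each expanded subgraph is not shown.

```python
from collections import deque


def trace_contacts(contacts, infected, index_case, strategy="bfs", budget=None):
    """Return infected individuals in the order they are traced.

    contacts   -- dict mapping each person to a list of their contacts (hypothetical format)
    infected   -- set of people known to be infected
    index_case -- infected person from whom tracing starts
    strategy   -- "bfs" (queue) or "dfs" (stack)
    budget     -- optional cap on the number of people traced
    """
    if strategy not in ("bfs", "dfs"):
        raise ValueError("strategy must be 'bfs' or 'dfs'")
    if index_case not in infected:
        raise ValueError("index_case must be infected")

    frontier = deque([index_case])
    traced = set()
    order = []
    while frontier and (budget is None or len(order) < budget):
        # BFS takes the oldest frontier entry, DFS takes the newest.
        person = frontier.popleft() if strategy == "bfs" else frontier.pop()
        if person in traced:
            continue
        traced.add(person)
        order.append(person)
        # Assumption: tracing continues only through infected contacts.
        for neighbor in contacts.get(person, ()):
            if neighbor in infected and neighbor not in traced:
                frontier.append(neighbor)
    return order


# Toy contact network: "e" is a contact of "c" but is not infected.
contacts = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "e"],
    "d": ["b"],
    "e": ["c"],
}
infected = {"a", "b", "c", "d"}

bfs_order = trace_contacts(contacts, infected, "a", strategy="bfs")
dfs_order = trace_contacts(contacts, infected, "a", strategy="dfs")
```

In this toy network BFS traces all close contacts of the index case before moving outward, while DFS follows one chain of contacts before backtracking. Each prefix of the returned order corresponds to one snapshot of the growing epidemic network on which a source estimate could be recomputed.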
