Reconstructing Network Outbreaks under Group Surveillance

Ritwick Mishra; Abhijin Adiga; Anil Vullikanti

TL;DR

The paper addresses the reconstruction of disease cascades from pooled test results under group surveillance, using the Independent Cascade model. It proves hardness results and gives two approximation algorithms: ApproxCascade, based on a reduction to the Group Steiner Tree problem with a bound of $O(k^{\epsilon})$, and RoundCascade, based on LP relaxation and rounding, with a $(2+2\ln k)$-approximation for the one-hop variant. Experiments on synthetic and real contact networks show significant gains over pool-size-1 baselines in missing infection recovery and prevalence estimation, and also highlight sensitivity to testing noise. The work advances outbreak inference under pooled surveillance and suggests future directions on pool design and robustness to test imperfections.

Abstract

A key public health problem during an outbreak is to reconstruct the disease cascade from a partial set of confirmed infections. This has been studied extensively under the Maximum Likelihood Estimation (MLE) formulation, which reduces the problem to finding some type of Steiner subgraph on a network. Group surveillance, such as wastewater or aerosol monitoring, is a form of mass (pooled) testing in which samples from multiple individuals are pooled together and tested once. While a single negative test clears multiple individuals, a positive test does not reveal which individuals in the test pool are infected. We introduce the POOLCASCADEMLE problem in the setting of a network propagation process, where the goal is to find an MLE cascade subgraph that is consistent with the pooled test outcomes. Previous work on reconstruction assumes that the test results are of individuals, i.e., pools of size one, and requires a consistent cascade to connect the positive-testing nodes. In POOLCASCADEMLE, a consistent cascade must choose at least one node in each positive pool, adding another combinatorial layer. We show that, under the Independent Cascade (IC) model, POOLCASCADEMLE is NP-hard, and present an approximation algorithm based on a reduction to the Group Steiner Tree problem. We also consider a one-hop version of this problem, in which the disease can spread for one time step after being seeded. We show that even this restricted version is NP-hard, and develop a method using linear programming relaxation and rounding. We evaluate the performance of our methods on real and synthetic contact networks, in terms of missing infection recovery and prevalence estimation. We find that our approach outperforms meaningful baselines that correspond to pools of size one and use state-of-the-art methods.
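The consistency requirement described in the abstract can be illustrated with a minimal sketch. The function name `is_consistent`, the pool contents, and the node labels are hypothetical and chosen only for illustration. Treating every member of a negative pool as uninfected is an assumption drawn from the statement that a negative test clears multiple individuals; it is not necessarily how the paper formalizes negative pools.

```python
from typing import Hashable, Iterable, Set


def is_consistent(
    cascade_nodes: Iterable[Hashable],
    positive_pools: Iterable[Set[Hashable]],
    negative_pools: Iterable[Set[Hashable]] = (),
) -> bool:
    """Return True if the cascade agrees with the pooled test outcomes."""
    infected = set(cascade_nodes)
    # Every positive pool must contain at least one infected node.
    hits_every_positive = all(infected & set(pool) for pool in positive_pools)
    # Assumption: a negative pool clears all of its members.
    avoids_every_negative = all(
        not (infected & set(pool)) for pool in negative_pools
    )
    return hits_every_positive and avoids_every_negative


# Hypothetical example: two positive pools and one negative pool.
positive = [{5, 6}, {9, 10}]
negative = [{7, 8}]
print(is_consistent({"r", 1, 4, 5, 9}, positive, negative))  # True
print(is_consistent({"r", 1, 5}, positive, negative))        # False: pool {9, 10} is missed
print(is_consistent({"r", 1, 5, 9, 7}, positive, negative))  # False: node 7 was cleared
```

With pools of size one, the first condition reduces to requiring that every positive-testing node belongs to the cascade, which is the setting of the earlier reconstruction work mentioned above.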

Paper Structure (22 sections, 10 theorems, 12 equations, 7 figures, 1 table, 4 algorithms)

Introduction
Related work
Preliminaries
Problem formulation: MLE construction for single-seed instance
Problem formulation: MLE construction for the one-hop instance
Hardness results
Our Approach
PoolCascadeMLE problem
One-HopCascadeMLE problem
Experimental Results
Dataset and Methods
Results
Limitations of the MLE cascade approach
Conclusion
Additional details for Section 4
...and 7 more sections

Key Result

Theorem 1

PoolCascadeMLE is hard to approximate within an $O(\log^{2-\epsilon}{k})$ factor for any $\epsilon>0$, unless P=NP. Here $k=|\Gamma_1|$.

Figures (7)

Figure 1: On the left, a cascade (in red) with root $r$ has resulted in two positive-testing pools (in dashed ovals). On the right, a reconstructed cascade $T_r$ is shown (in purple) which is consistent, i.e., it contains at least one node from each positive pool. Here, $\delta_{T_r} = \{(3,7),(3,8)\},\lambda_{T_r}=\{(2,5)\}$. The probability $P(T_r)= p_{r1}p_{15}p_{12}p_{23}p_{r4}p_{49}(1-p_{37})(1-p_{38})$.
Figure 2: Performance of ApproxCascade against baselines.
Figure 3: Performance comparison in terms of prevalence estimation relative error $e_{rel}$.
Figure 4: Performance of RoundCascade against the baseline.
Figure 5: Ground truth cascade shown in red. PoolCascadeMLE solution shown in green. The testing pool is the dotted circle around the leaf nodes.
...and 2 more figures
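The cascade probability in the Figure 1 caption can be reproduced with a short sketch. It follows the caption's formula as written: a product of $p_e$ over the edges of $T_r$ and of $(1-p_e)$ over the edges in $\delta_{T_r}$. The function names and the uniform edge probability of 0.5 are hypothetical, since the caption gives no numeric values. The negative log-likelihood is included because maximizing a product of this form is equivalent to minimizing a sum of edge costs.

```python
import math
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def cascade_probability(
    tree_edges: Iterable[Edge],
    boundary_edges: Iterable[Edge],
    p: Dict[Edge, float],
) -> float:
    """Product of p_e over tree edges and (1 - p_e) over boundary edges."""
    prob = 1.0
    for e in tree_edges:
        prob *= p[e]          # edge e transmitted the infection
    for e in boundary_edges:
        prob *= 1.0 - p[e]    # edge e failed to transmit
    return prob


def negative_log_likelihood(
    tree_edges: Iterable[Edge],
    boundary_edges: Iterable[Edge],
    p: Dict[Edge, float],
) -> float:
    """Additive cost whose minimum corresponds to the maximum probability."""
    cost = -sum(math.log(p[e]) for e in tree_edges)
    cost -= sum(math.log(1.0 - p[e]) for e in boundary_edges)
    return cost


# Edges of T_r and of delta_{T_r} as listed in the Figure 1 caption.
tree = [("r", "1"), ("1", "5"), ("1", "2"), ("2", "3"), ("r", "4"), ("4", "9")]
delta = [("3", "7"), ("3", "8")]
# Hypothetical transmission probabilities: 0.5 on every edge.
p = {e: 0.5 for e in tree + delta}

print(cascade_probability(tree, delta, p))  # 0.00390625, i.e. 0.5 ** 8
print(negative_log_likelihood(tree, delta, p))
```

Working with the additive cost rather than the raw product avoids numerical underflow on large cascades and matches the usual way likelihood objectives are turned into weighted subgraph problems.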

Theorems & Definitions (10)

Theorem 1
Theorem 2
Lemma 1
Lemma 2
Lemma 3
Theorem 3
Lemma 4
Lemma 5
Lemma 6
Theorem 4
