Seeding with Differentially Private Network Information
Yuxin Liu, M. Amin Rahimian, Fang-Yi Yu
TL;DR
The paper tackles the problem of selecting influential seeds for public health interventions when only privacy-protected influence samples, rather than full networks, are available. It introduces Influence Sample Differential Privacy (ISDP), a privacy notion that bounds how much the algorithm's output can reveal about any individual's participation in diffusion cascades. Two seeding algorithms are developed under ISDP: a central-DP approach using the exponential mechanism and a local-DP approach using randomized response. The central-DP algorithm comes with near-optimal performance guarantees; the local-DP algorithm also has provable guarantees but requires more samples. Theoretical results and simulations on synthetic and ARTnet-based MSM networks show that seeding performance degrades gracefully as the privacy budget tightens, with central DP generally achieving better trade-offs than local DP. Overall, the work provides a privacy-preserving framework for network-intervention design that remains effective when only partial cascade data are accessible, and it points to code and data for replication and extension.
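To make the central-DP idea concrete, the following is a minimal sketch of the generic exponential mechanism applied to picking a single seed from influence samples. It is not the paper's algorithm. The function names, the representation of a cascade as a set of node indices, the use of per-node coverage counts as the score, and the sensitivity of 1 are all assumptions made for illustration.

```python
import math
import random


def coverage_scores(samples, n_nodes):
    """Count, for each node, how many cascades it appears in (assumed score)."""
    scores = [0] * n_nodes
    for cascade in samples:
        for node in set(cascade):
            scores[node] += 1
    return scores


def exponential_mechanism(scores, epsilon, sensitivity=1.0, rng=None):
    """Sample an index with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    rng = rng or random.Random()
    # Subtracting the maximum score avoids overflow and leaves the
    # sampling distribution unchanged.
    max_score = max(scores)
    weights = [
        math.exp(epsilon * (s - max_score) / (2.0 * sensitivity))
        for s in scores
    ]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


# Hypothetical toy data: three cascades over three nodes.
samples = [{0, 1}, {0, 2}, {0}]
scores = coverage_scores(samples, n_nodes=3)
seed = exponential_mechanism(scores, epsilon=1.0, rng=random.Random(0))
```

With a large privacy budget the mechanism concentrates on the highest-scoring node, and as epsilon approaches zero it approaches a uniform draw, which mirrors the graceful degradation described above.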
Abstract
In public health interventions such as distributing pre-exposure prophylaxis (PrEP) for HIV prevention, decision makers often use seeding algorithms to identify key individuals who can amplify intervention impact. However, building a complete sexual activity network is typically infeasible due to privacy concerns. Instead, contact tracing can provide influence samples (observed sequences of sexual contacts) without full network reconstruction. This raises two challenges: protecting individual privacy in these samples and adapting seeding algorithms to incomplete data. We study differential privacy guarantees for influence maximization when the input consists of randomly collected cascades. Building on recent advances in costly seeding, we propose privacy-preserving algorithms that introduce randomization in the data or in the outputs and bound the privacy loss of each node. Theoretical analysis and simulations on synthetic and real-world sexual contact data show that performance degrades gracefully as privacy budgets tighten, with central privacy regimes achieving better trade-offs than local ones.
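Randomizing the data itself can be illustrated with textbook randomized response on binary participation bits. The sketch below is a generic instance of that technique, not the estimator used in the paper. The bit-vector encoding of a cascade, the function names, and the simple per-bit debiasing step are assumptions made for illustration.

```python
import math
import random


def flip_probability(epsilon):
    """Probability of flipping a bit under epsilon-randomized response."""
    return 1.0 / (1.0 + math.exp(epsilon))


def randomized_response(bits, epsilon, rng=None):
    """Flip each 0/1 entry independently with probability 1 / (1 + e^epsilon)."""
    rng = rng or random.Random()
    p = flip_probability(epsilon)
    return [1 - b if rng.random() < p else b for b in bits]


def debiased_count(noisy_bits, epsilon):
    """Unbiased estimate of the number of ones before flipping.

    Uses E[y] = p + x * (1 - 2p), so x is estimated by (y - p) / (1 - 2p).
    Requires epsilon > 0, otherwise p = 0.5 and the estimate is undefined.
    """
    p = flip_probability(epsilon)
    return sum((y - p) / (1.0 - 2.0 * p) for y in noisy_bits)


# Hypothetical participation bits of one node across 20,000 cascades.
true_bits = [1] * 6000 + [0] * 14000
noisy_bits = randomized_response(true_bits, epsilon=1.0, rng=random.Random(0))
estimate = debiased_count(noisy_bits, epsilon=1.0)
```

The debiased estimate is centered on the true count, but its variance grows as epsilon shrinks, which is one way to see why local privacy tends to need more samples than central privacy for the same accuracy.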
