
Seeding with Differentially Private Network Information

Yuxin Liu, M. Amin Rahimian, Fang-Yi Yu

TL;DR

The paper tackles the problem of selecting influential seeds for public health interventions when only privacy-protected influence samples, rather than full networks, are available. It introduces Influence Sample Differential Privacy (ISDP), a privacy notion that constrains how much the algorithm's output can reveal about an individual's participation in diffusion cascades. Two differentially private (DP) seeding algorithms are developed under ISDP: a central-DP approach using the exponential mechanism and a local-DP approach using randomized response. The formal guarantees show near-optimal performance under central DP and provable, though more sample-intensive, performance under local DP. Theoretical results and simulations on synthetic networks and on ARTnet-based networks of men who have sex with men (MSM) show that privacy-preserving seeding degrades gracefully as the privacy budget tightens, with central DP generally achieving better trade-offs than local DP. Overall, the work provides a practical privacy-preserving framework for network-intervention design that remains effective when only partial cascade data are accessible, and it offers code and data references for replication and extension.
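The central-DP route can be illustrated with a textbook exponential mechanism over seed sets. The sketch below is a hypothetical illustration, not the authors' algorithm. It represents each influence sample as a `frozenset` of node ids and scores a candidate seed set by the number of samples it intersects. It also assumes that this score has sensitivity 1, i.e. that neighboring inputs differ in a single influence sample; if the ISDP neighbor relation implies a larger sensitivity, the hypothetical `sensitivity` parameter must be raised accordingly. Enumerating all $k$-subsets is only practical for toy instances, and the sample data are made up.

```python
import itertools
import math
import random


def coverage(samples, seed_set):
    """Number of influence samples that intersect the seed set."""
    return sum(1 for sample in samples if sample & seed_set)


def exponential_mechanism_seeds(samples, n, k, epsilon, rng, sensitivity=1.0):
    """Pick a size-k seed set with probability proportional to
    exp(epsilon * coverage / (2 * sensitivity)).

    Hypothetical sketch: enumerates every k-subset of range(n), so it is
    exponential in k and meant only for small illustrative instances.
    """
    candidates = [frozenset(c) for c in itertools.combinations(range(n), k)]
    scores = [coverage(samples, c) for c in candidates]
    best = max(scores)
    # Shifting by the best score rescales every weight by the same constant,
    # which avoids overflow without changing the sampling distribution.
    weights = [
        math.exp(epsilon * (score - best) / (2.0 * sensitivity)) for score in scores
    ]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Toy influence samples on nodes {0, 1, 2, 3} (made-up data).
samples = [
    frozenset({0}),
    frozenset({0, 1}),
    frozenset({2}),
    frozenset({2, 3}),
    frozenset({0}),
]
rng = random.Random(0)
for eps in (0.1, 1.0, 10.0):
    seeds = exponential_mechanism_seeds(samples, n=4, k=2, epsilon=eps, rng=rng)
    print(eps, sorted(seeds), coverage(samples, seeds))
```

Larger values of `epsilon` concentrate the output on high-coverage seed sets; as `epsilon` approaches 0 the choice becomes uniform over all $k$-subsets.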

Abstract

In public health interventions such as distributing preexposure prophylaxis (PrEP) for HIV prevention, decision makers often use seeding algorithms to identify key individuals who can amplify intervention impact. However, building a complete sexual activity network is typically infeasible due to privacy concerns. Instead, contact tracing can provide influence samples (observed sequences of sexual contacts) without requiring full network reconstruction. This raises two challenges: protecting individual privacy in these samples and adapting seeding algorithms to incomplete data. We study differential privacy guarantees for influence maximization when the input consists of randomly collected cascades. Building on recent advances in costly seeding, we propose privacy-preserving algorithms that introduce randomization in data or outputs and bound the privacy loss of each node. Theoretical analysis and simulations on synthetic and real-world sexual contact data show that performance degrades gracefully as privacy budgets tighten, with central privacy regimes achieving better trade-offs than local ones.
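To make "influence sample" concrete, the sketch below assumes the reverse-reachable-set construction of Borgs et al. under an independent cascade model on $G=(V,p)$. It picks a target node uniformly at random and collects every node that would reach it through edges that are independently live with probability $p_{uv}$. This modeling choice, the `in_edges` adjacency format, and the function name `influence_sample` are assumptions for illustration; contact-tracing data as described in the abstract may be collected differently.

```python
import random
from collections import deque


def influence_sample(nodes, in_edges, rng):
    """Draw one reverse-reachable set under an independent cascade model.

    nodes:    list of node ids.
    in_edges: dict mapping v -> list of (u, p_uv) pairs, one per edge u -> v.
    Returns the frozenset of nodes that reach a uniformly random target
    through live edges (each edge is live independently with prob. p_uv).
    """
    target = rng.choice(nodes)
    reached = {target}
    queue = deque([target])
    while queue:
        v = queue.popleft()
        for u, p_uv in in_edges.get(v, []):
            # Each node is dequeued once, so each edge u -> v is tested at most
            # once; edges whose source u is already reached need no coin flip.
            if u not in reached and rng.random() < p_uv:
                reached.add(u)
                queue.append(u)
    return frozenset(reached)


# Toy directed path 0 -> 1 -> 2 with made-up transmission probabilities.
nodes = [0, 1, 2]
in_edges = {1: [(0, 0.5)], 2: [(1, 0.5)]}
rng = random.Random(42)
samples = [influence_sample(nodes, in_edges, rng) for _ in range(5)]
print(samples)
```

Sampling edges lazily during the reverse search means only the part of the network that touches the sampled cascade is ever explored, which mirrors the idea of learning from cascades without reconstructing the full network.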

Paper Structure

This paper contains 38 sections, 17 theorems, 33 equations, 2 figures, 1 table, 5 algorithms.

Key Result

Proposition 2.1

Given a positive integer $m$, a graph $G = (V,p)$, and sets of nodes $S,T\subseteq V$, if $\mathbf{x}$ is the random variable that encodes $m$ influence samples, then $\mathbb{E}_{\mathbf{x}}[I_{\mathbf{x}}(S)] = I_G(S)$, and …
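Proposition 2.1 concerns the influence-sample estimate $I_{\mathbf{x}}(S)$. A common definition, and the one assumed in this sketch, is $I_{\mathbf{x}}(S) = \frac{n}{m}\,\bigl|\{j : x_j \cap S \neq \emptyset\}\bigr|$ with $n = |V|$, i.e. $n$ times the fraction of the $m$ samples that $S$ hits. The greedy max-coverage routine is likewise an illustrative stand-in for non-private influence-sample seeding, not necessarily the paper's exact baseline; the function names and sample data are hypothetical.

```python
def estimate_influence(samples, seed_set, n):
    """I_x(S) = n * (fraction of influence samples that intersect S)."""
    hits = sum(1 for sample in samples if sample & seed_set)
    return n * hits / len(samples)


def greedy_seeds(samples, k, n):
    """Greedy max-coverage over influence samples (non-private)."""
    chosen = []
    uncovered = list(samples)
    for _ in range(k):
        candidates = [v for v in range(n) if v not in chosen]
        # Marginal gain = number of still-uncovered samples containing v;
        # max() keeps the first (lowest-id) node among ties.
        best = max(candidates, key=lambda v: sum(1 for s in uncovered if v in s))
        chosen.append(best)
        uncovered = [s for s in uncovered if best not in s]
    return chosen


n = 4
samples = [frozenset({0}), frozenset({0, 1}), frozenset({2}), frozenset({1, 3})]
seeds = greedy_seeds(samples, k=2, n=n)
print(seeds, estimate_influence(samples, set(seeds), n))
```

Because each sample hits $S$ with probability $I_G(S)/n$ under the reverse-reachable construction, averaging the indicator and rescaling by $n$ gives the unbiasedness stated in the proposition.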

Figures (2)

  • Figure 1: Performance of our privacy-preserving seeding algorithms—the exponential-mechanism and randomized-response methods—on the MSM dataset. We report the expected influence $I_G(\cdot)$ across privacy budgets ($\epsilon$) as the number of influence samples ($m$) increases, and compare against two non-private baselines: influence-sample seeding and the deterministic greedy algorithm with complete network information. Each point represents the mean over 1,000 trials obtained from 50 independent collections of $m$ influence samples, each evaluated 20 times. Error bars show $95\%$ confidence intervals (mostly too small to be visible). The expected spread size for each seed set is estimated using an additional 1,000 influence samples.
  • Figure B.1: Performance of our exponential-mechanism and randomized-response algorithms compared with the non-private influence-sample seeding method and the complete-information greedy baseline on an Erdős–Rényi graph with $n=200$ and $p=0.15$. We report the expected influence $I_G(\cdot)$ across privacy budgets ($\epsilon$) as the number of influence samples ($m$) increases. Each point represents the mean over $1000$ trials, obtained from $50$ independent collections of $m$ influence samples, with each collection evaluated $20$ times. Error bars show $95\%$ confidence intervals, and the expected spread is estimated using an additional $1000$ influence samples.
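The evaluation protocol in both captions (50 independent collections of $m$ influence samples, each evaluated 20 times, for 1,000 trials per point, with 95% confidence intervals) can be sketched as below. The callables `collect_samples`, `select_seeds`, and `evaluate_spread` are hypothetical placeholders. The normal-approximation interval treats all 1,000 trials as independent, which is an assumption; the captions do not say how the intervals were computed.

```python
import math
import random
import statistics


def mean_with_ci(values, z=1.96):
    """Sample mean and a normal-approximation 95% confidence interval."""
    mean = statistics.fmean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, (mean - half_width, mean + half_width)


def run_protocol(collect_samples, select_seeds, evaluate_spread,
                 n_collections=50, n_evals=20):
    """Repeat seeding n_evals times on each of n_collections sample draws."""
    spreads = []
    for _ in range(n_collections):
        samples = collect_samples()  # one collection of m influence samples
        for _ in range(n_evals):
            seeds = select_seeds(samples)  # a private selector is randomized
            spreads.append(evaluate_spread(seeds))  # e.g. estimate on fresh samples
    return mean_with_ci(spreads)


# Toy usage with placeholder callables and a made-up noisy spread.
rng = random.Random(0)
mean, (lo, hi) = run_protocol(
    collect_samples=lambda: None,
    select_seeds=lambda samples: None,
    evaluate_spread=lambda seeds: rng.gauss(10.0, 1.0),
)
print(mean, lo, hi)
```

Re-running the selector 20 times per collection separates the randomness of the private mechanism from the randomness of sample collection.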

Theorems & Definitions (18)

  • Proposition 2.1: Observation 3.2 of DBLP:journals/corr/abs-1212-0884; cf. borgs2014maximizing
  • Theorem 2.2 (thm:original): Theorem 2 of costlyseeding
  • Theorem 2.3: Theorem 3 of costlyseeding
  • Definition 2.4: Influence Sample Differential Privacy (ISDP)
  • Proposition 2.5: Hardness of Influence Maximization under Node DP
  • Proposition 2.6: Hardness of Influence Maximization under Edge DP
  • Theorem 3.1 (thm:exp:inf)
  • Theorem 3.2 (thm:exp:acc)
  • Proposition 3.3 (prop:feasible): Feasibility of Unbiased Post-processing
  • Theorem 3.4 (thm:rr)
  • ...and 8 more
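For the local-DP route, a standard randomized-response mechanism flips each node's membership bit in each sample independently with probability $q = 1/(1+e^{\epsilon})$, which is $\epsilon$-locally private per bit. Because the flips are independent, the product of per-node debiased indicators is an unbiased estimate of the event "the sample misses $S$", which yields one unbiased coverage estimator after post-processing. This is only a sketch under those assumptions (per-bit privacy accounting, set-valued samples, made-up data, hypothetical function names); it is not claimed to be the mechanism analyzed in Theorem 3.4, and individual estimates can be negative or exceed the number of samples.

```python
import math
import random


def randomized_response(sample, n, epsilon, rng):
    """Report each membership bit of one influence sample, flipped with prob. q."""
    q = 1.0 / (1.0 + math.exp(epsilon))
    noisy = set()
    for v in range(n):
        bit = v in sample
        if rng.random() < q:
            bit = not bit
        if bit:
            noisy.add(v)
    return frozenset(noisy)


def debiased_coverage(noisy_samples, seed_set, epsilon):
    """Unbiased estimate of how many true samples intersect seed_set (epsilon > 0)."""
    q = 1.0 / (1.0 + math.exp(epsilon))
    total = 0.0
    for y in noisy_samples:
        miss = 1.0
        for v in seed_set:
            observed_zero = 0.0 if v in y else 1.0
            # E[(observed_zero - q) / (1 - 2q)] = 1 if v is truly absent, 0 otherwise.
            miss *= (observed_zero - q) / (1.0 - 2.0 * q)
        total += 1.0 - miss  # independent flips => product is unbiased for "miss"
    return total


rng = random.Random(0)
samples = [frozenset({0}), frozenset({0, 1}), frozenset({2}), frozenset({2, 3}), frozenset({0})]
noisy = [randomized_response(s, n=4, epsilon=1.0, rng=rng) for s in samples]
print(debiased_coverage(noisy, {0, 2}, epsilon=1.0))
```

The factor $1/(1-2q)$ grows as $\epsilon$ shrinks, so the variance of the debiased estimate rises quickly under tight budgets, which is consistent with local DP needing more samples than central DP.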