Differentially Private Distributed Inference

Marios Papachristou; M. Amin Rahimian

TL;DR

The paper tackles privacy-preserving distributed inference in networks of agents by marrying differential privacy with non-Bayesian log-linear belief updates over discrete state spaces. It develops two main problem settings—distributed MLE with a finite signal set and online learning from intermittent streams—and proposes three DP algorithms: AM, GM, and a two-threshold method, all supported by nonasymptotic guarantees and rigorous privacy analysis via the Laplace mechanism. Theoretical results quantify the privacy-utility-cost tradeoffs, including error bounds and communication complexity that scale with the DP budget $\varepsilon$, network properties, and signal statistics. Empirical validation on real multicenter clinical trial data (ACTG and cancer datasets) shows privacy-preserving distributed inference with substantially faster runtimes than homomorphic-encryption approaches and competitive accuracy compared to first-order DP methods, highlighting practical applicability for privacy-aware, multicenter survival analysis. Overall, the work provides a principled framework for privacy-conscious collaboration in healthcare and related domains, enabling scalable, provably private distributed decision-making with concrete guidance on design choices and tradeoffs.

Abstract

How can agents exchange information to learn while protecting privacy? Healthcare centers collaborating on clinical trials must balance knowledge sharing with safeguarding sensitive patient data. We address this challenge by using differential privacy (DP) to control information leakage. Agents update belief statistics via log-linear rules, and DP noise provides plausible deniability and rigorous performance guarantees. We study two settings: distributed maximum likelihood estimation (MLE) with a finite set of private signals and online learning from an intermittent signal stream. Noisy aggregation introduces trade-offs between rejecting low-quality states and accepting high-quality ones. The MLE setting naturally applies to hypothesis testing with formal statistical guarantees. Through simulations, we demonstrate differentially private, distributed survival analysis on real-world clinical trial data, evaluating treatment efficacy and the impact of biomedical indices on patient survival. Our methods enable privacy-preserving inference with greater efficiency and lower error rates than homomorphic encryption and first-order DP optimization approaches.
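As a rough illustration of the mechanism described above, the following minimal sketch adds Laplace noise to each agent's log-likelihood vector and then combines the noisy vectors with a log-linear (weighted geometric) pooling rule. It is not the paper's AM, GM, or two-threshold algorithm. The function name `dp_log_linear_beliefs`, the clipping bound `clip`, the one-shot noising, and the doubly stochastic matrix `weights` are all assumptions made for this example.

```python
import numpy as np


def dp_log_linear_beliefs(log_liks, weights, eps, clip=1.0, rounds=50, seed=0):
    """Hypothetical sketch: Laplace-noised log-likelihoods + log-linear pooling.

    log_liks: array (n_agents, n_states) of private log-likelihoods.
    weights:  doubly stochastic array (n_agents, n_agents), assumed given.
    eps:      privacy budget for the Laplace mechanism.
    """
    rng = np.random.default_rng(seed)
    # Clip so that each agent's vector has bounded range (an assumption of
    # this sketch; the paper's sensitivity analysis may differ).
    clipped = np.clip(log_liks, -clip, clip)
    n_states = clipped.shape[1]
    # L1 sensitivity of a vector in [-clip, clip]^n_states is 2 * clip * n_states.
    sensitivity = 2.0 * clip * n_states
    # Laplace mechanism: per-coordinate noise with scale sensitivity / eps.
    noisy = clipped + rng.laplace(0.0, sensitivity / eps, size=clipped.shape)

    # Log-linear update: a weighted geometric mean of beliefs is a weighted
    # arithmetic mean in log space, so each round is a matrix product.
    log_beliefs = noisy.copy()
    for _ in range(rounds):
        log_beliefs = weights @ log_beliefs

    # Normalize each agent's belief vector (subtract the max for stability).
    log_beliefs -= log_beliefs.max(axis=1, keepdims=True)
    beliefs = np.exp(log_beliefs)
    return beliefs / beliefs.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # Three fully connected agents, three candidate states (made-up numbers).
    log_liks = np.array([[-1.0, 0.5, -0.5],
                         [-0.8, 0.2, -0.1],
                         [-0.9, 0.6, -0.7]])
    weights = np.full((3, 3), 1.0 / 3.0)
    print(dp_log_linear_beliefs(log_liks, weights, eps=1.0))
```

Because the noise is added once before any communication, the later averaging rounds are post-processing and do not consume additional privacy budget in this simplified setup. Smaller values of `eps` give larger noise and therefore less reliable beliefs.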

Paper Structure

This paper contains 43 sections, 10 theorems, 76 equations, 12 figures, 1 table, and 5 algorithms.

Introduction
Main Contributions
The Dilemma of Privacy-Preserving Data Sharing
Survival Analysis for Multicenter Clinical Trials
Belief Propagation for Differentially Private Distributed Inference
Problem Formulation: Distributed Inference & Learning in Discrete Spaces
Differentially Private Log-Linear Belief Updates and Opinion Pools
Performance Analysis Framework
Performance and Privacy Guarantees for Distributed Inference
Differentially Private Distributed MLE
Differentially Private Distributed Hypothesis Testing
Differentially Private Distributed Online Learning from Intermittent Streams
Simulation Study
Discussion
Additional Related Work
...and 28 more sections

Key Result

theorem 1

For the private distributed MLE algorithm (alg:private_distributed_mle) with state-independent noise distributions $\mathcal{D}_{i} (\hat{\theta}; {\varepsilon}) = \mathcal{D}_i({\varepsilon})$ that satisfy ${\varepsilon}$-DP and do not depend on the state $\hat{\theta}$: as $\varrho^\mathrm{AM} \to \infty$ and $\varrho^\mathrm{GM} \to$ ...

Figures (12)

Figure 1: When inferring a binary state from a binary signal that agrees with the state with probability $p$, there is a critical privacy budget, ${\varepsilon}^*_\mathrm{RR}$, above which sharing noisy (DP-protected) data becomes beneficial. Left: Statistical power ($\beta_{\mathrm{RR}}$) as a function of the privacy budget ${\varepsilon}$ for different numbers of agents ($n$) with $p=0.7$. The critical budget ${\varepsilon}_{\mathrm{RR}}^*$ corresponds to the intersection of each curve with the dotted line ($\beta_{\mathrm{IND}}$) showing the statistical power for a single agent. The intersection points show that ${\varepsilon}_{\mathrm{RR}}^*$ decreases as the number of agents increases. Middle: Statistical power ($\beta_{\mathrm{RR}}$) as a function of the privacy budget ${\varepsilon}$ for different values of $p$ with $n=100$ agents. The critical budget (${\varepsilon}^*_\mathrm{RR}$) corresponds to the intersection of the power curve for collective hypothesis testing ($\beta_{\mathrm{RR}}$) with the dotted line showing the statistical power of a single agent without data sharing ($\beta_{\mathrm{IND}}$). Increasing $p$ increases the statistical power both for individual agents and for agents that share their data, so ${\varepsilon}^*_\mathrm{RR}$ may vary little with $p$ and need not be monotone in it. However, for very high values of $p$, where individuals can perform highly accurate tests on their own, ${\varepsilon}^*_\mathrm{RR}$ increases. Right: The critical budget (${\varepsilon}^*_\mathrm{RR}$) for varying $n$ and $p$. We use $\alpha = 0.05$ for all tests.
Figure 2: Statistical power versus privacy budget for testing the probability of private Bernoulli random variables. The intersections of the dotted lines ($\beta_{\mathrm{IND}}$) and the curves ($\beta_{\mathrm{Laplace}}$) give the critical privacy budget, ${\varepsilon}^*_\mathrm{Laplace}$, above which collective testing using noisy shared data is more powerful than individual tests using private signals. All tests are performed at significance level $\alpha = 0.05$. Left: Statistical power for different numbers of agents ($n$) and $p = 0.7$. The critical privacy budget ${\varepsilon}^*_\mathrm{Laplace}$ decreases as the number of agents $n$ increases. Middle: Statistical power for different values of $p$ and $n = 100$ agents. The statistical power of both individual and collective tests increases with $p$, so the intersection points do not vary monotonically with $p$; however, ${\varepsilon}^*_\mathrm{Laplace}$ is highest at high values of $p$, where individuals alone can perform highly accurate tests. Right: The critical privacy budget ${\varepsilon}^*_\mathrm{Laplace}$ for different values of $n$ and $p$. The rejection criteria and statistical power are determined numerically by drawing 10,000 samples under the null and the alternative ($\theta \in \{ 0,1 \}$).
Figure 3: Left: Kaplan-Meier survival curves for ACTG study 175 for the ZDV and ddI treatments. ZDV stands for Zidovudine, and ddI stands for Didanosine. Middle: Survival curves for one hospital (the data is split equally among five hospitals) for the same study. Right: Log hazard ratios with 95% confidence intervals from the fitted proportional hazards model using all the data (centralized) and one hospital.
Figure 4: Left: Curves for the centralized data. Middle: Curves for one hospital, with the data split evenly among $n = 5$ hospitals. Right: Log hazard ratios with 95% confidence intervals for each treatment, fitted on the centralized data and on one hospital's data.
Figure 5: Top: Resulting beliefs (at terminal time $T$) for distributed MLE with the AM, GM, and Two-Threshold algorithms (recovery is performed with one threshold). Bottom: Total variation distance between the result of each algorithm and the non-DP baseline. The privacy budget is set to ${\varepsilon} = 1$ and the error rates to $\alpha = 1 - \beta = 0.05$. The thresholds are set to $0.25$ (in the belief space). The network consists of $n = 5$ fully connected centers. All algorithms recover the best treatment (ZDV+ddI; see also the figure of best-treatment survival curves).
...and 7 more figures
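
The numerical procedure mentioned in the Figure 2 caption (drawing samples under the null and the alternative to set the rejection threshold and estimate power) can be sketched as follows. This is a hedged reconstruction, not the authors' code. It assumes that signals are Bernoulli with success probability $1-p$ under $\theta = 0$ and $p$ under $\theta = 1$, that each agent adds Laplace noise of scale $1/{\varepsilon}$ to its own bit, and that the test statistic is the sum of the noisy bits. The function name `laplace_power` is hypothetical.

```python
import numpy as np


def laplace_power(n, p, eps, alpha=0.05, draws=10_000, seed=0):
    """Monte Carlo power of a collective test on Laplace-noised binary signals.

    Assumptions of this sketch (not confirmed by the source):
    - theta = 0: signals ~ Bernoulli(1 - p); theta = 1: signals ~ Bernoulli(p)
    - each agent releases signal + Laplace(1 / eps) (a bit has sensitivity 1)
    - the test rejects when the sum of released values exceeds a threshold
    """
    rng = np.random.default_rng(seed)
    scale = 1.0 / eps

    def statistic(success_prob):
        signals = rng.binomial(1, success_prob, size=(draws, n))
        noise = rng.laplace(0.0, scale, size=(draws, n))
        return (signals + noise).sum(axis=1)

    null_stats = statistic(1.0 - p)  # samples under theta = 0
    alt_stats = statistic(p)         # samples under theta = 1
    # Rejection threshold: empirical (1 - alpha) quantile under the null.
    threshold = np.quantile(null_stats, 1.0 - alpha)
    # Power: fraction of alternative samples that exceed the threshold.
    return float(np.mean(alt_stats > threshold))


if __name__ == "__main__":
    for eps in (0.01, 0.1, 1.0, 10.0):
        print(eps, laplace_power(n=100, p=0.7, eps=eps))
```

Sweeping `eps` in this way and comparing the result against the power of a single agent's test is one way to locate a critical budget of the kind the captions describe, under the stated assumptions.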

Theorems & Definitions (11)

definition 1
theorem 1
theorem 2
theorem 3
corollary 1: Exact Recovery with a Single Threshold
theorem 4
proposition 1: Simple Hypothesis Testing
theorem 5
lemma 1
lemma 2
...and 1 more
