Differentially Private One-Shot Federated Inference for Linear Mixed Models via Lossless Likelihood Reconstruction

Keisuke Hanada, Toshio Shimokawa, Kazushi Maruo

Abstract

One-shot federated learning enables multi-site inference with minimal communication. However, sharing summary statistics can still leak sensitive individual-level information when sites have only a small number of patients. In particular, when covariates are discrete, shared cross-product summaries can reveal patient-level covariate patterns. Motivated by this concern, this study proposes a differentially private one-shot federated inference framework for linear mixed models with a random-intercept working covariance. The method reconstructs the pooled likelihood from site-level summary statistics and applies a Gaussian mechanism to perturb these summaries, ensuring site-level differential privacy. Cluster-robust variance estimators are developed that are computed directly from the privatized summaries. These robust variance estimators provide valid uncertainty quantification even under covariance misspecification. Under a multi-site asymptotic regime, the consistency and asymptotic normality of the proposed estimator are established and the leading-order statistical cost of privacy is characterized. Simulation studies show that moderate privacy noise substantially reduces reconstruction risk while maintaining competitive estimation accuracy as the number of sites increases. However, very strong privacy settings can lead to unstable standard errors when the number of sites is limited. An application using multi-site COVID-19 testing data demonstrates that meaningful privacy protection can be achieved with a modest loss of efficiency.

Paper Structure

This paper contains 33 sections, 8 theorems, 75 equations, 6 figures, 3 tables.

Key Result

Theorem 1

Under the LMM eq-model, the collection of quadratic summaries $\{\boldsymbol{S}_k, \boldsymbol{T}_k\}_{k=1}^K$ permits the exact reconstruction of the pooled ML and REML functions. In particular, both $l_{ML}(\boldsymbol{\beta}, \sigma^2, \tau^2)$ and $l_{REML}$ [...] where the components $\boldsymbol{W}_k$, $\boldsymbol{Q}_k$, and [...]
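
To illustrate the idea behind Theorem 1, the following is a minimal sketch, not the paper's construction. It assumes a random-intercept covariance $\boldsymbol{V}_k = \sigma^2 \boldsymbol{I} + \tau^2 \boldsymbol{1}\boldsymbol{1}^\top$ with known $\sigma^2$ and $\tau^2$, and it uses a hypothetical summary $\boldsymbol{G}_k = \boldsymbol{Z}_k^\top \boldsymbol{Z}_k$ with $\boldsymbol{Z}_k = [\boldsymbol{1}, \boldsymbol{X}_k, \boldsymbol{y}_k]$, which need not coincide with the $\boldsymbol{S}_k$ and $\boldsymbol{T}_k$ of the theorem. Because $\boldsymbol{V}_k^{-1} = \sigma^{-2}(\boldsymbol{I} - c_k \boldsymbol{1}\boldsymbol{1}^\top)$ with $c_k = \tau^2 / (\sigma^2 + n_k \tau^2)$, the GLS normal equations depend on the data only through these cross-products, so the estimate computed from summaries matches the pooled one up to floating-point error.

```python
import numpy as np


def site_summary(X, y):
    """Hypothetical site summary: Z'Z with Z = [1, X, y]."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X, y])
    return Z.T @ Z  # entry [0, 0] equals the site size n


def gls_from_summaries(summaries, sigma2, tau2):
    """GLS estimate of beta using only the site-level summaries."""
    p = summaries[0].shape[0] - 2
    A = np.zeros((p, p))
    b = np.zeros(p)
    for G in summaries:
        n = G[0, 0]
        c = tau2 / (sigma2 + n * tau2)   # from the closed-form inverse of V_k
        x_sum = G[0, 1:p + 1]            # 1'X
        y_sum = G[0, p + 1]              # 1'y
        XtX = G[1:p + 1, 1:p + 1]        # X'X
        Xty = G[1:p + 1, p + 1]          # X'y
        A += (XtX - c * np.outer(x_sum, x_sum)) / sigma2
        b += (Xty - c * x_sum * y_sum) / sigma2
    return np.linalg.solve(A, b)


def gls_from_raw(sites, sigma2, tau2):
    """Reference GLS estimate using the pooled individual-level data."""
    p = sites[0][0].shape[1]
    A = np.zeros((p, p))
    b = np.zeros(p)
    for X, y in sites:
        n = X.shape[0]
        V = sigma2 * np.eye(n) + tau2 * np.ones((n, n))
        V_inv = np.linalg.inv(V)
        A += X.T @ V_inv @ X
        b += X.T @ V_inv @ y
    return np.linalg.solve(A, b)


def simulate_sites(n_sites, rng, beta=(1.0, -0.5), sigma2=1.0, tau2=0.5):
    """Simulate small sites from a random-intercept model (illustration only)."""
    sites = []
    for _ in range(n_sites):
        n = int(rng.integers(5, 16))
        X = np.column_stack([np.ones(n), rng.normal(size=n)])
        u = rng.normal(scale=np.sqrt(tau2))
        y = X @ np.asarray(beta) + u + rng.normal(scale=np.sqrt(sigma2), size=n)
        sites.append((X, y))
    return sites


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sites = simulate_sites(20, rng)
    summaries = [site_summary(X, y) for X, y in sites]
    print(gls_from_summaries(summaries, 1.0, 0.5))
    print(gls_from_raw(sites, 1.0, 0.5))
```

In this sketch each site shares only a small symmetric matrix, regardless of its sample size. The same cross-products are what make small sites with discrete covariates vulnerable to reconstruction, which is the motivation for perturbing them before release.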

Figures (6)

  • Figure 1: Matrix- and element-level reconstruction rates under various privacy settings. Ref corresponds to the case without Gaussian perturbation and DP(x) corresponds to the case of Gaussian perturbation with $\varepsilon_0=x$.
  • Figure 2: Estimation performance results. Triangles indicate median SE inflation factors exceeding the plotting range; interquartile ranges (IQRs) are omitted in those cases.
  • Figure 3: Results of the real data analysis.
  • Figure 4: Overview of simulation results under the correctly specified random-intercept and slope model. Triangles indicate median SE inflation factors exceeding the plotting range; IQRs are omitted in those cases.
  • Figure 5: Overview of simulation results under the misspecified random-intercept model.
  • ...and 1 more figure

Theorems & Definitions (14)

  • Theorem 1: Likelihood reconstruction from site-level summaries
  • Definition 1: Differential privacy; Definition 2.4 in dwork2014algorithmic
  • Proposition 1: Gaussian mechanism
  • Theorem 2
  • Theorem 3
  • Proposition 2
  • Proof: Proof of Theorem \ref{th-consistent-non-dp}
  • Lemma 1: Gaussian mechanism; Theorem 1 in balle2018improving
  • Proof: Proof of Proposition \ref{th-gaussian-mechanism-dp}
  • Lemma 2: Theorem 5.7 in van2000asymptotic
  • ...and 4 more
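
The Gaussian mechanism listed above can be sketched as follows. This is a hedged illustration under stated assumptions rather than the paper's calibration: it uses the classical noise scale $\sigma = \Delta \sqrt{2 \ln(1.25/\delta)} / \varepsilon$, which is valid only for $0 < \varepsilon < 1$, whereas the listing cites balle2018improving, whose calibration differs. The function names, the symmetric summary matrix `G`, and the caller-supplied L2 sensitivity `sensitivity` (taken over the released upper-triangular entries) are all hypothetical.

```python
import math

import numpy as np


def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale (requires 0 < epsilon < 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("classical calibration requires 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def privatize_symmetric(G, sensitivity, epsilon, delta, rng):
    """Perturb a symmetric summary matrix with symmetric Gaussian noise.

    Independent noise is drawn for each entry on and above the diagonal and
    then mirrored, so the released matrix stays symmetric.
    """
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    d = G.shape[0]
    noise = rng.normal(0.0, sigma, size=(d, d))
    upper = np.triu(noise)                      # diagonal and upper triangle
    sym_noise = upper + np.triu(noise, k=1).T   # mirror the strict upper part
    return G + sym_noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = np.array([[5.0, 2.0], [2.0, 3.0]])      # toy summary, illustration only
    print(privatize_symmetric(G, sensitivity=1.0, epsilon=0.5, delta=1e-5, rng=rng))
```

The sensitivity bound is the critical input in this sketch: it has to hold for every pair of neighbouring datasets under the chosen privacy unit, which in practice usually requires bounding or clipping the covariates and responses before the summaries are formed.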