Optimal Cox regression under federated differential privacy: coefficients and cumulative hazards

Elly K. H. Hung, Yi Yu

Abstract

We study two foundational problems in distributed survival analysis under federated differential privacy (FDP): estimation of the Cox regression coefficients and of the cumulative baseline hazard functions, allowing for heterogeneous per-server sample sizes and privacy budgets. To quantify the fundamental cost of privacy, we derive minimax lower bounds together with upper bounds that match up to poly-logarithmic factors for the regression coefficients, thereby revealing server-level phase transitions between private and non-private regimes. We also consider a relaxed differential privacy framework with partially public information. Our analysis shows that the role of public covariates depends strongly on the privacy model. For cumulative hazard estimation, we propose a private tree-based version of the Breslow estimator for nonparametric integral estimation under FDP. As a by-product, this leads to a private survival function estimator that attains a nearly minimax optimal rate. Numerical experiments, including a real-data application, support the theoretical findings. The proposed methods are implemented in an accompanying R package FDPCox.

Paper Structure

This paper contains 42 sections, 41 theorems, 311 equations, 9 figures, 1 table, 5 algorithms.

Key Result

Theorem 1

Denote by $\mathcal{P}$ the class of distributions satisfying Assumptions assump-1 and assp:baseline, and denote by $\mathcal{Q}$ the class of $(\{(\epsilon_s, \delta_s)\}_{s\in [S]}, K)$-FDP mechanisms. We have that where the lower bound holds provided that $\delta_s \log(1/\delta_s) \lesssim \epsilon_s^2/d$.

Figures (9)

  • Figure 1: Simulation results for CDP Cox regression coefficients (panel A) and cumulative hazard (panel B) estimation, with varying sample sizes and privacy budgets; and for CDP Cox regression coefficients estimation with varying dimensions (panel C).
  • Figure 2: Simulation results from \ref{alg:FDP-SGD} (panel A) and \ref{alg:FDP-Breslow} (panel B), varying the number of servers and the $\epsilon$ privacy budget.
  • Figure 3: Simulation results from \ref{alg:FDP-SGD-interactive} (panel A) and \ref{alg:FDP-Breslow} (panel B), varying the number of servers and the $\epsilon$ privacy budget.
  • Figure 4: Estimating $\beta_0$ (panel A) and $\Lambda_0$ (panel B) for the Rotterdam breast cancer dataset. The number of servers refers to 1, 2, 3 servers with 994 observations each, and 'CDP' means a single server with all 2982 observations.
  • Figure 5: Sensitivity analysis of \ref{alg:cdp_sgd} to privacy noise scaling (panel A) and gradient descent step size (panel B).
  • ...and 4 more figures

Theorems & Definitions (84)

  • Remark 1
  • Definition 1: Central differential privacy, CDP
  • Definition 2: Federated differential privacy, FDP
  • Remark 2
  • Theorem 1
  • Proposition 2
  • Definition 3: Label-CDP
  • Theorem 3
  • Proposition 4
  • Theorem 5
  • ...and 74 more