
Cross-silo Federated Learning with Record-level Personalized Differential Privacy

Junxu Liu, Jian Lou, Li Xiong, Jinfei Liu, Xiaofeng Meng

TL;DR

The paper addresses cross-silo federated learning under record-level personalized differential privacy (PDP). It proposes rPDP-FL, a two-stage Poisson sampling framework that combines uniform client-level sampling with non-uniform record-level sampling, where each record's sampling probability is set from its individual privacy budget. It also introduces Simulation-CurveFitting (SCF) to infer a practical mapping from per-record privacy budgets to sampling probabilities. The key finding is that the cumulative privacy cost is well approximated by an exponential function of the per-record sampling probability, which makes privacy accounting and budget management efficient. The authors provide an RDP-based privacy analysis of the two-stage sampling scheme, show that it yields enhanced privacy amplification, and report substantial utility gains over non-personalized baselines in experiments across healthcare and vision-language benchmarks. The work lays the groundwork for record-level privacy personalization in FL and offers a practical, scalable way to balance privacy against learning performance in heterogeneous, real-world data settings.
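The two-stage sampling described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `two_stage_poisson_sample`, the parameter `client_rate` for the shared client-level probability, and the representation of each client as a list of per-record probabilities $q_i$ are hypothetical choices made here.

```python
import random
from typing import Dict, List


def two_stage_poisson_sample(
    client_records: Dict[str, List[float]],
    client_rate: float,
    rng: random.Random,
) -> Dict[str, List[int]]:
    """One round of two-stage Poisson sampling (hypothetical helper).

    client_records maps a client id to the per-record sampling
    probabilities q_i of that client's records.
    Stage 1: every client joins the round independently with the same
             probability client_rate (uniform client-level sampling).
    Stage 2: each record of a participating client is included independently
             with its own probability q_i (non-uniform record-level sampling).
    Returns the indices of the sampled records for each participating client.
    """
    if not 0.0 <= client_rate <= 1.0:
        raise ValueError("client_rate must lie in [0, 1]")
    selected: Dict[str, List[int]] = {}
    for client_id, probs in client_records.items():
        if rng.random() >= client_rate:
            continue  # client sits out this round
        selected[client_id] = [i for i, q in enumerate(probs) if rng.random() < q]
    return selected


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative values only: records with larger budgets can afford larger q_i.
    clients = {"hospital_a": [0.9, 0.1, 0.5], "hospital_b": [0.2, 0.2]}
    print(two_stage_poisson_sample(clients, client_rate=0.5, rng=rng))
```

Because every client and every record is drawn independently, the expected number of times record $i$ is used per round is `client_rate` times $q_i$ in this sketch.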

Abstract

Federated learning (FL) enhanced by differential privacy has emerged as a popular approach to better safeguard the privacy of client-side data by protecting clients' contributions during the training process. Existing solutions typically assume a uniform privacy budget for all records and provide one-size-fits-all solutions that may not be adequate to meet each record's privacy requirement. In this paper, we explore the uncharted territory of cross-silo FL with record-level personalized differential privacy. We devise a novel framework named \textit{rPDP-FL}, employing a two-stage hybrid sampling scheme with both uniform client-level sampling and non-uniform record-level sampling to accommodate varying privacy requirements. A critical and non-trivial problem is how to determine the ideal per-record sampling probability $q$ given the personalized privacy budget $\varepsilon$. We introduce a versatile solution named \textit{Simulation-CurveFitting}, allowing us to uncover a significant insight into the nonlinear correlation between $q$ and $\varepsilon$ and derive an elegant mathematical model to tackle the problem. Our evaluation demonstrates that our solution can provide significant performance gains over the baselines that do not consider personalized privacy preservation.
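The curve-fitting half of Simulation-CurveFitting can be illustrated as follows. The sketch assumes the fitted form is $\varepsilon(q) \approx a\,e^{bq}$ with $a, b > 0$ (the paper's exact parametrization may differ, for example by including an offset), and that the $(q, \varepsilon)$ pairs have already been produced by the simulation step, i.e., by running a privacy accountant for each candidate $q$ under a fixed training configuration. The helpers `fit_exponential` and `sampling_prob_for_budget` are hypothetical names. Taking logarithms turns the fit into ordinary linear regression, and the fitted curve is then inverted to read off $q$ for a requested budget.

```python
import math
from typing import List, Tuple


def fit_exponential(qs: List[float], epsilons: List[float]) -> Tuple[float, float]:
    """Fit eps ~= a * exp(b * q) by least squares on log(eps) = log(a) + b * q."""
    if len(qs) != len(epsilons) or len(qs) < 2:
        raise ValueError("need at least two (q, eps) pairs of equal length")
    if any(e <= 0 for e in epsilons):
        raise ValueError("every eps must be positive to take logs")
    ys = [math.log(e) for e in epsilons]
    n = len(qs)
    mean_q = sum(qs) / n
    mean_y = sum(ys) / n
    sxx = sum((q - mean_q) ** 2 for q in qs)
    if sxx == 0:
        raise ValueError("need at least two distinct q values")
    sxy = sum((q - mean_q) * (y - mean_y) for q, y in zip(qs, ys))
    b = sxy / sxx                          # slope in log space
    a = math.exp(mean_y - b * mean_q)      # intercept mapped back from log space
    return a, b


def sampling_prob_for_budget(eps: float, a: float, b: float) -> float:
    """Invert eps = a * exp(b * q) for q (assumes a > 0, b > 0), clipped to [0, 1]."""
    q = math.log(eps / a) / b
    return min(max(q, 0.0), 1.0)
```

With simulated pairs `qs` and `eps_values`, calling `a, b = fit_exponential(qs, eps_values)` once and then `sampling_prob_for_budget(record_eps, a, b)` per record maps each personalized budget to a probability with a single logarithm, instead of a separate accountant run for every distinct budget.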

Paper Structure

This paper contains 25 sections, 8 theorems, 22 equations, 9 figures, 4 tables, and 3 algorithms.

Key Result

Lemma 1

If $\mathcal{A}$ is an $(\alpha,\rho)$-RDP mechanism, it also satisfies $\left(\rho+\frac{\log(1/\delta)}{\alpha-1},\,\delta\right)$-DP for any $0<\delta<1$.
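In practice Lemma 1 is applied at many orders $\alpha$ at once, keeping the smallest resulting $\varepsilon$. The sketch below does this with hypothetical helpers `rdp_to_dp` and `best_dp_epsilon`; its usage example relies on the standard fact that a Gaussian mechanism with $\ell_2$-sensitivity 1 and noise standard deviation $\sigma$ satisfies $(\alpha, \alpha/(2\sigma^2))$-RDP for every $\alpha > 1$.

```python
import math
from typing import Sequence


def rdp_to_dp(rho: float, alpha: float, delta: float) -> float:
    """Lemma 1: an (alpha, rho)-RDP mechanism is (rho + log(1/delta)/(alpha-1), delta)-DP."""
    if alpha <= 1 or not 0 < delta < 1:
        raise ValueError("need alpha > 1 and 0 < delta < 1")
    return rho + math.log(1 / delta) / (alpha - 1)


def best_dp_epsilon(alphas: Sequence[float], rhos: Sequence[float], delta: float) -> float:
    """If (alpha, rho_alpha)-RDP holds at every listed order, report the smallest converted epsilon."""
    return min(rdp_to_dp(r, a, delta) for a, r in zip(alphas, rhos))


if __name__ == "__main__":
    alphas = [2, 4, 8, 16, 32]
    rhos = [a / 2 for a in alphas]  # Gaussian mechanism with sigma = 1
    print(best_dp_epsilon(alphas, rhos, delta=1e-3))
```

Small orders pay a large $\log(1/\delta)/(\alpha-1)$ term while large orders pay a large RDP term, so the minimum usually sits at a moderate $\alpha$.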

Figures (9)

  • Figure 1: An illustration of the cross-silo federated learning with record-level personalized differential privacy. In this framework, each user is given the autonomy to independently opt for a personalized privacy preference (specified by a personalized differential privacy (PDP) budget $\varepsilon$) for their respective records. The goal is to train a private global model that satisfies record-level PDP.
  • Figure 2: The RDP and DP budget curves w.r.t. order $\alpha$ and sampling probability $q$ of a sequential composition of $T=100$ PoiSG mechanisms with noise multiplier $\sigma=1.0$ and $\delta=10^{-3}$.
  • Figure 3: A step-by-step illustration of the rPDP-FL algorithm.
  • Figure 4: An illustration of Simulation-CurveFitting.
  • Figure 5: The DP budget curves w.r.t. order $\alpha$ (left) and the optimal DP budget w.r.t. sampling probability $q$ (right) of an rPDP-FL algorithm with parameters $T=20$, $\tau=5$, $\lambda=0.5$, $\sigma=1.0$, and $\delta=10^{-3}$.
  • ...and 4 more figures
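A curve like the DP budget in Figure 2 can be approximated with a small Rényi accountant. This sketch is an illustration under assumptions, not the paper's accountant: it uses the integer-order expression for the Poisson-sampled Gaussian mechanism that common RDP accountants implement (see mironov2019renyi), $A_\alpha = \sum_{k=0}^{\alpha}\binom{\alpha}{k}(1-q)^{\alpha-k}q^k e^{(k^2-k)/(2\sigma^2)}$ with $\rho(\alpha)=\log A_\alpha/(\alpha-1)$, adds the per-step RDP over $T$ steps (sequential composition), converts with Lemma 1, and minimizes over integer orders 2 to 128. The names `poisg_rdp` and `dp_budget` and the order grid are hypothetical.

```python
import math
from typing import Iterable, List


def _logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def poisg_rdp(q: float, sigma: float, alpha: int) -> float:
    """RDP at integer order alpha >= 2 of one Poisson-sampled Gaussian step.

    rho = log(A_alpha) / (alpha - 1), with
    A_alpha = sum_k C(alpha, k) (1-q)^(alpha-k) q^k exp((k^2 - k) / (2 sigma^2)),
    computed in log space so that large orders do not overflow.
    """
    if not 0.0 <= q <= 1.0 or alpha < 2:
        raise ValueError("need 0 <= q <= 1 and integer alpha >= 2")
    if q == 0.0:
        return 0.0  # nothing is ever sampled, so there is no privacy loss
    log_terms = []
    for k in range(alpha + 1):
        if q == 1.0 and k < alpha:
            continue  # (1 - q)^(alpha - k) is zero for these terms
        log_binom = math.lgamma(alpha + 1) - math.lgamma(k + 1) - math.lgamma(alpha - k + 1)
        log_q = k * math.log(q) if k > 0 else 0.0
        log_1mq = (alpha - k) * math.log1p(-q) if k < alpha else 0.0
        log_terms.append(log_binom + log_q + log_1mq + (k * k - k) / (2 * sigma**2))
    return _logsumexp(log_terms) / (alpha - 1)


def dp_budget(q: float, sigma: float, steps: int, delta: float,
              orders: Iterable[int] = range(2, 129)) -> float:
    """Epsilon after `steps` composed PoiSG steps: RDP adds up, then Lemma 1, minimized over orders."""
    return min(steps * poisg_rdp(q, sigma, a) + math.log(1 / delta) / (a - 1) for a in orders)


if __name__ == "__main__":
    # Settings from the Figure 2 caption: T = 100, sigma = 1.0, delta = 1e-3.
    for q in (0.01, 0.05, 0.1, 0.5, 1.0):
        print(f"q={q}: eps={dp_budget(q, sigma=1.0, steps=100, delta=1e-3):.3f}")
```

At $q=1$ the expression collapses to the plain Gaussian mechanism's $\alpha/(2\sigma^2)$, which is a convenient sanity check; for $q<1$ the subsampling amplification shows up as a much smaller per-step cost.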

Theorems & Definitions (18)

  • Definition 1: ($\varepsilon,\delta$)-Differential Privacy dwork2006calibrating, dwork2014algorithmic
  • Definition 2: $(\mathcal{E},\delta)$-Personalized Differential Privacy jorgensen2015conservative
  • Remark 1
  • Definition 3: Poisson Sampling zhu2019poission
  • Definition 4: Poisson-Sampled Gaussian (PoiSG) mechanism
  • Definition 5: ($\alpha,\rho$)-Rényi Differential Privacy mironov2017renyi
  • Lemma 1: Transition from RDP to DP mironov2017renyi
  • Lemma 2: Adaptive sequential composition mironov2017renyi
  • Lemma 3: Post-processing mironov2017renyi
  • Lemma 4: Privacy amplification via (uniform) Poisson sampling for Gaussian mechanism mironov2019renyi, zhu2019poission
  • ...and 8 more
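A single Poisson-sampled Gaussian (PoiSG) release can be sketched as below. The sketch assumes the common DP-SGD instantiation rather than the paper's formal definition: each record contributes a vector clipped to $\ell_2$ norm at most $C$ (`max_norm`), the sampled contributions are summed, and Gaussian noise with standard deviation $\sigma C$ (`noise_multiplier * max_norm`) is added to every coordinate. Letting each record carry its own probability $q_i$ mirrors rPDP-FL's record-level stage and reduces to uniform Poisson sampling when all $q_i$ are equal. The names `clip` and `poisg_step` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def clip(vec: Sequence[float], max_norm: float) -> List[float]:
    """Scale vec down so its L2 norm is at most max_norm (bounds each record's influence)."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]


def poisg_step(per_record_grads: Sequence[Sequence[float]],
               sample_probs: Sequence[float],
               max_norm: float,
               noise_multiplier: float,
               rng: random.Random) -> List[float]:
    """Sum of clipped contributions of Poisson-sampled records plus N(0, (sigma*C)^2) noise per coordinate."""
    dim = len(per_record_grads[0])
    total = [0.0] * dim
    for grad, q in zip(per_record_grads, sample_probs):
        if rng.random() < q:  # record joins independently with its own probability q_i
            for j, g in enumerate(clip(grad, max_norm)):
                total[j] += g
    sd = noise_multiplier * max_norm
    return [t + rng.gauss(0.0, sd) for t in total]
```

Clipping fixes the sensitivity of the sum at $C$ regardless of which records are sampled, which is what lets the noise scale $\sigma C$ be set once for all records while their sampling probabilities differ.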