Differentially Private Clustered Federated Learning

Saber Malekmohammadi; Afaf Taik; Golnoosh Farnadi

TL;DR

This work tackles differentially private (DP) federated learning (FL) under structured data heterogeneity by introducing R-DPCFL, a robust clustered FL algorithm that uses both model updates and training losses for clustering. The method starts with a noise-robust, full-batch first round so that a Gaussian Mixture Model (GMM) can softly cluster clients, then gradually switches to loss-based hard clustering to personalize cluster models, all under differential privacy via Gaussian noise and Exponential Mechanism-based private clustering. Theoretical results show improved cluster separability and faster EM convergence when the first-round batch size is large, and the MSS/MPO framework guides when to switch strategies and how many clusters to use. Empirical evaluations on MNIST, FMNIST, and CIFAR10 show improved utility over baselines across privacy budgets, with particular gains for minority clusters.
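The loss-based cluster selection mentioned above can be illustrated with a minimal sketch. It assumes each client holds one training loss per cluster model; the function names, the utility choice (negative loss), and the `sensitivity` value are hypothetical and not taken from the paper. With `epsilon=None` the client picks the lowest-loss cluster (plain hard clustering); otherwise it samples a cluster with the generic Exponential Mechanism, which may differ from the paper's exact procedure.

```python
import math
import random


def exponential_mechanism_probs(losses, epsilon, sensitivity=1.0):
    """Probabilities proportional to exp(epsilon * utility / (2 * sensitivity)),
    with utility = -loss. `sensitivity` is an assumed bound on how much one
    sample can change a loss value."""
    scores = [-epsilon * loss / (2.0 * sensitivity) for loss in losses]
    top = max(scores)  # subtract the max before exponentiating for stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def select_cluster(losses, epsilon=None, sensitivity=1.0, rng=random):
    """Return a cluster index: argmin of the losses if epsilon is None,
    otherwise a sample drawn with the Exponential Mechanism."""
    if epsilon is None:
        return min(range(len(losses)), key=lambda k: losses[k])
    probs = exponential_mechanism_probs(losses, epsilon, sensitivity)
    return rng.choices(range(len(losses)), weights=probs, k=1)[0]


# Hypothetical training losses of one client on three cluster models.
client_losses = [0.9, 0.4, 1.3]
print(select_cluster(client_losses))
print(exponential_mechanism_probs(client_losses, epsilon=2.0))
```

A larger `epsilon` concentrates the sampling probability on the lowest-loss cluster, so the private choice approaches the non-private argmin as the privacy budget grows.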

Abstract

Federated learning (FL), which is a decentralized machine learning (ML) approach, often incorporates differential privacy (DP) to provide rigorous data privacy guarantees. Previous works attempted to address high structured data heterogeneity in vanilla FL settings through clustering clients (a.k.a. clustered FL), but these methods remain sensitive and error-prone, a weakness that DP noise exacerbates. This vulnerability makes the previous methods inappropriate for differentially private FL (DPFL) settings with structured data heterogeneity. To address this gap, we propose an algorithm for differentially private clustered FL, which is robust to the DP noise in the system and identifies the underlying clients' clusters correctly. To this end, we propose to cluster clients based on both their model updates and training loss values. Furthermore, for clustering clients' model updates at the end of the first round, our proposed approach addresses the server's uncertainties by employing large batch sizes as well as Gaussian Mixture Models (GMM) to reduce the impact of DP and stochastic noise and avoid potential clustering errors. This idea is especially effective in privacy-sensitive scenarios with more DP noise. We provide theoretical analysis to justify our approach and evaluate it across diverse data distributions and privacy budgets. Our experimental results show its effectiveness in addressing large structured data heterogeneity in DPFL.
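As a toy illustration of why less noisy first-round updates are easier to cluster with a GMM, the sketch below fits a two-component, one-dimensional Gaussian mixture with EM to simulated "updates". Everything here is an assumption made for illustration: the real updates are high-dimensional, the cluster centers at -1 and +1 are made up, and `noise_std` is only a stand-in for the combined DP and stochastic noise that larger batch sizes are said to reduce.

```python
import math
import random


def fit_gmm_1d(xs, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with EM.
    Returns (weights, means, variances, responsibilities)."""
    # Simple initialisation: means at the data extremes, shared variance.
    means = [min(xs), max(xs)]
    centre = sum(xs) / len(xs)
    var_all = sum((x - centre) ** 2 for x in xs) / len(xs)
    variances = [var_all, var_all]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: soft assignment of every point to both components.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(spread, 1e-6)  # floor to avoid collapse
    return weights, means, variances, resp


def simulate_updates(n_per_cluster, noise_std, rng):
    """Two hypothetical client clusters centred at -1 and +1."""
    xs, labels = [], []
    for label, centre in enumerate([-1.0, 1.0]):
        for _ in range(n_per_cluster):
            xs.append(centre + rng.gauss(0.0, noise_std))
            labels.append(label)
    return xs, labels


def clustering_accuracy(xs, labels):
    """Fraction of clients whose hard GMM assignment matches the true cluster."""
    _, _, _, resp = fit_gmm_1d(xs)
    hard = [0 if r[0] >= r[1] else 1 for r in resp]
    agree = sum(h == l for h, l in zip(hard, labels)) / len(labels)
    return max(agree, 1.0 - agree)  # invariant to swapping component names


demo_rng = random.Random(0)
for noise_std in (0.1, 2.0):
    demo_xs, demo_labels = simulate_updates(20, noise_std, demo_rng)
    print(noise_std, clustering_accuracy(demo_xs, demo_labels))
```

The soft responsibilities returned by `fit_gmm_1d` are what make the GMM a soft clustering method: a point between the two centers receives a split assignment instead of a forced label.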

Paper Structure (44 sections, 11 theorems, 39 equations, 15 figures, 3 tables)

Introduction
Related work
Definitions, Notations and assumptions
Methodology and proposed algorithm
R-DPCFL algorithm
Reducing GMM uncertainty via using full batch sizes in the first round and small batch sizes in the subsequent rounds
Effect of batch sizes $\{b_i^1\}_{i=1}^n$ on the separation between clusters
Convergence rate of EM for learning GMM
Applicability of R-DPCFL
Evaluation
Results
Conclusion
Notations
Vulnerability of existing clustered FL algorithms
Background
...and 29 more sections

Key Result

Lemma 4.0

Let $\approx$ denote approximation. After $K$ local epochs with step size $\eta_l$, client $i$ generates the noisy DP model update $\Delta \Tilde{\boldsymbol \theta }_i^e(b_i^e)$ at the end of the round $e$. The amount of noise in the resulting model update can be found as:

Figures (15)

Figure 1: Left: Threat model considered in this work, where client $i$ has local training data $\mathcal{D}_i$ and "sample-level" DP privacy parameters $(\epsilon, \delta)$, and does not trust any external party. Right: Three main stages of the proposed R-DPCFL algorithm.
Figure 2: PCA visualization of updates $\{\Delta \Tilde{\mathbf{\boldsymbol \theta }}_i^1\}_{i=1}^n$ in 2D space. Left: $\epsilon_i=10$, $b_i^e=32$ for all $i$ and $e$. Right: $\epsilon_i=10$, $b_i^1=b^1=N=6600$, i.e., full batch sizes (assuming $N_i=N=6600$ for all clients), and $b_i^{>1}=32$ for all $i$. The empty markers show the centers of the Gaussian components. The model updates are obtained from clients running DPSGD for $K=1$ epochs locally on CIFAR10 with covariate shift (rotation) across clusters, and with the same parameter values as in \ref{fig:var1var2}.
Figure 3: Plot of $\texttt{Var}(\Delta \Tilde{\mathbf{\boldsymbol \theta }}_i^1(b_i^1)|\mathbf{\boldsymbol \theta }_{i}^{init})$ (left) and $\texttt{Var}(\Delta \Tilde{\mathbf{\boldsymbol \theta }}_i^e(b_i^e)|\mathbf{\boldsymbol \theta }_{i}^{e,0})$ for $e>1$ (right) vs. both $b_i^1$ and $b_i^{>1}$. There are two clear takeaways: 1) For all $e\in \{1, \cdots, E\}$, $\texttt{Var}(\Delta \Tilde{\mathbf{\boldsymbol \theta }}_i^e(b_i^e)|\mathbf{\boldsymbol \theta }_{i}^{e,0})$ decreases steeply with $b_i^e$ (from \ref{lemma:updatesnoise}). 2) The effect of $b_i^{>1}$ on $\texttt{Var}(\Delta \Tilde{\mathbf{\boldsymbol \theta }}_i^1(b_i^1)|\mathbf{\boldsymbol \theta }_{i}^{init})$ (left figure) is considerable. The reason is that $b_i^{>1}$ is used in $E-1$ rounds and affects the noise scale $z_i(\epsilon, \delta, b_i^1, b_i^{>1}, N_i, K, E)$ used by DPSGD: see \ref{fig:zvsq} in the appendix for the plot of $z_i(\epsilon, \delta, b_i^1, b_i^{>1}, N_i, K, E)$ vs. $b_i^1$ and $b_i^{>1}$. The results are obtained on CIFAR10 with the Rényi-DP accountant \cite{mironov2019renyidifferentialprivacysampled} in a setting with $N_i=6600$, $\epsilon=5$, $\delta=10^{-4}$, $c=3$, $K=1$, $E=200$, $p=11{,}181{,}642$, $\eta_l=5\times 10^{-4}$.
Figure 4: Top: Average test accuracy across clients for different total privacy budgets $\epsilon$. Results are from four different runs. $10\%$ means performing local clustering by clients only in $10\%$ of the total number of rounds; i.e., rounds $E_c \leq e \leq E_c + \lfloor \frac{E}{10}\rfloor$ for R-DPCFL and rounds $1 \leq e \leq 1+\lfloor \frac{E}{10}\rfloor$ for IFCA (see \ref{app:EMimplementation}). \ref{fig:avg_test_acc_allalgs} in the appendix also includes the Global baseline. Bottom: Number of times (out of 4 runs) that R-DPCFL and IFCA successfully detect the underlying cluster structure of all existing clients.
Figure 5: Top: Average test accuracy across clients belonging to the minority cluster for different total privacy budgets $\epsilon$, over four different runs. Bottom: Number of times (out of 4 runs) that R-DPCFL and IFCA successfully detect the minority cluster.
...and 10 more figures
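
The batch-size trend described in the Figure 3 caption can be checked with back-of-the-envelope arithmetic. The sketch below is not the paper's lemma; it assumes standard DPSGD, where each step adds Gaussian noise with standard deviation $c \cdot z / b$ per coordinate to the averaged clipped gradient, and it ignores stochastic gradient noise. The noise multiplier `noise_mult=1.0` is a hypothetical value held fixed across batch sizes, whereas the caption notes that $z_i$ itself depends on the batch sizes through the privacy accountant. The remaining values are taken from the caption.

```python
import math


def update_noise_variance(batch_size, n_samples, epochs, lr, clip, noise_mult, dim):
    """DP-noise variance, summed over `dim` coordinates, accumulated in one
    round's model update under the standard-DPSGD assumption above."""
    steps = epochs * math.ceil(n_samples / batch_size)
    per_step_std = clip * noise_mult / batch_size  # noise on the averaged gradient
    return steps * (lr ** 2) * (per_step_std ** 2) * dim


# N_i, K, eta_l, c, p from the Figure 3 caption; noise_mult is hypothetical.
common = dict(n_samples=6600, epochs=1, lr=5e-4, clip=3.0,
              noise_mult=1.0, dim=11_181_642)
var_small = update_noise_variance(batch_size=32, **common)
var_full = update_noise_variance(batch_size=6600, **common)
print(f"b=32: {var_small:.3e}  b=6600: {var_full:.3e}  "
      f"ratio: {var_small / var_full:.3e}")
```

Under these assumptions the variance scales roughly as $N_i / b^3$ (fewer steps and less noise per step as $b$ grows), which is consistent with the steep decrease the caption reports.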

Theorems & Definitions (17)

Definition 3.1: ($\epsilon,\delta$)-DP Dwork2006OurDO
Lemma 4.0
Lemma 4.0
Theorem 4.1
Definition 3.1: Renyi Differential Privacy (RDP) mironovRDP
Proposition 3.1
Lemma 3.1
Proposition 3.1
Lemma 7.0
proof
...and 7 more
