Private Heterogeneous Federated Learning Without a Trusted Server Revisited: Error-Optimal and Communication-Efficient Algorithms for Convex Losses

Changyu Gao; Andrew Lowy; Xingyu Zhou; Stephen J. Wright

TL;DR

The paper studies private federated learning without a trusted server under Inter-Silo Record-Level Differential Privacy (ISRL-DP), where silo data may be heterogeneous (non-i.i.d.) and fewer communication rounds are desirable. For smooth losses, it introduces a localized ISRL-DP accelerated MB-SGD framework that achieves the optimal excess risk in the heterogeneous setting, with communication complexity that matches the non-private lower bound and with improved gradient complexity. For nonsmooth losses, the authors develop smoothing-based and direct subgradient variants that preserve the optimal ISRL-DP rates and offer different trade-offs between communication and computation. The theory is complemented by MNIST-based experiments that show practical gains over prior ISRL-DP methods, including robustness to unreliable communication. Open questions remain about lower bounds and universal optimality across regimes.

Abstract

We revisit the problem of federated learning (FL) with private data from people who do not trust the server or other silos/clients. In this context, every silo (e.g. hospital) has data from several people (e.g. patients) and needs to protect the privacy of each person's data (e.g. health records), even if the server and/or other silos try to uncover this data. Inter-Silo Record-Level Differential Privacy (ISRL-DP) prevents each silo's data from being leaked, by requiring that silo i's communications satisfy item-level differential privacy. Prior work arXiv:2106.09779 characterized the optimal excess risk bounds for ISRL-DP algorithms with homogeneous (i.i.d.) silo data and convex loss functions. However, two important questions were left open: (1) Can the same excess risk bounds be achieved with heterogeneous (non-i.i.d.) silo data? (2) Can the optimal risk bounds be achieved with fewer communication rounds? In this paper, we give positive answers to both questions. We provide novel ISRL-DP FL algorithms that achieve the optimal excess risk bounds in the presence of heterogeneous silo data. Moreover, our algorithms are more communication-efficient than the prior state-of-the-art. For smooth loss functions, our algorithm achieves the optimal excess risk bound and has communication complexity that matches the non-private lower bound. Additionally, our algorithms are more computationally efficient than the previous state-of-the-art.
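To make the trust model in the abstract concrete, here is a minimal sketch of one communication round in which every silo privatizes its own message before it reaches the server. It is plain noisy minibatch SGD, not the paper's localized accelerated algorithm. The function names (`clip`, `silo_message`, `server_step`), the clipping threshold `max_norm`, and the noise scale `sigma` are assumptions made for illustration; calibrating `sigma` to a target $(\varepsilon,\delta)$ is left out.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def clip(g: Sequence[float], max_norm: float) -> Vector:
    """Scale g so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in g))
    scale = 1.0 if norm <= max_norm else max_norm / norm
    return [v * scale for v in g]


def silo_message(per_record_grads: Sequence[Sequence[float]],
                 max_norm: float, sigma: float,
                 rng: random.Random) -> Vector:
    """Noisy average gradient that one silo sends to the server.

    Each per-record gradient is clipped, the clipped gradients are
    averaged, and independent Gaussian noise with standard deviation
    sigma (an assumed, uncalibrated value) is added to every coordinate
    before anything leaves the silo.
    """
    clipped = [clip(g, max_norm) for g in per_record_grads]
    k, d = len(clipped), len(clipped[0])
    avg = [sum(g[j] for g in clipped) / k for j in range(d)]
    return [a + rng.gauss(0.0, sigma) for a in avg]


def server_step(x: Sequence[float],
                messages: Sequence[Sequence[float]],
                lr: float) -> Vector:
    """Untrusted server: average the already-noised messages, take one SGD step."""
    n_silos, d = len(messages), len(x)
    avg = [sum(m[j] for m in messages) / n_silos for j in range(d)]
    return [x[j] - lr * avg[j] for j in range(d)]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0, 1.0]
    # Two silos with toy per-record gradients (hypothetical data).
    silo_grads = [[[3.0, 4.0], [0.5, 0.0]], [[0.0, 2.0], [1.0, 1.0]]]
    msgs = [silo_message(g, max_norm=1.0, sigma=0.1, rng=rng) for g in silo_grads]
    print(server_step(x, msgs, lr=0.5))
```

The point of the design is that noise is added inside `silo_message`, so the server only ever sees privatized vectors; this is what removes the need to trust the server or the other silos.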

Paper Structure (37 sections, 25 theorems, 94 equations, 3 figures, 2 tables, 7 algorithms)

Introduction
Problem Setup.
Contributions.
Our Algorithms and Techniques.
Preliminaries
Differential Privacy.
Notation and Assumptions.
Localized ISRL-DP Accelerated MB-SGD for Smooth Losses
Comparison with Non-Private Communication Complexity Lower Bound.
Sketch of the Proof of Theorem 2.1.
Error-Optimal Heterogeneous ISRL-DP FL for Nonsmooth Losses
Reductions to the Smooth Case via Smoothing
Nesterov Smoothing:
Convolutional Smoothing
A Direct Subgradient Algorithm
...and 22 more sections

Key Result

Theorem 2.1

Let $f(\cdot, x)$ be $\beta$-smooth and $M=N$. Assume $\varepsilon \leq 2 \ln(2/\delta)$ and $\delta \in (0,1)$. Then there exist parameter choices such that alg:phased_acc is $(\varepsilon,\delta)$-ISRL-DP and has the following excess risk: [...]. Moreover, the communication complexity of alg:phased_acc is [...]. Assuming $d = \Theta(n)$ and $\varepsilon = \Theta(1)$, the gradient complexity of alg:phased_acc is [...].

Figures (3)

Figure 1: ISRL-DP maintains the privacy of each patient's record, provided the patient's own hospital is trusted. Silo $i$'s messages are item-level DP, preventing data leakage, even if the server/other silos collude to decode the data of silo $i$.
Figure 2: Reliable Communication
Figure 3: Unreliable Communication

Theorems & Definitions (51)

Definition 1.1: Differential Privacy (Dwork et al., 2006)
Definition 1.2: Inter-Silo Record-Level Differential Privacy
Theorem 2.1: Upper Bound for Smooth Losses
Remark 2.2: Optimal risk in non-i.i.d private FL
Remark 2.3: Improved communication and gradient complexity
Theorem 2.4: Communication Lower Bound (Woodworth et al., 2020)
Remark 2.5
Proof: Proof sketch
Theorem 3.1: Nonsmooth FL via Nesterov smoothing
Theorem 3.2: Nonsmooth FL via convolutional smoothing
...and 41 more
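
For reference, the standard $(\varepsilon,\delta)$-differential privacy condition that Definition 1.1 refers to can be written as below. The symbols $\mathcal{A}$ (the randomized algorithm), $X, X'$ (adjacent data sets, differing in one record), and $S$ (a measurable set of outputs) are notation assumed here and may differ from the paper's. Going by the abstract, ISRL-DP applies a condition of this kind to the communications of each silo $i$, with adjacency taken over single records in silo $i$'s data.

```latex
% Standard (epsilon, delta)-differential privacy; notation is assumed, not copied from the paper.
\Pr\left[\mathcal{A}(X) \in S\right] \;\le\; e^{\varepsilon}\, \Pr\left[\mathcal{A}(X') \in S\right] + \delta
\quad \text{for all adjacent } X, X' \text{ and all measurable } S.
```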
