Straggler-Resilient Differentially-Private Decentralized Learning

Yauhen Yakimenka; Chung-Wei Weng; Hsuan-Yin Lin; Eirik Rosnes; Jörg Kliewer

TL;DR

This work tackles the straggler problem in fully decentralized learning on a ring while preserving user privacy, extending network differential privacy (NDP) to account for total training latency. It introduces a skipping scheme and analyzes it on both a fixed ring and a randomized ring, deriving convergence guarantees consistent with prior SGD analyses and DP bounds obtained via Rényi DP with privacy amplification by iteration. A key finding is that the DP leakage $\varepsilon_{\mathrm{skip}}$ scales linearly with the total number of updates (and thus with $h_{\max}$ under certain conditions), while randomizing the ring improves privacy amplification without sacrificing asymptotic convergence. Empirical validation on logistic regression with OpenML housing data and on MNIST/CIFAR-10 demonstrates practical latency reductions and a quantifiable privacy-utility trade-off, illustrating how to tune the skip timeout to balance speed, accuracy, and privacy in decentralized learning systems.

Abstract

We consider the straggler problem in decentralized learning over a logical ring while preserving user data privacy. Specifically, we extend the recently proposed framework of differential privacy (DP) amplification by decentralization by Cyffers and Bellet to include overall training latency, comprising both computation and communication latency. Analytical results on both the convergence speed and the DP level are derived for both a skipping scheme (which ignores the stragglers after a timeout) and a baseline scheme that waits for each node to finish before the training continues. A trade-off between overall training latency, accuracy, and privacy, parameterized by the timeout of the skipping scheme, is identified and empirically validated for logistic regression on a real-world dataset and for image classification using the MNIST and CIFAR-10 datasets.
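The difference between the skipping scheme and the wait-for-everyone baseline can be illustrated with a toy latency simulation. This is only a sketch under stated assumptions, not the paper's algorithm: the function name `simulate_ring`, the exponential latency model, and all parameter values are hypothetical, and the actual gradient step and DP noise are left out. A token travels around a fixed ring; a node whose latency exceeds the timeout is skipped and costs exactly the timeout, and an infinite timeout recovers the baseline.

```python
import math
import random


def simulate_ring(n, visits, timeout, mean_latency=1.0, seed=0):
    """Pass a token around a fixed ring of n nodes for `visits` hops.

    Assumption (not from the paper): each visit has an i.i.d. exponential
    latency with the given mean. A node slower than `timeout` is skipped.
    Returns (updates per node, total elapsed latency).
    """
    rng = random.Random(seed)
    updates_per_node = [0] * n
    elapsed = 0.0
    for hop in range(visits):
        node = hop % n  # fixed ring order
        latency = rng.expovariate(1.0 / mean_latency)
        if latency > timeout:
            # Straggler: give up after the timeout, no update is made.
            elapsed += timeout
        else:
            # Node finished in time: it contributes one update.
            elapsed += latency
            updates_per_node[node] += 1
    return updates_per_node, elapsed


if __name__ == "__main__":
    for timeout in (0.5, 2.0, math.inf):
        per_node, total = simulate_ring(n=10, visits=1000, timeout=timeout)
        print(f"timeout={timeout}: updates={sum(per_node)}, latency={total:.1f}")
```

In this toy model a shorter timeout lowers the total latency but also lowers the number of completed updates, which is the kind of trade-off the abstract parameterizes by the timeout.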

Paper Structure (15 sections, 5 theorems, 2 equations, 4 figures, 1 algorithm)

Introduction
Preliminaries
Notation
Definitions and Assumptions
System Model
Network Differential Privacy
Empirical Risk Minimization
Skipping Scheme
Convergence Analysis
Privacy Analysis
Experiments
Computation and Communication Latency
Convergence Versus Privacy and Average Latency
Empirical Results
Logistic Regression

Key Result

Theorem 1

Under the Lipschitz assumption, if the diameter of $\mathcal{W}$ is $d_{\mathcal{W}}$, the expected difference between the minimum value $f(\tau^*;\cdot)$ and that from the skipping-scheme algorithm with an arbitrary learning rate parameter $\zeta > 0$ after $h_{\max}$ steps is bounded as [...] where [...] $\forall\, h > 0$, and $e_0 \triangleq d_{\mathcal{W}} k$, $|\lambda_1| = \frac{1-p}{\sqrt{(1+p^2) - 2p\cos(\frac{2\pi}{n})}}$ and $0 <$ [...]
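The closed form for $|\lambda_1|$ quoted in the theorem statement can be evaluated numerically. The sketch below is only an illustration: the function name `lambda1_magnitude` is hypothetical, and reading $p$ as a probability in $[0, 1)$ and $n$ as the number of nodes on the ring is an assumption, since the excerpt does not define them.

```python
import math


def lambda1_magnitude(p, n):
    """Evaluate |lambda_1| = (1 - p) / sqrt((1 + p^2) - 2 p cos(2 pi / n)).

    Assumption: p lies in [0, 1) and n is the ring size; the excerpt
    gives the formula but not the meaning of the symbols.
    """
    denominator = math.sqrt((1.0 + p * p) - 2.0 * p * math.cos(2.0 * math.pi / n))
    return (1.0 - p) / denominator


if __name__ == "__main__":
    for n in (10, 500):
        for p in (0.1, 0.5, 0.9):
            print(f"n={n}, p={p}: |lambda_1| = {lambda1_magnitude(p, n):.6f}")
```

For $p = 0$ the expression equals 1, and as $n$ grows $\cos(2\pi/n)$ approaches 1, so the denominator approaches $1 - p$ and the value again approaches 1.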

Figures (4)

Figure 1: Illustrating the $j$-th round in which node $v_i$ is a straggler.
Figure 2: Expected error bound (decreasing curves; Theorem 1) and privacy leakage level $\varepsilon_{\mathrm{skip}}$ (increasing curves; the privacy theorems for the fixed and randomized rings) vs average latency (the latency proposition) for $n=10$ (top row) and $n=500$ (bottom row). Solid lines are for a fixed ring (Skip-Ring), while dashed lines are for Skip-Rand-Ring.
Figure 3: Privacy leakage level $\varepsilon_{\mathrm{skip}}$ vs expected error bound for $n=10$ (top row) and $n=500$ (bottom row). Solid lines are for a fixed ring (Skip-Ring), while dashed lines are for Skip-Rand-Ring.

Theorems & Definitions (10)

Definition 1: $k$-Lipschitz continuity
Definition 2: $\beta$-smoothness
Definition 3: NDP (Cyffers and Bellet)
Theorem 1
Remark 1
Theorem 2
Theorem 3
Remark 2
Lemma 1
Lemma 2
