Compressed Distributed Stochastic Nonconvex Optimization with Differential Privacy

Antai Xie; Xiaoqiang Ren; Xinlei Yi; Tao Yang; Xiaofan Wang

Abstract

This paper studies distributed stochastic nonconvex optimization problems with compressed communication and differential privacy, in which each agent aims to minimize the sum of all agents' cost functions using only local compressed information exchange. To this end, we propose a compressed distributed stochastic gradient descent algorithm that is robust under a general class of compression operators allowing both relative and absolute compression errors. We then show that the proposed algorithm finds a first-order stationary point for smooth nonconvex functions with the linear-speedup convergence rate $\mathcal{O}(1/\sqrt{nT})$, and converges to the optimum with the convergence rate $\mathcal{O}(1/(nT^\theta))$, $\theta\in(0,1)$, if the global cost function additionally satisfies the Polyak--Łojasiewicz (P--Ł) condition, where $T$ is the total number of iterations and $n$ is the number of agents. Furthermore, if the P--Ł constant is known in advance, we show that the proposed algorithm achieves a convergence rate $\mathcal{O}(1/(nT))$. Finally, we show that the proposed algorithm is able to achieve $(0,\delta)$-differential privacy without sacrificing convergence accuracy. Numerical experiments are carried out to
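To make the distinction between relative and absolute compression errors concrete, the sketch below contrasts two standard compressors. It is an illustration only: the paper's exact compressor assumption is not reproduced on this page, and the function names (`top_k`, `round_to_grid`) and the test values are hypothetical choices, not taken from the paper.

```python
import random

def top_k(x, k):
    """Keep the k largest-magnitude entries of x and zero the rest."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]

def round_to_grid(x, step):
    """Round each entry of x to the nearest multiple of step."""
    return [step * round(v / step) for v in x]

def sq_norm(x):
    return sum(v * v for v in x)

def sq_error(x, cx):
    return sum((a - b) ** 2 for a, b in zip(x, cx))

# Hypothetical test vector and compressor settings (assumptions for illustration).
rng = random.Random(0)
d, k, step = 20, 5, 0.1
x = [rng.gauss(0.0, 1.0) for _ in range(d)]

rel_err = sq_error(x, top_k(x, k))
abs_err = sq_error(x, round_to_grid(x, step))

# Relative error: for top-k, ||C(x) - x||^2 <= (1 - k/d) * ||x||^2,
# so the bound scales with the size of the input.
rel_ok = rel_err <= (1 - k / d) * sq_norm(x)

# Absolute error: for grid rounding, each entry moves by at most step/2,
# so ||C(x) - x||^2 <= d * step^2 / 4 regardless of how large x is.
abs_ok = abs_err <= d * step ** 2 / 4 + 1e-12
```

A compressor class that admits both error types covers sparsifiers such as top-k as well as fixed-resolution quantizers such as grid rounding.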

Paper Structure (17 sections, 11 theorems, 126 equations, 4 figures, 1 table, 1 algorithm)

Introduction
Preliminaries and Problem Formulation
Distributed Stochastic Optimization
Differential Privacy and Compression Method
Compressed Primal--Dual SGD Algorithm
Algorithm Description
Convergence Analysis of RCP-SGD
Proof Sketch
Adapted compressors enable differential privacy
Simulation
Conclusion
Supporting Lemmas
Auxiliary results
The proof of Theorem \ref{theo:convergence1}
The proof of Theorem \ref{theo:convergence2}
...and 2 more sections

Key Result

Theorem 1

Suppose Assumptions as:strongconnected--as:finite and as:compressor hold. In Algorithm Al:RCP-SGD, let $\gamma_k=\beta_1\omega_k$, $\eta_k=\frac{\beta_2}{\omega_k}$, $\omega_k=\omega>\beta_3$, $\alpha_x\in(0,\frac{1}{r})$, and $h_k=h_0^k$, $\forall k\in\mathbb{N}$, where $\beta_1>c_0$, $\beta_2>0$, $h$ … Let $\omega=\beta_2\sqrt{T}/\sqrt{n}$; then, for any $T>n(\beta_3/\beta_2)^2$, we have
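The parameter choices in the theorem can be traced numerically. The sketch below computes $\omega$, $\gamma_k$, and $\eta_k$ from $n$, $T$, and the constants $\beta_1,\beta_2,\beta_3$, and checks the condition $T>n(\beta_3/\beta_2)^2$, which for $\beta_2>0$ is the same as requiring $\omega>\beta_3$. The function name `rcp_sgd_parameters` and all numeric values are hypothetical; the constants $c_0$ and $\beta_3$ are defined in the paper and not on this page, so the caller is assumed to supply admissible values.

```python
import math

def rcp_sgd_parameters(n, T, beta1, beta2, beta3):
    """Constant parameters implied by omega = beta2 * sqrt(T) / sqrt(n).

    Assumes beta1 > c_0 (c_0 is not given here, so it is not checked)
    and beta2 > 0, beta3 > 0.
    """
    if beta2 <= 0:
        raise ValueError("beta2 must be positive")
    if not T > n * (beta3 / beta2) ** 2:
        # Equivalent to omega <= beta3 when beta2 > 0.
        raise ValueError("T must exceed n * (beta3 / beta2)**2")
    omega = beta2 * math.sqrt(T) / math.sqrt(n)
    return {
        "omega": omega,
        "gamma": beta1 * omega,   # gamma_k = beta1 * omega_k
        "eta": beta2 / omega,     # eta_k = beta2 / omega_k = sqrt(n) / sqrt(T)
    }

# Hypothetical values chosen only to exercise the formulas.
params = rcp_sgd_parameters(n=4, T=10_000, beta1=2.0, beta2=1.0, beta3=10.0)
```

With this choice the step size simplifies to $\eta_k=\sqrt{n}/\sqrt{T}$, independent of $\beta_2$.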

Figures (4)

Figure 1: The evolution of residual with respect to the transmitted bits under DSGD, Choco-SGD, and RCP-SGD
Figure 2: The evolution of residual with respect to the transmitted bits under RCP-SGD-3, RCP-SGD-5, and unRCP-SGD
Figure 3: The evolution of residual under RCP-SGD with different graphs
Figure 4: The evolution of estimate error of DLG under DSGD and RCP-SGD-5

Theorems & Definitions (29)

Remark 1
Remark 2
Definition 1
Definition 2
Remark 3
Theorem 1
proof
Remark 4
Theorem 2
proof
...and 19 more
