Compressed Distributed Stochastic Nonconvex Optimization with Differential Privacy

Antai Xie, Xiaoqiang Ren, Xinlei Yi, Tao Yang, Xiaofan Wang

Abstract

This paper studies distributed stochastic nonconvex optimization problems with compressed communication and differential privacy, in which each agent aims to minimize the sum of all agents' cost functions by using local compressed information exchange. To this end, we propose a compressed distributed stochastic gradient descent algorithm, which is robust under a general class of compression operators that allow both relative and absolute compression errors. We then show that the proposed algorithm finds the first-order stationary point for smooth nonconvex functions with the linear speedup convergence rate $\mathcal{O}(1/\sqrt{nT})$ and converges to the optimum if the global cost function additionally satisfies the Polyak-Łojasiewicz (P-Ł) condition with the convergence rate $\mathcal{O}(1/(nT^{\theta}))$, $\theta\in(0,1)$, where $T$ is the total number of iterations and $n$ is the number of agents. Furthermore, if the P-Ł constant is known in advance, we show that the proposed algorithm achieves a convergence rate $\mathcal{O}(1/(nT))$. Finally, we show that the proposed algorithm is able to achieve $(0,\delta)$-differential privacy without sacrificing convergence accuracy. Numerical experiments are carried out to
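The abstract distinguishes compression operators with relative error (error proportional to the size of the input) from those with absolute error (error bounded by a constant). The sketch below illustrates the two kinds with standard textbook operators, top-k sparsification and rounding to a grid. These operators, the function names, and the test vector are illustrative assumptions and are not taken from the paper, whose exact compressor assumption is not reproduced here.

```python
import numpy as np

def top_k(x, k):
    """Keep the k largest-magnitude entries of x and zero the rest.

    Illustrative operator (an assumption, not from the paper).
    Its error is relative: ||top_k(x) - x||^2 <= (1 - k/d) * ||x||^2.
    """
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]  # indices of the k largest |x_i|
    out[idx] = x[idx]
    return out

def round_to_grid(x, step):
    """Round every entry of x to the nearest multiple of step.

    Illustrative operator (an assumption, not from the paper).
    Its error is absolute: each entry moves by at most step / 2,
    so ||round_to_grid(x) - x||^2 <= d * (step / 2)^2 regardless of ||x||.
    """
    return step * np.round(x / step)

rng = np.random.default_rng(0)
x = rng.normal(size=100)          # hypothetical vector an agent wants to send
d, k, step = x.size, 10, 0.1

rel_err = np.sum((top_k(x, k) - x) ** 2)
abs_err = np.sum((round_to_grid(x, step) - x) ** 2)

rel_bound = (1 - k / d) * np.sum(x ** 2)   # scales with ||x||^2
abs_bound = d * (step / 2) ** 2            # independent of ||x||

print(f"top-k error    {rel_err:.4f} <= {rel_bound:.4f}")
print(f"rounding error {abs_err:.4f} <= {abs_bound:.4f}")
```

Scaling `x` by a constant scales the top-k error by the square of that constant, while the rounding bound stays fixed. That is the practical difference between the two error types.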

Paper Structure

This paper contains 17 sections, 11 theorems, 126 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Suppose Assumptions as:strongconnected through as:finite and as:compressor hold. In Algorithm 1 (RCP-SGD), let $\gamma_k=\beta_1\omega_k$, $\eta_k=\frac{\beta_2}{\omega_k}$, $\omega_k=\omega>\beta_3$, $\alpha_x\in(0,\frac{1}{r})$, and $h_k=h_0^k$, $\forall k\in\mathbb{N}$, where $\beta_1>c_0$, $\beta_2>0$, $h$ … Let $\omega=\beta_2\sqrt{T}/\sqrt{n}$. Then, for any $T>n(\beta_3/\beta_2)^2$, we have
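The parameter choices in Theorem 1 fit together as follows: with $\omega=\beta_2\sqrt{T}/\sqrt{n}$, the requirement $\omega>\beta_3$ is equivalent to $T>n(\beta_3/\beta_2)^2$, and $\eta_k=\beta_2/\omega$ reduces to $\sqrt{n}/\sqrt{T}$. The sketch below computes these quantities. The function name and the numerical values are hypothetical, and the condition $\beta_1>c_0$ is not checked because $c_0$ is not defined in this excerpt.

```python
import math

def theorem1_parameters(n, T, beta1, beta2, beta3):
    """Compute omega, gamma_k, eta_k from the choices stated in Theorem 1.

    Hypothetical helper for illustration. The constraint beta1 > c0 is
    NOT checked here, since c0 is not given in this excerpt.
    """
    if beta2 <= 0:
        raise ValueError("beta2 must be positive")
    if T <= n * (beta3 / beta2) ** 2:
        # Equivalent to omega <= beta3, which the theorem excludes.
        raise ValueError("need T > n * (beta3 / beta2)**2 so that omega > beta3")
    omega = beta2 * math.sqrt(T) / math.sqrt(n)
    gamma = beta1 * omega   # gamma_k = beta1 * omega_k
    eta = beta2 / omega     # eta_k = beta2 / omega_k = sqrt(n) / sqrt(T)
    return omega, gamma, eta

# Hypothetical values chosen only to make the arithmetic easy to follow.
n, T = 4, 10_000
omega, gamma, eta = theorem1_parameters(n, T, beta1=2.0, beta2=1.0, beta3=5.0)
print(omega, gamma, eta)
```

With these values, the threshold is $n(\beta_3/\beta_2)^2 = 100$, so $T = 10{,}000$ satisfies the condition, and $\eta_k$ matches $\sqrt{n/T}$ independently of $\beta_2$.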

Figures (4)

  • Figure 1: The evolution of residual with respect to the transmitted bits under DSGD, Choco-SGD, and RCP-SGD
  • Figure 2: The evolution of residual with respect to the transmitted bits under RCP-SGD-3, RCP-SGD-5, and unRCP-SGD
  • Figure 3: The evolution of residual under RCP-SGD with different graphs
  • Figure 4: The evolution of the estimation error of DLG under DSGD and RCP-SGD-5

Theorems & Definitions (29)

  • Remark 1
  • Remark 2
  • Definition 1
  • Definition 2
  • Remark 3
  • Theorem 1
  • proof
  • Remark 4
  • Theorem 2
  • proof
  • ...and 19 more