Distributed Stochastic Optimization under Heavy-Tailed Noises

Chao Sun, Huiming Zhang, Bo Chen, Li Yu

TL;DR

This work studies decentralized stochastic optimization under heavy-tailed gradient noise (δ ∈ (1,2], with E[||ξ||^δ|F_k] ≤ ν^δ), without a central server. It combines gradient clipping with distributed stochastic subgradient projection on a strongly connected graph, and proves almost-sure convergence to an optimal solution for convex and strongly convex objectives under suitably chosen step-size and clipping schedules. The theoretical results give convergence conditions and rates that depend on δ and on graph properties, and empirical tests on synthetic and real datasets compare the method against centralized and non-clipped baselines in heavy-tailed noise scenarios. The approach removes the bounded-variance assumption and allows fully decentralized operation in networked systems.
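
The moment condition above can be related to the usual bounded-variance assumption with a short calculation. This is a sketch under the reading that $\xi$ is the gradient noise and $\mathcal{F}_k$ is the history up to iteration $k$; it uses only the conditional Jensen inequality, applied to the concave map $t\mapsto t^{\delta/2}$ for $\delta\in(1,2]$. If $\mathbb{E}\left[\|\xi\|^{2}\mid\mathcal{F}_k\right]\le\nu^{2}$, then

$$
\mathbb{E}\left[\|\xi\|^{\delta}\mid\mathcal{F}_k\right]
=\mathbb{E}\left[\left(\|\xi\|^{2}\right)^{\delta/2}\mid\mathcal{F}_k\right]
\le\left(\mathbb{E}\left[\|\xi\|^{2}\mid\mathcal{F}_k\right]\right)^{\delta/2}
\le\nu^{\delta}.
$$

So a bounded second moment implies the $\delta$-th moment bound for every $\delta\in(1,2]$, with $\delta=2$ recovering the second-moment bound itself. The converse fails: noise with infinite variance can still have a finite $\delta$-th moment for some $\delta<2$.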

Abstract

This paper studies the distributed optimization problem under heavy-tailed gradient noise. Here, heavy-tailed noise means noise that does not necessarily satisfy the bounded-variance assumption; instead, it satisfies a more general assumption that includes bounded variance as a special case. A typical example is Pareto-distributed noise with tail index in (1,2], which has infinite variance. Although several distributed optimization algorithms have been proposed for the heavy-tailed noise scenario, they require a centralized server that collects information from all clients. In contrast, this paper considers the setting with no centralized server, where agents can exchange information only with their neighbors in a communication graph. A distributed method combining gradient clipping and distributed stochastic subgradient projection is proposed. It is proven that when the gradient descent step-size and the gradient clipping step-size meet certain conditions, the state of each agent converges to the optimal solution of the distributed optimization problem with probability 1. Simulation results validate the algorithm.
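
To make the combination of gradient clipping and distributed stochastic subgradient projection concrete, here is a minimal Python sketch. It is not the paper's algorithm or experimental setup: the update form (mix neighbor states, subtract a clipped noisy gradient, project), the quadratic local objectives, the ring weight matrix `W`, the ball constraint, the schedules `alpha` and `tau`, and all function names are assumptions chosen for illustration. The noise is a symmetrized Pareto-type (Lomax) sample with tail index 1.5, so its variance is infinite.

```python
import numpy as np


def clip(g, tau):
    """Norm clipping: rescale g so that its Euclidean norm is at most tau."""
    norm = np.linalg.norm(g)
    return g if norm <= tau else g * (tau / norm)


def project_ball(x, radius):
    """Euclidean projection onto the ball {x : ||x|| <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def pareto_noise(rng, shape, tail_index):
    """Symmetric noise with Pareto-type tails (infinite variance if tail_index <= 2)."""
    magnitude = rng.pareto(tail_index, size=shape)  # Lomax samples, nonnegative
    sign = rng.choice([-1.0, 1.0], size=shape)      # symmetrize around zero
    return sign * magnitude


def run(num_steps=2000, seed=0, radius=10.0):
    rng = np.random.default_rng(seed)
    n, d = 4, 3
    # Doubly stochastic weights for a ring of 4 agents (assumed topology).
    W = np.array([[0.50, 0.25, 0.00, 0.25],
                  [0.25, 0.50, 0.25, 0.00],
                  [0.00, 0.25, 0.50, 0.25],
                  [0.25, 0.00, 0.25, 0.50]])
    # Assumed local objectives: f_i(theta) = 0.5 * ||theta - b_i||^2.
    targets = rng.normal(size=(n, d))
    theta_star = targets.mean(axis=0)  # minimizer of sum_i f_i
    x = np.zeros((n, d))               # row i is the state of agent i
    for k in range(num_steps):
        alpha = 1.0 / (k + 1)          # decreasing step-size (assumed schedule)
        tau = (k + 1) ** 0.25          # increasing clipping threshold (assumed schedule)
        mixed = W @ x                  # each agent averages its neighbors' states
        grads = (x - targets) + pareto_noise(rng, (n, d), tail_index=1.5)
        x = np.stack([
            project_ball(mixed[i] - alpha * clip(grads[i], tau), radius)
            for i in range(n)
        ])
    return x, theta_star


if __name__ == "__main__":
    states, theta_star = run()
    print("distance of each agent to the minimizer:",
          np.linalg.norm(states - theta_star, axis=1))
```

Each agent touches only its own row of `x` and the rows of its ring neighbors (the nonzero entries of `W`), so no central server is involved. The schedules here are placeholders; the conditions on the step-size and clipping sequences under which convergence is proven are those stated in the paper's theorems.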

Paper Structure

This paper contains 16 sections, 6 theorems, 34 equations, 2 figures.

Key Result

Theorem 1

Suppose that Assumptions doubly-unbiased hold, and that the decreasing sequence $\alpha_{k}$ and the increasing sequence $\tau_{k}$ satisfy [...]. Suppose further that there exist constants $\omega>0$ and $\varpi\in\mathbb{R}$ such that [...], where $\mu=0$ if $f_i(\theta)$ is convex for all $i\in\mathcal{V}_N$, and $\mu>0$ if $f_i(\theta)$ is $\mu$-strongly convex for all $i\in\mathcal{V}_N$. Then, under the algorithm [...]

Figures (2)

  • Figure 1: The evolution of $\log_{10}\frac{f(y_k)-f(\theta^*)}{f(y_0)-f(\theta^*)}$.
  • Figure 2: The evolution of $\log_{10}\frac{f(y_k)-f(\theta^*)}{f(y_0)-f(\theta^*)}$.

Theorems & Definitions (12)

  • Remark 1
  • Remark 2
  • Theorem 1
  • Proposition 1
  • Proposition 2
  • Theorem 2
  • Proposition 3
  • Theorem 3
  • Remark 3
  • Remark 4
  • ...and 2 more