Distributed Stochastic Optimization under Heavy-Tailed Noises
Chao Sun, Huiming Zhang, Bo Chen, Li Yu
TL;DR
This work studies decentralized stochastic optimization under heavy-tailed gradient noise, that is, noise ξ with a bounded δ-th conditional moment, E[||ξ||^δ | F_k] ≤ ν^δ for some δ ∈ (1,2], in networks without a central server. It combines gradient clipping with distributed stochastic subgradient projection on a strongly connected graph, and proves almost-sure convergence to the optimum for convex and strongly convex objectives under carefully chosen step-size schedules. The theoretical results give convergence conditions and rates that depend on δ and on graph properties, and empirical tests on synthetic and real datasets confirm the method’s superiority over centralized or non-clipped baselines in heavy-tailed noise scenarios. The approach advances robust distributed optimization by removing the bounded-variance assumption and enabling scalable, fully decentralized operation in networked systems.
Abstract
This paper studies the distributed optimization problem under heavy-tailed gradient noise. Here, heavy-tailed means that the noise does not necessarily satisfy the bounded variance assumption; instead, it satisfies a more general assumption that includes the commonly used bounded variance assumption as a special case. A typical example is Pareto-distributed noise with tail index in (1,2], which has infinite variance. Although several distributed optimization algorithms have been proposed for the heavy-tailed noise scenario, they require a centralized server that collects information from all clients. In contrast, this paper considers a setting with no centralized server, in which agents can exchange information only with their neighbors in a communication graph. A distributed method combining gradient clipping and distributed stochastic subgradient projection is proposed. It is proven that when the gradient descent step-size and the gradient clipping step-size satisfy certain conditions, the state of each agent converges to the optimal solution of the distributed optimization problem with probability 1. Simulation results validate the algorithm.
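To make the combination of gradient clipping, neighbor averaging, and projection concrete, the following is a minimal Python sketch under stated assumptions, not the paper's exact algorithm. The quadratic local costs f_i(x) = 0.5||x - b_i||^2, the directed-ring weight matrix, the ball constraint, the step-size 1/k, and the clipping threshold k^0.25 are all hypothetical choices made for illustration; the abstract does not specify the actual conditions on the two step-size sequences.

```python
import numpy as np


def clip(g, tau):
    """Rescale g so that its Euclidean norm is at most tau."""
    norm = np.linalg.norm(g)
    return g if norm <= tau else g * (tau / norm)


def project_ball(x, radius):
    """Euclidean projection onto the ball {x : ||x|| <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def heavy_tailed_noise(rng, dim, tail_index=1.5):
    """Symmetric, zero-mean noise with Pareto-type tails.

    For tail_index in (1, 2] the variance is infinite.
    """
    magnitude = rng.pareto(tail_index, size=dim)  # Lomax (Pareto II) samples
    sign = rng.choice([-1.0, 1.0], size=dim)
    return sign * magnitude


def ring_weights(n):
    """Doubly stochastic weights for a directed ring (strongly connected)."""
    return 0.5 * np.eye(n) + 0.5 * np.roll(np.eye(n), 1, axis=1)


def run(num_agents=5, dim=2, iters=5000, radius=10.0, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical local costs f_i(x) = 0.5 * ||x - b_i||^2; the minimizer
    # of their sum is the mean of the b_i.
    targets = rng.normal(size=(num_agents, dim))
    W = ring_weights(num_agents)
    x = np.zeros((num_agents, dim))
    for k in range(1, iters + 1):
        alpha = 1.0 / k   # assumed diminishing step-size
        tau = k ** 0.25   # assumed slowly growing clipping threshold
        mixed = W @ x     # each agent averages with its in-neighbor
        for i in range(num_agents):
            # Noisy local gradient, evaluated at the averaged state.
            g = (mixed[i] - targets[i]) + heavy_tailed_noise(rng, dim)
            x[i] = project_ball(mixed[i] - alpha * clip(g, tau), radius)
    return x, targets.mean(axis=0)


if __name__ == "__main__":
    states, optimum = run()
    print("max distance to optimum:",
          np.max(np.linalg.norm(states - optimum, axis=1)))
```

Each agent uses only its own noisy gradient and the state of one neighbor, so no central server is involved. Clipping bounds the size of every update even though the noise has infinite variance, and letting the threshold grow slowly is one plausible way to make the bias introduced by clipping fade over time.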
