A Stochastic Operator Framework for Optimization and Learning with Sub-Weibull Errors
Nicola Bastianello, Liam Madden, Ruggero Carli, Emiliano Dall'Anese
TL;DR
This work introduces a stochastic operator framework to analyze the convergence of stochastic optimization and learning algorithms subject to random coordinate updates and persistent additive errors with sub-Weibull tails. By modeling the iterative steps as time-varying contractive or averaged operators perturbed by sub-Weibull noise, the authors derive bounds in mean and in high probability that capture both transient behavior and asymptotic limits in online settings. The framework unifies the analysis of asynchronous federated learning, online data shifts, and inexact gradient computations, and shows that the high-probability bounds scale with a $\log(1/\delta)$ factor, offering sharper guarantees than Markov-based approaches. Numerical experiments in federated-learning-like scenarios illustrate how asynchrony, heavy-tailed noise, and stochastic gradients influence convergence and generalization, underscoring the practical relevance for distributed and online optimization.
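For readers unfamiliar with the term, the following recalls the moment-based definition of a sub-Weibull random variable that is common in the literature, and why it leads to logarithmic dependence on $\delta$. The symbols $\theta$, $\nu$, and $c(\theta)$ are notation assumed here for illustration and do not appear in this summary; the exact constant and exponent in the paper's bounds may differ.

```latex
% A random variable X is sub-Weibull with tail parameter \theta > 0
% and scale \nu > 0 if its moments grow at most polynomially in k:
\[
  \|X\|_k := \left( \mathbb{E}\left[ |X|^k \right] \right)^{1/k} \le \nu \, k^{\theta}
  \qquad \text{for all } k \ge 1 .
\]
% \theta = 1/2 covers sub-Gaussian and \theta = 1 sub-exponential variables.
% Such a variable satisfies, for some constant c(\theta) > 0 depending only
% on \theta, and for any \delta \in (0,1), with probability at least 1 - \delta:
\[
  |X| \le c(\theta) \, \nu \, \log^{\theta}(2/\delta) .
\]
% For comparison, Markov's inequality only gives, with probability
% at least 1 - \delta:
\[
  |X| \le \frac{\mathbb{E}[|X|]}{\delta} ,
\]
% which grows much faster as \delta \to 0.
```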
Abstract
This paper proposes a framework to study the convergence of stochastic optimization and learning algorithms. The framework is built around the challenges that these algorithms pose, such as (i) the presence of random additive errors (e.g., due to stochastic gradients), and (ii) random coordinate updates (e.g., due to asynchrony in distributed set-ups). The paper covers both convex and strongly convex problems, and it also analyzes online scenarios involving changes in the data and costs. The paper relies on interpreting stochastic algorithms as the iterated application of stochastic operators, thus allowing us to use the powerful tools of operator theory. In particular, we consider operators characterized by random updates and by additive errors with a sub-Weibull distribution (which parameterizes a broad class of errors by their tail probability). In this framework we derive convergence results in mean and in high probability, by providing bounds on the distance of the current iterate from a solution of the optimization or learning problem. The contributions are discussed in light of federated learning applications.
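The stochastic operator viewpoint described above can be illustrated with a minimal sketch. It is not the authors' implementation: the quadratic cost, the step size, the update probability `p`, and the noise construction (a signed power of a Gaussian, which has a sub-Weibull tail with parameter `theta`) are all assumptions chosen for illustration. Each coordinate applies a noisy gradient-descent operator only when a Bernoulli draw selects it, and otherwise keeps its previous value.

```python
import numpy as np


def sub_weibull_noise(rng, theta, scale, size):
    """Illustrative heavy-tailed noise: sign(g) * |g|^(2*theta), g ~ N(0, 1).

    The tail decays like exp(-c * eps^(1/theta)); theta = 0.5 recovers a
    Gaussian, theta = 1.0 gives a sub-exponential-type tail.
    """
    g = rng.standard_normal(size)
    return scale * np.sign(g) * np.abs(g) ** (2.0 * theta)


def run_stochastic_iteration(n=10, steps=500, p=0.5, theta=1.0,
                             scale=0.01, seed=0):
    """Iterate a gradient-descent operator with random coordinate updates
    and additive errors; return the distance to the solution at each step."""
    rng = np.random.default_rng(seed)

    # Assumed cost: f(x) = 0.5 * sum(a_i * x_i^2) - b^T x (strongly convex).
    a = np.linspace(1.0, 10.0, n)
    b = rng.standard_normal(n)
    x_star = b / a                # unique minimizer
    alpha = 1.0 / a.max()         # step size making the operator contractive

    x = x_star + 10.0             # start far from the solution
    errors = np.empty(steps + 1)
    errors[0] = np.linalg.norm(x - x_star)

    for k in range(steps):
        t_x = x - alpha * (a * x - b)                 # operator T(x)
        e = sub_weibull_noise(rng, theta, scale, n)   # additive error
        update = rng.random(n) < p                    # random coordinate mask
        x = np.where(update, t_x + e, x)              # update selected coords
        errors[k + 1] = np.linalg.norm(x - x_star)

    return errors


if __name__ == "__main__":
    errs = run_stochastic_iteration()
    print(f"initial error: {errs[0]:.3f}, final error: {errs[-1]:.3f}")
```

In this sketch the error first decays geometrically and then fluctuates around a noise floor set by `scale` and `theta`, which is the kind of transient-plus-asymptotic behavior that the bounds in the paper are meant to describe.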
