High-Probability Analysis of Online and Federated Zero-Order Optimisation

Arya Akhavan, David Janz, El-Mahdi El-Mhamdi

TL;DR

This paper addresses gradient-free federated optimisation by introducing FedZero, which uses ℓ1-sphere randomisation and two-point function evaluations to build compact gradient estimators. The authors prove high-probability regret bounds for federated convex zero-order optimisation and, in the single-worker regime, for classical convex zero-order methods, leveraging novel concentration inequalities on the ℓ1-sphere and time-uniform sub-Gamma bounds. The approach hinges on smoothing the objective with a controllable bias, then bounding the second-moment and deviation terms in a high-probability regime via advanced probabilistic tools. The results demonstrate both theoretical guarantees and practical advantages in communication efficiency, with ℓ1-based randomisation offering favourable tail behaviour and potential privacy benefits, and set the stage for extensions to non-convex or heterogeneous environments.

Abstract

We study distributed learning in the context of gradient-free zero-order optimisation and introduce FedZero, a federated zero-order algorithm with sharp theoretical guarantees. Our contributions are threefold. First, in the federated convex setting, we derive high-probability guarantees for regret minimisation achieved by FedZero. Second, in the single-worker regime, corresponding to the classical zero-order framework with two-point feedback, we establish the first high-probability convergence guarantees for convex zero-order optimisation, strengthening previous results that held only in expectation. Third, to establish these guarantees, we develop novel concentration tools: (i) concentration inequalities with explicit constants for Lipschitz functions under the uniform measure on the $\ell_1$-sphere, and (ii) a time-uniform concentration inequality for squared sub-Gamma random variables. These probabilistic results underpin our high-probability guarantees and may also be of independent interest.
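To make the two-point, $\ell_1$-sphere idea concrete, here is a minimal single-worker sketch in Python. It assumes the estimator form commonly used in the $\ell_1$-randomisation literature, $g = \frac{d}{2h}\,(f(x + h\zeta) - f(x - h\zeta))\,\mathrm{sign}(\zeta)$ with $\zeta$ uniform on the unit $\ell_1$-sphere; this page does not state FedZero's exact estimator, so that form, the perturbation size `h`, and the function names `sample_l1_sphere` and `l1_two_point_gradient` are illustrative assumptions rather than the authors' implementation.

```python
import random


def sample_l1_sphere(d, rng):
    """Draw a point uniformly from the unit l1-sphere in R^d.

    i.i.d. Laplace variables divided by their l1-norm are uniform on
    the l1-sphere; a Laplace draw is the difference of two independent
    Exp(1) draws.
    """
    z = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(d)]
    norm = sum(abs(v) for v in z)
    return [v / norm for v in z]


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def l1_two_point_gradient(f, x, h, rng):
    """Two-point gradient estimate with l1-sphere randomisation.

    Assumed form (not taken from this page):
        g = d / (2h) * (f(x + h*zeta) - f(x - h*zeta)) * sign(zeta)
    """
    d = len(x)
    zeta = sample_l1_sphere(d, rng)
    x_plus = [xi + h * zi for xi, zi in zip(x, zeta)]
    x_minus = [xi - h * zi for xi, zi in zip(x, zeta)]
    # Only two function evaluations are used per estimate.
    scale = d / (2.0 * h) * (f(x_plus) - f(x_minus))
    return [scale * sign(zi) for zi in zeta]


if __name__ == "__main__":
    rng = random.Random(0)
    a = [1.0, -2.0, 0.5, 3.0, -1.0]
    f = lambda x: sum(ai * xi for ai, xi in zip(a, x))  # gradient is a
    print(l1_two_point_gradient(f, [0.0] * 5, 0.1, rng))
```

Under this assumed form, the estimate is unbiased for a linear objective, so averaging many independent draws should approach the true gradient `a`. For a general convex objective the estimate instead targets the gradient of a smoothed surrogate, which is the source of the controllable bias mentioned in the TL;DR.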

Paper Structure

This paper contains 26 sections, 24 theorems, 170 equations, 1 algorithm.

Key Result

theorem 4.1

Fix $\mathbf{x}\in\Theta$. Let $\{\mathbf{x}_{t}\}_{t=1}^{n}$ be the outputs of FedZero (Algorithm alg:grad_est). Assume that Assumptions ass:convexity, ass:lipschitz, and ass:bounded-domain hold. Then for any $\delta > 0$, with probability at least $1 - \delta$, we have that [...] where $L_1 = \log(1 + n/\delta)$, $L_2 = \log(1 + nm)$, and $C > 0$ is a universal constant independent of $n$, $d$, [...]

Theorems & Definitions (50)

  • example 3.1: Negative entropy
  • example 3.2: $\ell_p$-regularizer
  • theorem 4.1
  • proof: Proof sketch
  • remark 4.1
  • corollary 4.2
  • corollary 4.3
  • remark 4.2
  • remark 4.3: Comparison with akhavan2022gradient
  • remark 4.4: Relation to convex bandit literature
  • ...and 40 more