Distributed Online Bandit Nonconvex Optimization with One-Point Residual Feedback via Dynamic Regret

Youqing Hua, Shuai Liu, Yiguang Hong, Karl Henrik Johansson, Guangchen Wang

TL;DR

The paper tackles distributed online bandit optimization with nonconvex losses over time-varying graphs, introducing a one-point residual feedback gradient estimator to enable gradient-free updates with only $\mathcal{O}(1)$ function evaluations per iteration. The proposed OP-DOPGD algorithm achieves sublinear dynamic regret under Lipschitz and smoothness assumptions, with rigorous results for both nonconvex and convex settings and explicit bounds that account for graph connectivity and problem nonstationarity via $\Theta_T$ and $\omega_T$. Theoretical guarantees are complemented by simulations showing that one-point residual feedback rivals two-point and full-information methods while retaining practical query complexity. This work extends centralized residual-feedback techniques to distributed online environments, offering a scalable, communication-efficient framework for dynamic, nonconvex optimization in networks. Overall, the results demonstrate that careful gradient estimation and consensus-based updates can closely match gradient-based performance in challenging online settings.
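The consensus-based update mentioned above can be illustrated with a generic round of distributed projected gradient descent. This is a minimal sketch under stated assumptions, not the paper's OP-DOPGD pseudocode: the constraint set is taken to be a Euclidean ball, `weights[i][j]` is a hypothetical mixing weight that agent `i` places on agent `j` (each row summing to one), and `grads` stands for whatever gradient estimates the agents hold.

```python
import math


def project_onto_ball(x, radius):
    """Euclidean projection onto the ball of the given radius (assumed constraint set)."""
    norm = math.sqrt(sum(c * c for c in x))
    if norm <= radius:
        return list(x)
    return [radius * c / norm for c in x]


def consensus_projected_step(decisions, weights, grads, step_size, radius):
    """One round: mix neighbors' decisions, step against the gradient estimate, project."""
    n = len(decisions)
    dim = len(decisions[0])
    new_decisions = []
    for i in range(n):
        # Weighted average of all agents' current decisions (consensus step).
        mixed = [
            sum(weights[i][j] * decisions[j][k] for j in range(n))
            for k in range(dim)
        ]
        # Local descent step using agent i's own gradient estimate.
        moved = [m - step_size * g for m, g in zip(mixed, grads[i])]
        # Projection keeps the decision inside the constraint set.
        new_decisions.append(project_onto_ball(moved, radius))
    return new_decisions
```

On a time-varying graph the `weights` matrix would change from round to round; a zero entry means agent `i` receives nothing from agent `j` in that round.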

Abstract

This paper considers the distributed online bandit optimization problem with nonconvex loss functions over a time-varying digraph. This problem can be viewed as a repeated game between a group of online players and an adversary. At each round, each player selects a decision from the constraint set, and then the adversary assigns an arbitrary, possibly nonconvex, loss function to this player. Only the loss value at the current round, rather than the entire loss function or any other information (e.g., the gradient), is privately revealed to the player. Players aim to minimize a sequence of global loss functions, which are the sum of local losses. We observe that traditional multi-point bandit algorithms are unsuitable for online optimization, where the data for the loss function are not all available a priori, while the one-point bandit algorithms suffer from poor regret guarantees. To address these issues, we propose a novel one-point residual feedback distributed online algorithm. This algorithm estimates the gradient using residuals from two points, effectively reducing the regret bound while maintaining $\mathcal{O}(1)$ sampling complexity per iteration. We employ a rigorous metric, dynamic regret, to evaluate the algorithm's performance. By appropriately selecting the step size and smoothing parameters, we demonstrate that the expected dynamic regret of our algorithm is comparable to that of existing algorithms that use two-point feedback, provided that the deviation in the objective function sequence and the path length of the minimization grow sublinearly. Finally, we validate the effectiveness of the proposed algorithm through numerical simulations.
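A minimal sketch of the residual-feedback idea described in the abstract, written for a single player rather than the full distributed algorithm. It assumes the estimator form used in the centralized residual-feedback literature, $g_t = \frac{d}{\delta}\,\bigl(f_t(x_t+\delta u_t) - f_{t-1}(x_{t-1}+\delta u_{t-1})\bigr)\,u_t$ with $u_t$ uniform on the unit sphere. The class name, the zero estimate on the first round, and the parameter names `dim` and `delta` are illustrative assumptions, not the paper's notation.

```python
import math
import random


def unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere by normalizing Gaussians."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


class ResidualFeedbackEstimator:
    """Gradient estimate from one loss query per round plus the cached previous value."""

    def __init__(self, dim, delta, seed=0):
        self.dim = dim            # decision dimension d
        self.delta = delta        # smoothing parameter
        self.rng = random.Random(seed)
        self.prev_value = None    # loss value observed at the previous perturbed point

    def estimate(self, loss, x):
        u = unit_vector(self.dim, self.rng)
        query = [xi + self.delta * ui for xi, ui in zip(x, u)]
        value = loss(query)       # the only function evaluation in this round
        if self.prev_value is None:
            residual = 0.0        # assumed convention: no residual exists in round one
        else:
            residual = value - self.prev_value
        self.prev_value = value
        scale = self.dim / self.delta * residual
        return [scale * ui for ui in u]
```

Each call to `estimate` issues exactly one loss query, which is what keeps the per-round sampling cost at $\mathcal{O}(1)$. The second point of the residual is the value cached from the previous round, so the loss passed in may change between calls, as it does in the online setting.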

Paper Structure

This paper contains 22 sections, 7 theorems, 81 equations, 3 figures, 1 table, and 1 algorithm.

Key Result

Lemma 1

Let Assumption 1 hold. Then, for any $i, j \in \mathcal{V}$ and $0\le k \le s$, [...] where $\Gamma = (1 - {\zeta }/{4n^2})^{-2}$ and $\gamma = (1 - {\zeta }/{4n^2})^{{1}/{U}}$.
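To get a feel for the two constants in Lemma 1, the sketch below evaluates them numerically. It reads ${\zeta}/{4n^2}$ as $\zeta/(4n^2)$ and treats $\zeta$, $n$, and $U$ as given positive inputs, since their definitions are not part of this excerpt. The helper `rounds_until` additionally assumes a geometric bound of the form $\Gamma\gamma^{m}$, which is typical for consensus lemmas of this kind but is not confirmed by the text above; both function names are hypothetical.

```python
def lemma1_constants(zeta, n, U):
    """Return (Gamma, gamma) with Gamma = base**-2 and gamma = base**(1/U)."""
    if zeta <= 0 or n < 1 or U < 1:
        raise ValueError("zeta must be positive and n, U must be at least 1")
    base = 1.0 - zeta / (4.0 * n ** 2)   # assumed parse: zeta / (4 n^2)
    if base <= 0.0:
        raise ValueError("zeta is too large for the given n")
    return base ** -2, base ** (1.0 / U)


def rounds_until(eps, big_gamma, small_gamma):
    """Smallest m with Gamma * gamma**m <= eps, under the assumed geometric bound."""
    if eps <= 0.0 or not 0.0 < small_gamma < 1.0:
        raise ValueError("need eps > 0 and 0 < gamma < 1")
    m = 0
    while big_gamma * small_gamma ** m > eps:
        m += 1
    return m


if __name__ == "__main__":
    Gamma, gamma = lemma1_constants(zeta=0.5, n=4, U=2)
    print(Gamma, gamma, rounds_until(0.01, Gamma, gamma))
```

Because the base $1 - \zeta/(4n^2)$ approaches one as $n$ grows, $\gamma$ moves toward one and the assumed geometric decay slows down for larger networks or longer connectivity windows $U$.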

Figures (3)

  • Figure 1: Time-varying graph configurations
  • Figure 2: The dynamic bounds of applying the proposed residual one-point feedback (11), the two-point oracle of Nesterov and Spokoiny (2017), and the traditional one-point oracle of Flaxman et al. (2005) to the convex DOBO problem.
  • Figure 3: The dynamic bounds of applying the proposed residual one-point feedback (11), the two-point oracle of Nesterov and Spokoiny (2017), and the traditional one-point oracle of Flaxman et al. (2005) to the nonconvex DOBO problem.

Theorems & Definitions (24)

  • Lemma 1: Nedic
  • Lemma 2
  • Definition 1: next
  • Remark 1
  • Remark 2
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • Remark 3
  • ...and 14 more