One-Point Sampling for Distributed Bandit Convex Optimization with Time-Varying Constraints

Kunpeng Zhang; Lei Xu; Xinlei Yi; Guanghui Wen; Lihua Xie; Tianyou Chai; Tao Yang

TL;DR

The paper tackles distributed bandit convex optimization with time-varying constraints by introducing a one-point sampling based distributed online primal–dual algorithm. It achieves sublinear dynamic network regret and cumulative constraint violations under sublinear benchmark path-lengths and provides explicit static regret bounds for convex and strongly convex losses. The analysis shows that one-point feedback offers a communication- and computation-efficient alternative to two-point schemes, without requiring knowledge of the horizon, while still delivering robust performance on time-varying networks. Numerical experiments on a 100-agent network validate the theoretical results and demonstrate reduced sampling complexity relative to two-point methods, highlighting practical gains for large-scale distributed systems.

Abstract

This paper considers the distributed bandit convex optimization problem with time-varying constraints. In this problem, the global loss function is the average of all the local convex loss functions, which are unknown beforehand. Each agent iteratively makes its own decision subject to time-varying inequality constraints, which can be violated but are fulfilled in the long run. For a uniformly jointly strongly connected time-varying directed graph, a distributed bandit online primal-dual projection algorithm with one-point sampling is proposed. We show that sublinear dynamic network regret and network cumulative constraint violation are achieved if the path-length of the benchmark also increases in a sublinear manner. In addition, an $\mathcal{O}(T^{3/4 + g})$ static network regret bound and an $\mathcal{O}(T^{1 - g/2})$ network cumulative constraint violation bound are established, where $T$ is the total number of iterations and $g \in (0, 1/4)$ is a trade-off parameter. Moreover, a reduced static network regret bound $\mathcal{O}(T^{2/3 + 4g/3})$ is established for strongly convex local loss functions. Finally, a numerical example is presented to validate the theoretical results.
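To make "one-point sampling" concrete, the following minimal sketch shows the classical single-evaluation gradient estimator from the bandit convex optimization literature: the loss is queried once at a randomly perturbed point, and that single value is rescaled along the perturbation direction. This illustrates the general technique only; the exact estimator, step sizes, and primal-dual updates used in the paper are not given in this summary. The names `sample_unit_sphere`, `one_point_gradient`, `quadratic`, and the values chosen for `delta` and the sample count are assumptions made for illustration.

```python
import math
import random


def sample_unit_sphere(dim, rng):
    """Draw a point uniformly from the unit sphere in R^dim."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def one_point_gradient(loss, x, delta, rng):
    """Estimate the gradient of `loss` at x from a single loss evaluation.

    Classical one-point estimator: (dim / delta) * loss(x + delta * u) * u,
    with u uniform on the unit sphere. It is an unbiased estimate of the
    gradient of a smoothed version of `loss`, not of `loss` itself.
    """
    dim = len(x)
    u = sample_unit_sphere(dim, rng)
    query = [xi + delta * ui for xi, ui in zip(x, u)]
    value = loss(query)  # the only feedback the agent observes
    scale = dim / delta * value
    return [scale * ui for ui in u]


def quadratic(z):
    """Hypothetical local loss with true gradient 2 * (z - 1)."""
    return sum((zi - 1.0) ** 2 for zi in z)


# Average many single-sample estimates to show they concentrate
# around the true gradient at x = (0, 0), which is (-2, -2).
rng = random.Random(0)
x = [0.0, 0.0]
num_samples = 50000
avg = [0.0, 0.0]
for _ in range(num_samples):
    g = one_point_gradient(quadratic, x, 0.5, rng)
    avg = [a + gi / num_samples for a, gi in zip(avg, g)]
```

Each individual estimate is very noisy, since its magnitude scales with `1 / delta`. This is the usual price of observing one loss value per round instead of two, and it is consistent with one-point regret bounds being weaker than two-point ones.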
