Online Newton Method for Bandit Convex Optimisation

Hidde Fokkema, Dirk van der Hoeven, Tor Lattimore, Jack J. Mayo

TL;DR

This work presents a computationally efficient approach to zeroth-order bandit convex optimisation by embedding constrained problems in an unconstrained online Newton framework. It builds a convex extension of the constrained losses via the Minkowski functional and couples it with a surrogate Gaussian-based learning procedure, complemented by a refined restart mechanism to handle adversarial losses. The method achieves a high-probability regret of $Reg_n\le d^{3.5}\sqrt{n}\,\mathrm{polylog}(n,d,1/\delta)$ in the adversarial setting and $Reg_n\le M d^{2}\sqrt{n}\,\mathrm{polylog}(n,d,1/\delta)$ in the stochastic setting, with computational efficiency under membership and sampling oracles for $K$. It also extends to bandit submodular minimisation via Lovász extensions and discusses practical and theoretical trade-offs between geometry, extension schemes, and restart strategies. Overall, the paper advances a constructive, geometry-aware framework for bandit convex optimisation that integrates convex extensions, surrogate losses, and adaptive restarts to achieve near-optimal regret bounds with polynomial-time implementability.

Abstract

We introduce a computationally efficient algorithm for zeroth-order bandit convex optimisation and prove that in the adversarial setting its regret is at most $d^{3.5} \sqrt{n} \mathrm{polylog}(n, d)$ with high probability where $d$ is the dimension and $n$ is the time horizon. In the stochastic setting the bound improves to $M d^{2} \sqrt{n} \mathrm{polylog}(n, d)$ where $M \in [d^{-1/2}, d^{-1 / 4}]$ is a constant that depends on the geometry of the constraint set and the desired computational properties.
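As a quick check on the stochastic bound, substituting the two endpoints of the stated range for $M$ shows how the dimension dependence varies. This is plain arithmetic on the formula in the abstract, not an additional result from the paper:

$$
M = d^{-1/2}: \quad M d^{2} \sqrt{n} = d^{1.5} \sqrt{n},
\qquad
M = d^{-1/4}: \quad M d^{2} \sqrt{n} = d^{1.75} \sqrt{n}.
$$

Up to polylogarithmic factors, the stochastic bound therefore lies between $d^{1.5}\sqrt{n}$ and $d^{1.75}\sqrt{n}$, compared with $d^{3.5}\sqrt{n}$ in the adversarial setting.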

Paper Structure

This paper contains 46 sections, 37 theorems, 139 equations, 1 algorithm.

Key Result

Theorem 1

There exists an algorithm such that with probability at least $1 - \delta$, $Reg_n \le d^{3.5} \sqrt{n}\,\mathrm{polylog}(n, d, 1/\delta)$. Furthermore, the algorithm is computationally efficient given a membership oracle for $K$.
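The TL;DR mentions an extension built from the Minkowski functional, and Theorem 1 assumes a membership oracle for $K$. The sketch below illustrates one standard way the two can connect: approximating $\pi(x) = \inf\{t > 0 : x \in tK\}$ by bisection using only membership queries. It is an illustration under stated assumptions, not the paper's construction. It assumes $K$ is convex with the origin in its interior, and the names `minkowski_functional`, `is_member`, `iters` and `max_scale` are hypothetical.

```python
from typing import Callable, Sequence


def minkowski_functional(
    x: Sequence[float],
    is_member: Callable[[Sequence[float]], bool],
    iters: int = 60,
    max_scale: float = 1e12,
) -> float:
    """Approximate pi(x) = inf{t > 0 : x in t*K} with membership queries only.

    Assumption (not from the source): K is convex and contains the origin
    in its interior, so x/t is in K exactly when t is at least pi(x),
    up to boundary effects.
    """
    if all(xi == 0 for xi in x):
        return 0.0

    def inside(t: float) -> bool:
        # x lies in t*K if and only if x/t lies in K.
        return is_member([xi / t for xi in x])

    # Grow the upper end of the bracket until x/hi falls inside K.
    lo, hi = 0.0, 1.0
    while not inside(hi):
        lo, hi = hi, 2.0 * hi
        if hi > max_scale:
            raise ValueError("could not bracket pi(x); is 0 in the interior of K?")

    # Bisection: keep inside(hi) true and inside(lo) false (or lo == 0).
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if inside(mid):
            hi = mid
        else:
            lo = mid
    return hi


def in_unit_ball(y: Sequence[float]) -> bool:
    """Membership oracle for the Euclidean unit ball."""
    return sum(v * v for v in y) <= 1.0


def in_unit_box(y: Sequence[float]) -> bool:
    """Membership oracle for the cube [-1, 1]^d."""
    return all(abs(v) <= 1.0 for v in y)
```

For the unit ball the functional coincides with the Euclidean norm, and for the cube with the max norm, which gives an easy way to check the sketch. Each evaluation costs a number of oracle calls that is logarithmic in $\pi(x)$ plus the fixed `iters` bisection steps.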

Theorems & Definitions (52)

  • Theorem 1
  • Theorem 2
  • Lemma 3
  • proof
  • Lemma 4
  • Remark 5
  • proof of Lemma [lem:extend]
  • Lemma 6
  • Definition 7
  • Lemma 8
  • ...and 42 more