Online Newton Method for Bandit Convex Optimisation

Hidde Fokkema; Dirk van der Hoeven; Tor Lattimore; Jack J. Mayo

TL;DR

This work presents a computationally efficient approach to zeroth-order bandit convex optimisation that embeds constrained problems in an unconstrained online Newton framework. It builds a convex extension of the constrained losses via the Minkowski functional and couples it with a learning procedure built on Gaussian-based surrogate losses, together with a refined restart mechanism to handle adversarial losses. The method achieves high-probability regret $\mathrm{Reg}_n \le d^{3.5}\sqrt{n}\,\mathrm{polylog}(n,d,1/\delta)$ in the adversarial setting and $\mathrm{Reg}_n \le M d^{2}\sqrt{n}\,\mathrm{polylog}(n,d,1/\delta)$ in the stochastic setting, and is computationally efficient given membership and sampling oracles for the constraint set $K$. It also extends to bandit submodular minimisation via Lovász extensions and discusses the trade-offs between geometry, extension schemes, and restart strategies.
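To make the role of the membership oracle concrete, the following is a minimal sketch of how the Minkowski functional $\pi(x) = \inf\{t > 0 : x/t \in K\}$ can be approximated by bisection using membership queries alone. It assumes $K$ is convex with the origin in its interior. The function name `minkowski_functional`, the callable `is_member`, and the tolerance parameters are hypothetical choices for illustration; the sketch does not reproduce the paper's extension, only the gauge it is built from.

```python
import numpy as np

def minkowski_functional(x, is_member, tol=1e-9, max_doublings=200):
    """Approximate pi(x) = inf{t > 0 : x / t in K} with membership queries.

    Assumption: K is convex and contains the origin in its interior, so
    x / t lies in K for every sufficiently large t.
    """
    x = np.asarray(x, dtype=float)
    if not np.any(x):
        return 0.0  # the gauge of the origin is zero

    # Find an upper bound hi with x / hi in K by repeated doubling.
    hi = 1.0
    for _ in range(max_doublings):
        if is_member(x / hi):
            break
        hi *= 2.0
    else:
        raise ValueError("no scaling of x was found inside K")

    # Bisect on t: x / t is in K exactly when t >= pi(x).
    lo = 0.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if is_member(x / mid):
            hi = mid
        else:
            lo = mid
    return hi

# Usage: for the unit Euclidean ball the gauge is the Euclidean norm.
unit_ball = lambda y: np.linalg.norm(y) <= 1.0
print(minkowski_functional([3.0, 4.0], unit_ball))
```

Each bisection step costs one membership query, so the number of queries grows only logarithmically in the requested precision.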

Abstract

We introduce a computationally efficient algorithm for zeroth-order bandit convex optimisation and prove that in the adversarial setting its regret is at most $d^{3.5} \sqrt{n} \mathrm{polylog}(n, d)$ with high probability where $d$ is the dimension and $n$ is the time horizon. In the stochastic setting the bound improves to $M d^{2} \sqrt{n} \mathrm{polylog}(n, d)$ where $M \in [d^{-1/2}, d^{-1 / 4}]$ is a constant that depends on the geometry of the constraint set and the desired computational properties.
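Substituting the two endpoints of the stated range for $M$ makes the stochastic rate concrete. Ignoring the polylogarithmic factors, whose exact form the abstract does not give, the leading term satisfies

$$
d^{1.5}\sqrt{n} \;=\; d^{-1/2}\, d^{2}\sqrt{n} \;\le\; M d^{2}\sqrt{n} \;\le\; d^{-1/4}\, d^{2}\sqrt{n} \;=\; d^{1.75}\sqrt{n}.
$$

Compared with the $d^{3.5}\sqrt{n}$ leading term of the adversarial bound, the stochastic bound is therefore smaller by a factor between $d^{1.75}$ and $d^{2}$, depending on the value of $M$.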

Paper Structure (46 sections, 37 theorems, 139 equations, 1 algorithm)


Introduction
Notation
Related work
Noise
Regularity of constraint set
Distribution theory
Overview of the analysis
Online Newton step
Properties of quadratic surrogate
Challenge of large losses
Adversarial setting
Summary
Convex extensions
Extension and the regret
Surrogate loss
...and 31 more sections

Key Result

Theorem 1

There exists an algorithm such that, with probability at least $1 - \delta$, $\mathrm{Reg}_n \le d^{3.5}\sqrt{n}\,\mathrm{polylog}(n, d, 1/\delta)$. Furthermore, the algorithm is computationally efficient given a membership oracle for $K$.
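For readers unfamiliar with the quantity being bounded, regret in bandit convex optimisation is conventionally defined as below. The symbols $f_t$ (the convex loss in round $t$) and $X_t$ (the point played by the learner in round $t$) are assumed standard notation rather than quoted from the paper:

$$
\mathrm{Reg}_n \;=\; \sup_{x \in K} \sum_{t=1}^{n} \bigl( f_t(X_t) - f_t(x) \bigr).
$$

In the zeroth-order setting the learner observes only the (possibly noisy) value of $f_t(X_t)$ and never a gradient, so the bound compares the learner's cumulative loss with that of the best fixed point in $K$ chosen in hindsight.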

Theorems & Definitions (52)

Theorem 1
Theorem 2
Lemma 3
proof
Lemma 4
Remark 5
proof of Lemma \ref{lem:extend}
Lemma 6
Definition 7
Lemma 8
...and 42 more
