Adversarial bandit optimization for approximately linear functions
Zhuoyu Cheng, Kohei Hatano, Eiji Takimoto
TL;DR
This work studies bandit optimization for losses that are epsilon-approximately linear, i.e., the sum of a linear term and an adversarial perturbation of magnitude at most epsilon, in a nonconvex, non-smooth setting. It extends SCRiBLe with a lifting technique and a nu-normal barrier to obtain both expected and high-probability regret bounds, and in the special case epsilon = 0 it yields an improved high-probability guarantee for bandit linear optimization. A novel regret decomposition into a Reg-Term, a Deviation-Term, and an Error-Term enables tighter probabilistic bounds and a natural OB/BB transformation to black-box optimization. Experiments on synthetic data show favorable performance compared with prior SCRiBLe variants, consistent with the theoretical improvements.
Abstract
We consider a bandit optimization problem for nonconvex and non-smooth functions, where in each trial the loss function is the sum of a linear function and a small but arbitrary perturbation chosen after observing the player's choice. We give both expected and high-probability regret bounds for the problem. Our result also implies an improved high-probability regret bound for bandit linear optimization, the special case with no perturbation. We also give a lower bound on the expected regret.
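To make the problem setting concrete, the following is a minimal sketch of the loss model and of regret against the best fixed decision. It is not the paper's algorithm. The function names `approx_linear_loss` and `regret`, the finite candidate set used as a stand-in for the decision set, and the choice to enforce the perturbation bound by clipping to [-eps, eps] are all illustrative assumptions. The perturbation is modeled as a function of the point, so an adversary may set it after seeing the player's choice. A simple consequence of the model is that the regret under the perturbed losses differs from the purely linear regret by at most 2 * eps * T.

```python
import numpy as np


def approx_linear_loss(c, x, perturb, eps):
    """Loss f(x) = <c, x> + e(x), where the adversarial term e(x) is
    clipped to [-eps, eps] (clipping is an assumption of this sketch)."""
    e = float(np.clip(perturb(x), -eps, eps))
    return float(c @ x) + e


def regret(played, cs, perturbs, eps, candidates):
    """Cumulative loss of the played points minus the loss of the best
    fixed point among `candidates` (hypothetical finite decision set)."""
    T = len(played)
    player_loss = sum(
        approx_linear_loss(cs[t], played[t], perturbs[t], eps) for t in range(T)
    )
    best_fixed = min(
        sum(approx_linear_loss(cs[t], y, perturbs[t], eps) for t in range(T))
        for y in candidates
    )
    return player_loss - best_fixed


# Example: two rounds in R^2, with an adversary that adds +eps at the
# played point and -eps everywhere else.
c = np.array([1.0, 0.0])
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
no_perturb = lambda x: 0.0
adversary = lambda x: 0.1 if np.allclose(x, e1) else -0.1

linear_reg = regret([e1, e1], [c, c], [no_perturb] * 2, 0.0, [e1, e2])
perturbed_reg = regret([e1, e1], [c, c], [adversary] * 2, 0.1, [e1, e2])
print(linear_reg, perturbed_reg)
```

In this example the adversary attains the 2 * eps * T gap exactly (0.4 with eps = 0.1 and T = 2), which illustrates why the perturbation must be small for a regret bound to be meaningful.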
