No-Regret Linear Bandits under Gap-Adjusted Misspecification

Chong Liu; Dan Qiao; Ming Yin; Ilija Bogunovic; Yu-Xiang Wang

TL;DR

This work introduces $\rho$-gap-adjusted misspecification ($\rho$-GAM) for linear bandits, where the approximation error at a point is proportional to its suboptimality gap, so that optimization remains possible without realizability. It shows that LinUCB remains a no-regret algorithm under moderate $\rho$-GAM (roughly $\rho = O(1/(d\sqrt{\log T}))$) and achieves near-optimal $\tilde{O}(\sqrt{T})$ regret, whereas uniform misspecification forces regret linear in $T$. It further develops a phased elimination-based algorithm that tolerates a $\rho$ that does not shrink with $T$ and attains $\tilde{O}(\sqrt{T})$ regret with only $O(\log T)$ batches, plus an adaptive $\tilde{O}(\log T/\Delta)$ rate when a constant suboptimality gap $\Delta$ exists. A weak GAM variant, handled by LinUCBw, preserves the no-regret guarantee under a looser assumption. A unified $(\rho, \epsilon)$-GAM framework connects the gap-adjusted and uniform misspecification regimes, with potential extensions to kernels and reinforcement learning.
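The two misspecification notions contrasted above can be written side by side. This is a sketch reconstructed from the verbal descriptions on this page, not a verbatim copy of the paper's definitions; the symbols $\mathcal{X}$ (the action set) and $f_0^*$ (the maximum of the true function $f_0$) are notation assumed here for illustration.

```latex
% Assumed notation: f_0 is the true reward function, f is the linear model,
% \mathcal{X} is the action set, and f_0^* = \max_{x \in \mathcal{X}} f_0(x).

% epsilon-uniform misspecification: one error budget for the whole domain.
\sup_{x \in \mathcal{X}} \, \lvert f(x) - f_0(x) \rvert \;\le\; \epsilon

% rho-gap-adjusted misspecification: the budget at x scales with the
% suboptimality gap at x.
\lvert f(x) - f_0(x) \rvert \;\le\; \rho \, \bigl( f_0^* - f_0(x) \bigr)
\qquad \text{for all } x \in \mathcal{X}
```

Under the second condition the allowed error vanishes at any maximizer of $f_0$, which matches the figure captions' remark that there is no misspecification at $x_*$.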

Abstract

This work studies linear bandits under a new notion of gap-adjusted misspecification and is an extension of Liu et al. (2023). When the underlying reward function is not linear, existing work on linear bandits usually relies on a uniform misspecification parameter $\epsilon$ that measures the sup-norm error of the best linear approximation. This results in an unavoidable linear regret whenever $\epsilon > 0$. We propose a more natural model of misspecification which only requires the approximation error at each input $x$ to be proportional to the suboptimality gap at $x$. It captures the intuition that, for optimization problems, near-optimal regions should matter more and we can tolerate larger approximation errors in suboptimal regions. Quite surprisingly, we show that the classical LinUCB algorithm -- designed for the realizable case -- is automatically robust against such $\rho$-gap-adjusted misspecification with parameter $\rho$ diminishing at $O(1/(d \sqrt{\log T}))$. It achieves a near-optimal $O(\sqrt{T})$ regret on problems for which the best-known regret is almost linear in the time horizon $T$. We further advance this frontier by presenting a novel phased elimination-based algorithm whose gap-adjusted misspecification parameter $\rho = O(1/\sqrt{d})$ does not scale with $T$. This algorithm attains optimal $O(\sqrt{T})$ regret and is deployment-efficient, requiring only $\log T$ batches of exploration. It also enjoys an adaptive $O(\log T)$ regret when a constant suboptimality gap exists. Technically, our proof relies on a novel self-bounding argument that bounds the part of the regret due to misspecification by the regret itself, and a new inductive lemma that limits the misspecification error within the suboptimality gap for all valid actions in each batch selected by G-optimal design.
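For readers unfamiliar with the algorithm named in the abstract, a minimal textbook sketch of LinUCB over a finite action set might look like the following. It is not taken from the paper: the function name `linucb`, the fixed confidence width `beta`, the ridge parameter `lam`, and the Gaussian noise model are all assumptions made for illustration, and the paper's analysis uses its own choice of confidence width.

```python
import math
import random


def linucb(actions, reward_fn, T, lam=1.0, beta=1.0, noise_sd=0.1, seed=0):
    """Textbook LinUCB over a finite action set, in pure Python.

    actions   : list of d-dimensional feature vectors (lists of floats)
    reward_fn : true mean reward f_0; it may be non-linear in the features
    beta      : hand-picked confidence width (an assumption, not the paper's)
    Returns the list of chosen action indices, one per round.
    """
    rng = random.Random(seed)
    d = len(actions[0])
    # V_inv tracks (lam * I + sum_s x_s x_s^T)^{-1}, updated by Sherman-Morrison.
    V_inv = [[(1.0 / lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    b = [0.0] * d  # running sum of y_s * x_s
    chosen = []
    for _ in range(T):
        # Ridge-regression estimate: theta = V^{-1} b.
        theta = [sum(V_inv[i][j] * b[j] for j in range(d)) for i in range(d)]
        best_idx, best_ucb = 0, -math.inf
        for k, x in enumerate(actions):
            Vx = [sum(V_inv[i][j] * x[j] for j in range(d)) for i in range(d)]
            # Exploration bonus: beta * sqrt(x^T V^{-1} x).
            width = math.sqrt(max(sum(x[i] * Vx[i] for i in range(d)), 0.0))
            ucb = sum(x[i] * theta[i] for i in range(d)) + beta * width
            if ucb > best_ucb:
                best_idx, best_ucb = k, ucb
        x = actions[best_idx]
        y = reward_fn(x) + rng.gauss(0.0, noise_sd)  # noisy observed reward
        # Sherman-Morrison rank-one update of V^{-1} after adding x x^T to V.
        Vx = [sum(V_inv[i][j] * x[j] for j in range(d)) for i in range(d)]
        denom = 1.0 + sum(x[i] * Vx[i] for i in range(d))
        for i in range(d):
            for j in range(d):
                V_inv[i][j] -= Vx[i] * Vx[j] / denom
        for i in range(d):
            b[i] += y * x[i]
        chosen.append(best_idx)
    return chosen
```

Passing a `reward_fn` that is not linear in the features simulates the misspecified setting: the algorithm still fits a linear model, and the abstract's claim is that this remains no-regret when the mismatch is gap-adjusted and $\rho$ is small enough.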

Paper Structure (19 sections, 22 theorems, 102 equations, 3 figures, 3 algorithms)

Introduction
Related Work
Preliminaries
Notations
Problem Setup
Assumptions
Algorithms
Results of LinUCB Algorithm
Regret Analysis
Confidence Analysis
Results of Phased Elimination Algorithm
Main Regret Analysis
Gap-Dependent Analysis
Conclusion
Technical Lemmas
...and 4 more sections

Key Result

Proposition 4

Let $f$ be a $\rho$-GAM approximation of $f_0$ (Definition \ref{def:lm}). Then it holds:

Figures (3)

Figure 1: Examples of misspecification in 1 dimension. The blue line denotes the non-linear true function $f_0$ and the red line shows a feasible linear function that is able to optimize $f_0$ by taking $x_*=2$. (a) An example of $\epsilon$-uniform misspecification (Definition \ref{def:unif_mis}) where $\epsilon=0.7$. The gray region shows the uniformly misspecified function class. Note that its vertical range is always $2\epsilon=1.4$ over the whole domain. (b) An example of $\rho$-gap-adjusted misspecification (Definition \ref{def:gap_adj_mis}) where $\rho=0.7$. The orange region shows the gap-adjusted misspecified function class. Note that the vertical range at a point $x$ depends on its suboptimality gap. For example, the vertical range at $x=0$ is much larger than that at $x=1$, and there is no vertical range at $x_*=2$.
Figure 2: Examples of misspecification in 2 dimensions. The blue surface denotes the Branin function $f_0$. The optimal point $x_*$ is $(x_1=-5,x_2=0)$. (a) An example of $\epsilon$-uniform misspecification (Definition \ref{def:unif_mis}) where $\epsilon=100$. The two gray surfaces denote the upper and lower bounds of the misspecified function class. (b) An example of $\rho$-gap-adjusted misspecification (Definition \ref{def:gap_adj_mis}) where $\rho=0.7$. The two orange surfaces denote the upper and lower bounds of the misspecified function class. Note that there is no misspecification at $x_*$.
Figure 3: (a) An example of $\rho$-gap-adjusted misspecification (Definition \ref{def:lm}) in 1 dimension where $\rho=0.7$. The blue line shows a non-linear true function and the gray region shows the gap-adjusted misspecified function class. Note that the vertical range of the gray region at a point $x$ depends on its suboptimality gap. For example, at $x=1$ the suboptimality gap is $2$ and the vertical range is $4\rho=2.8$. The red line shows a feasible linear function that is able to optimize the true function by taking $x_*=2$. (b) An example of weak $\rho$-gap-adjusted misspecification (Definition \ref{def:lm_weak}) in 1 dimension where $\rho=0.7$. The difference from Figure \ref{fig:example} is that one can shift the qualifying approximation arbitrarily up or down, and the specified model only has to $\rho$-GAM approximate $f_0$ up to an additive constant.
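The band widths quoted in the captions ($2\epsilon = 1.4$ everywhere, and $4\rho = 2.8$ where the gap is $2$) can be reproduced with a few lines of arithmetic. The sketch below assumes the conditions $|f(x) - f_0(x)| \le \epsilon$ and $|f(x) - f_0(x)| \le \rho\,(f_0^* - f_0(x))$. The helper names and the toy pair $f_0(x) = x^2/2$, $f(x) = x$ on $[0, 2]$ are hypothetical and are not the functions plotted in the figures.

```python
import math


def uniform_band(f0_x, eps):
    """Allowed interval for f(x) under eps-uniform misspecification."""
    return f0_x - eps, f0_x + eps


def gam_band(f0_x, f0_max, rho):
    """Allowed interval for f(x) under rho-GAM: half-width is rho * gap."""
    gap = f0_max - f0_x
    return f0_x - rho * gap, f0_x + rho * gap


def smallest_eps(f, f0, xs):
    """Smallest eps such that f is eps-uniformly close to f0 on the grid xs."""
    return max(abs(f(x) - f0(x)) for x in xs)


def smallest_rho(f, f0, xs, tol=1e-12):
    """Smallest rho such that f is a rho-GAM approximation of f0 on the grid xs.

    Returns math.inf if f differs from f0 at a grid point with zero gap,
    since no finite rho can satisfy the condition there.
    """
    f0_max = max(f0(x) for x in xs)
    rho = 0.0
    for x in xs:
        err = abs(f(x) - f0(x))
        gap = f0_max - f0(x)
        if gap <= tol:
            if err > tol:
                return math.inf
            continue  # zero gap and zero error: no constraint on rho
        rho = max(rho, err / gap)
    return rho


# Hypothetical toy example (not the function in the figures).
xs = [2 * i / 200 for i in range(201)]  # grid on [0, 2]
f0 = lambda x: x * x / 2                # non-linear "true" function, max at x = 2
f = lambda x: x                         # linear model that agrees with f0 at x = 2
eps_needed = smallest_eps(f, f0, xs)
rho_needed = smallest_rho(f, f0, xs)
```

In this toy case the error ratio simplifies to $x/(2+x)$, so `rho_needed` stays below $0.5$ while the linear model still shares its maximizer $x_* = 2$ with $f_0$; the uniform view reports only the worst-case error `eps_needed`, which is attained at the suboptimal point $x = 1$.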

Theorems & Definitions (34)

Definition 1: $\epsilon$-Uniform Misspecification
Definition 2: $\rho$-Gap-Adjusted Misspecification ($\rho$-GAM)
Definition 3: $\rho$-Gap-Adjusted Misspecification
Proposition 4
Theorem 9
Remark 10
Lemma 11: Bound of Deviation
Lemma 12: Instantaneous Regret Bound
Lemma 13
Lemma 14
...and 24 more
