Noise-Adaptive Confidence Sets for Linear Bandits and Application to Bayesian Optimization

Kwang-Sung Jun; Jungtaek Kim

TL;DR

This work addresses learning under an unknown noise level in linear bandits with two noise-adaptive methods. The semi-adaptive LOSAN builds a weighted online ridge estimator whose confidence set has width scaling as $\tilde{O}(\sqrt{d\sigma_*^2 + \sigma_0^2})$, which improves the regret bound when the a priori noise bound $\sigma_0^2$ is loose. The fully adaptive LOFAV, designed for bounded noise, combines multiple base estimators into an intersection of ellipsoids and attains variance-adaptive regret close to optimal while remaining practical to compute. Both methods are evaluated on synthetic tasks and Bayesian optimization (BO) benchmarks, where they perform better than or comparably to OFUL and conventional Bayesian optimization. Both confidence sets are derived from the regret equality of online learning.
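To make the intersection-of-ellipsoids idea concrete, here is a minimal sketch of an upper confidence bound for one arm. It uses two standard facts: the maximum of $\langle x, \theta \rangle$ over an ellipsoid $\{\theta : (\theta - \hat\theta)^\top V (\theta - \hat\theta) \le \beta\}$ is $\langle x, \hat\theta \rangle + \sqrt{\beta\, x^\top V^{-1} x}$, and the maximum over an intersection is at most the smallest of the per-ellipsoid maxima. This is a simple relaxation for illustration, not necessarily how LOFAV computes its optimistic estimate. The function names and the numbers in the example are hypothetical.

```python
import math

def quad_form(x, M):
    """Return x^T M x for a vector x and a square matrix M (nested lists)."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

def ellipsoid_ucb(x, center, V_inv, beta):
    """Max of <x, theta> over {theta : (theta - center)^T V (theta - center) <= beta}.

    V_inv is the inverse of V, passed in directly to avoid a matrix inversion here.
    """
    mean = sum(xi * ci for xi, ci in zip(x, center))
    return mean + math.sqrt(beta * quad_form(x, V_inv))

def intersection_ucb_bound(x, ellipsoids):
    """Upper bound on the max of <x, theta> over the intersection of ellipsoids.

    Assumption: we relax the joint problem to a min over per-ellipsoid maxima,
    which is valid but can be looser than the exact maximum over the intersection.
    """
    return min(ellipsoid_ucb(x, c, V_inv, beta) for c, V_inv, beta in ellipsoids)

# Hypothetical example with d = 2 and two ellipsoids sharing a center.
x = [1.0, 0.0]
ellipsoids = [
    ([1.0, 0.0], [[0.01, 0.0], [0.0, 0.01]], 1.0),   # wide along the arm direction
    ([1.0, 0.0], [[0.0025, 0.0], [0.0, 1.0]], 1.0),  # narrow along the arm direction
]
print(intersection_ucb_bound(x, ellipsoids))
```

Because the bound takes a minimum, adding another ellipsoid can only tighten it, which is the basic reason an intersection of several confidence sets can be much smaller than any single one.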

Abstract

Adapting to a priori unknown noise level is a very important but challenging problem in sequential decision-making as efficient exploration typically requires knowledge of the noise level, which is often loosely specified. We report significant progress in addressing this issue for linear bandits in two respects. First, we propose a novel confidence set that is "semi-adaptive" to the unknown sub-Gaussian parameter $\sigma_*^2$ in the sense that the (normalized) confidence width scales with $\sqrt{d\sigma_*^2 + \sigma_0^2}$ where $d$ is the dimension and $\sigma_0^2$ is the specified sub-Gaussian parameter (known) that can be much larger than $\sigma_*^2$. This is a significant improvement over $\sqrt{d\sigma_0^2}$ of the standard confidence set of Abbasi-Yadkori et al. (2011), especially when $d$ is large or $\sigma_*^2=0$. We show that this leads to an improved regret bound in linear bandits. Second, for bounded rewards, we propose a novel variance-adaptive confidence set that has much improved numerical performance over prior art. We then apply this confidence set to develop, as we claim, the first practical variance-adaptive linear bandit algorithm via an optimistic approach, which is enabled by our novel regret analysis technique. Both of our confidence sets rely critically on "regret equality" from online learning. Our empirical evaluation in diverse Bayesian optimization tasks shows that our proposed algorithms perform better than or comparably to existing methods.
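As a rough numerical illustration of the two width scalings quoted in the abstract, the sketch below compares $\sqrt{d\sigma_0^2}$ with $\sqrt{d\sigma_*^2 + \sigma_0^2}$. It ignores all constants and logarithmic factors, so the ratios indicate only how the leading terms scale, not actual confidence widths. The function names and the chosen values of $d$, $\sigma_0^2$, and $\sigma_*^2$ are assumptions for illustration.

```python
import math

def standard_width(d, sigma0_sq):
    """Leading-order scaling sqrt(d * sigma_0^2) of the standard confidence set."""
    return math.sqrt(d * sigma0_sq)

def semi_adaptive_width(d, sigma_star_sq, sigma0_sq):
    """Leading-order scaling sqrt(d * sigma_*^2 + sigma_0^2) of the semi-adaptive set."""
    return math.sqrt(d * sigma_star_sq + sigma0_sq)

# Hypothetical setting: d = 32 and a specified bound sigma_0^2 = 1.
d, sigma0_sq = 32, 1.0
for sigma_star_sq in (0.0, 0.01, 0.1, 1.0):
    ratio = standard_width(d, sigma0_sq) / semi_adaptive_width(d, sigma_star_sq, sigma0_sq)
    print(f"sigma_*^2 = {sigma_star_sq:<5} standard / semi-adaptive = {ratio:.3f}")
```

When $\sigma_*^2 = 0$ the ratio equals $\sqrt{d}$, which is why the gain grows with the dimension. When $\sigma_*^2 = \sigma_0^2$ the two scalings differ only by the factor $\sqrt{d/(d+1)}$, so at this level of approximation little is lost when the specified bound is already tight.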

Paper Structure (31 sections, 20 theorems, 120 equations, 7 figures, 2 algorithms)

Introduction
Preliminaries.
Semi-Adaptation for Sub-Gaussian Noise
Proposed confidence set.
Full Adaptation to Bounded Noise
Proposed confidence set.
Proposed bandit algorithm.
Practical version.
Anytime version.
Experiments
Synthetic Experiments
Application to Bayesian Optimization
Benchmark functions.
NATS-Bench.
Related Work
...and 16 more sections

Key Result

Theorem 2.1

Take ass:semi. Then,

Figures (7)

Figure 1: The green line (very small) represents our confidence set $\mathcal{C}^{\text{full}}_t$ and the orange area represents the confidence set of Zhao et al. (2023), which is also implemented as an intersection of $L$ confidence sets like ours. We use $n=$ 500,000 samples, $d=2$, $L=9$, and $\sigma_t^2=0.1,\forall t$. With $\theta^* = (1,0)$, the upper confidence bound on the mean reward of the arm $x=(1,0)$ is $1.01$ with our method while it is $7.66$ with their method and $1.05$ with SNCS.
Figure 2: Results of synthetic experiments with LOSAN and LOFAV. To fairly compare our algorithms to OFUL, we perform each experiment over 50 rounds where $S = 1.0$, $d = 32$, $|\mathcal{X}_t| = 128$, and $\sigma_0 \ \textrm{or} \ R = 1.0$.
Figure 3: Bayesian optimization results of LOSAN with random Fourier features and sub-Gaussian noise for four benchmark functions. We perform each experiment over 50 rounds where $S= 1.0$, $d = 128$, $|\mathcal{X}_t| = 512$, and $\sigma_0 = 1.0$.
Figure 4: Bayesian optimization results of LOFAV with random Fourier features and bounded noise for four benchmark functions. We perform each experiment over 50 rounds where $S= 1.0$, $d = 128$, $|\mathcal{X}_t| = 512$, and $R = 1.0$.
Figure 5: Bayesian optimization results of LOSAN and LOFAV with random Fourier features and sub-Gaussian or bounded noise for NATS-Bench. We perform each experiment over 50 rounds where $S= 1.0$, $d = 128$, $|\mathcal{X}_t| = 512$, and $\sigma_0 \ \textrm{or} \ R = 1.0$.
...and 2 more figures

Theorems & Definitions (34)

Theorem 2.1
Theorem 2.2
Theorem 3.1
Theorem 3.2
Proposition 1.1
proof
Lemma 1.2
proof
Theorem 2.1
proof
...and 24 more
