One Good Source is All You Need: Near-Optimal Regret for Bandits under Heterogeneous Noise

Aadirupa Saha; Amith Bhat; Haipeng Luo

TL;DR

This work addresses online learning in a multi-armed bandit setting with multiple heterogeneous data sources, where each source has unknown noise variance. The authors introduce SOAR, a two-stage algorithm that first prunes high-variance sources via variance concentration bounds and then performs an adaptive min-max LCB-UCB exploration to jointly identify the best arm and the lowest-variance data source. They prove near-oracle regret bounds: an instance-dependent rate of $\tilde{O}\left({\sigma^*}^2\sum_{i=2}^K \frac{\log T}{\Delta_i} + \sqrt{K \sum_{j=1}^M \sigma_j^2}\right)$, with ${\sigma^*}^2$ the minimum source variance, matching the single-source oracle up to logarithmic factors, plus an additive $\tilde{O}(\sqrt{K \sum_j \sigma_j^2})$ cost for source identification. The results improve upon natural baselines that scale with $\sigma_{\max}^2$ or incur costly variance-based distinctions when variances are similar. Empirical results on synthetic data and the MovieLens 25M dataset demonstrate SOAR’s superior performance and its ability to quickly focus on low-variance sources while maintaining strong reward identification.
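To make the scaling in the stated bound concrete, the sketch below evaluates its two terms for a toy instance and compares the arm-identification term against the same term with $\sigma_{\max}^2$ substituted for ${\sigma^*}^2$. It is only an illustration: constants and the logarithmic factors hidden by $\tilde{O}$ are ignored, and the function names and the toy instance are hypothetical, not taken from the paper.

```python
import math

def bound_terms(gaps, source_vars, horizon):
    """Evaluate the two terms of the stated bound.

    Assumption: constants and hidden log factors are dropped, so the
    values only indicate scaling, not actual regret.
    gaps: suboptimality gaps of arms i = 2..K (all positive).
    source_vars: noise variances sigma_j^2 of the M sources.
    """
    num_arms = len(gaps) + 1  # the optimal arm has gap 0 and is not listed
    sigma_star_sq = min(source_vars)
    arm_term = sigma_star_sq * sum(math.log(horizon) / g for g in gaps)
    source_term = math.sqrt(num_arms * sum(source_vars))
    return arm_term, source_term

def worst_source_arm_term(gaps, source_vars, horizon):
    """Same arm term, but scaled by the largest source variance."""
    return max(source_vars) * sum(math.log(horizon) / g for g in gaps)

# Hypothetical toy instance: K = 3 arms, M = 3 sources.
gaps = [0.5, 0.5]
source_vars = [1.0, 4.0, 9.0]
arm_term, source_term = bound_terms(gaps, source_vars, horizon=1000)
worst_term = worst_source_arm_term(gaps, source_vars, horizon=1000)
print(arm_term, source_term, worst_term)
```

In this toy instance the $\sigma_{\max}^2$-scaled term is larger than the ${\sigma^*}^2$-scaled term by the factor $\sigma_{\max}^2 / {\sigma^*}^2 = 9$, while the additive source-identification term does not grow with $T$.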

Abstract

We study the $K$-armed multi-armed bandit (MAB) problem with $M$ heterogeneous data sources, each exhibiting an unknown and distinct noise variance, $\{\sigma_j^2\}_{j=1}^M$. The learner's objective is standard MAB regret minimization, with the additional complexity of adaptively selecting which data source to query at each round. We propose Source-Optimistic Adaptive Regret minimization (SOAR), a novel algorithm that quickly prunes high-variance sources using sharp variance-concentration bounds, followed by a 'balanced min-max LCB-UCB approach' that seamlessly integrates the parallel tasks of identifying the best arm and the optimal (minimum-variance) data source. Our analysis shows that SOAR achieves an instance-dependent regret bound of $\tilde{O}\left({\sigma^*}^2\sum_{i=2}^K \frac{\log T}{\Delta_i} + \sqrt{K \sum_{j=1}^M \sigma_j^2}\right)$, up to preprocessing costs depending only on problem parameters, where ${\sigma^*}^2 := \min_j \sigma_j^2$ is the minimum source variance and $\Delta_i$ denotes the suboptimality gap of the $i$-th arm. This result is surprising: despite lacking prior knowledge of the minimum-variance source among $M$ alternatives, SOAR attains the optimal instance-dependent regret of standard single-source MAB with variance ${\sigma^*}^2$, while incurring only a small (and unavoidable) additive cost of $\tilde{O}\left(\sqrt{K \sum_{j=1}^M \sigma_j^2}\right)$ for identifying the optimal (minimum-variance) source. Our theoretical bounds represent a significant improvement over some proposed baselines, e.g., Uniform UCB or Explore-then-Commit UCB, which could potentially suffer regret scaling with $\sigma_{\max}^2$ in place of ${\sigma^*}^2$, a gap that can be arbitrarily large when $\sigma_{\max} \gg \sigma^*$. Experiments on multiple synthetic problem instances and the real-world MovieLens 25M dataset demonstrate the superior performance of SOAR over the baselines.
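The interaction protocol described above, where the learner picks both an arm and a source each round, can be sketched as a small simulator. This is a minimal sketch under stated assumptions, not SOAR itself: the noise is assumed Gaussian, the class and function names are hypothetical, and the source-selection step is a naive fixed-budget variance estimate rather than the paper's pruning or min-max LCB-UCB procedure.

```python
import random
import statistics

class MultiSourceBandit:
    """K arms, M sources; the noise level depends only on the queried source.

    Assumption: Gaussian noise, chosen purely for illustration.
    """

    def __init__(self, means, source_stds, seed=0):
        self.means = means              # mean reward of each arm
        self.source_stds = source_stds  # noise std sigma_j of each source
        self.rng = random.Random(seed)

    def pull(self, arm, source):
        # Observed reward = arm mean + source-specific noise.
        return self.rng.gauss(self.means[arm], self.source_stds[source])

def estimate_source_variances(env, arm, pulls_per_source):
    """Naive estimate: sample one fixed arm through every source."""
    estimates = []
    for source in range(len(env.source_stds)):
        samples = [env.pull(arm, source) for _ in range(pulls_per_source)]
        estimates.append(statistics.variance(samples))  # unbiased sample variance
    return estimates

# Hypothetical instance: source 1 is the low-variance one (sigma^2 = 0.25).
env = MultiSourceBandit(means=[0.2, 0.5], source_stds=[3.0, 0.5, 1.5], seed=0)
var_hat = estimate_source_variances(env, arm=0, pulls_per_source=2000)
best_source = var_hat.index(min(var_hat))
print(var_hat, best_source)
```

A fixed budget per source, as used here, spends as many samples on clearly bad sources as on good ones; an adaptive method can stop sampling a source as soon as its variance estimate is confidently worse than another's.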

Paper Structure (82 sections, 21 theorems, 117 equations, 12 figures, 1 table, 2 algorithms)

Introduction
Related Work.
Baselines.
Unresolved Questions.
Overview.
Problem Statement (Informal).
Our contributions.
Problem Setting
Impact of Source Selection on Regret.
Warm-Up: Parameter Estimation and Concentration
Notation.
A brief description of SOAR.
Preprocess
Parameter Estimation.
Confidence Bounds.
...and 67 more sections

Key Result

Lemma 3.0

Fix $\delta \in (0,1)$ and a sampling budget $\tau_p \in \mathbb{N}$. Assume $\epsilon" < \min\!\left\{ 6 \sigma_j^2,\; \tfrac{18\sigma_j^4}{\bar{\eta}^2} \right\}$ for each source $j \in [M]$, where $\epsilon"$ is the parameter appearing in Bernstein's inequality [blm13, MATH281C_Lecture4]. Then, with

Figures (12)

Figure 1: Regret of SOAR with varying number of arms $K \in \{5,15,30\}$
Figure 2: Regret of SOAR with varying number of sources $M \in \{5,15,30\}$
Figure 3: SOAR vs Baseline-1: WC-1
Figure 4: SOAR vs Baseline-2: WC-2
Figure 5: SOAR vs. Baseline-1 on MovieLens.
...and 7 more figures

Theorems & Definitions (40)

Lemma 3.0: Variance Concentration
proof
Remark 3.1
Lemma 3.1: Source Variance Concentration
proof
Corollary 3.1: Variance Sandwiching
proof
Lemma 3.1: Mean Reward Concentration
proof
Theorem 4.1: Stopping Condition of
...and 30 more
