One Good Source is All You Need: Near-Optimal Regret for Bandits under Heterogeneous Noise

Aadirupa Saha, Amith Bhat, Haipeng Luo

TL;DR

This work addresses online learning in a multi-armed bandit setting with multiple heterogeneous data sources, each with an unknown noise variance. The authors introduce SOAR, a two-stage algorithm that first prunes high-variance sources via variance concentration bounds and then performs an adaptive min-max LCB-UCB exploration to jointly identify the best arm and the lowest-variance data source. They prove a near-oracle, instance-dependent regret bound of $\tilde{O}\left({\sigma^*}^2\sum_{i=2}^K \frac{\log T}{\Delta_i} + \sqrt{K \sum_{j=1}^M \sigma_j^2}\right)$, where ${\sigma^*}^2$ is the minimum source variance. The first term matches the single-source oracle up to logarithmic factors, and the second term, $\tilde{O}(\sqrt{K \sum_j \sigma_j^2})$, is an additive cost for source identification. The results improve upon natural baselines whose regret scales with $\sigma_{\max}^2$ or that pay a high cost to distinguish sources when variances are similar. Empirical results on synthetic data and the MovieLens 25M dataset show SOAR outperforming the baselines and quickly focusing on low-variance sources while still identifying high-reward arms.
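To get a feel for the sizes of the terms in the bound above, the following minimal sketch evaluates them numerically. It drops all constants and the logarithmic factors hidden by $\tilde{O}$, so the numbers are only indicative of scaling. The function name `regret_bound_terms` and its parameters are illustrative and do not come from the paper.

```python
import math


def regret_bound_terms(sigmas_sq, gaps, horizon):
    """Evaluate the terms of the stated bound, ignoring constants and polylog factors.

    sigmas_sq: source variances sigma_j^2 for j = 1..M
    gaps:      suboptimality gaps Delta_i for the K-1 suboptimal arms (i = 2..K)
    horizon:   number of rounds T
    """
    num_arms = len(gaps) + 1  # the optimal arm has no gap entry
    gap_sum = sum(math.log(horizon) / gap for gap in gaps)
    return {
        # sigma*^2 * sum_i log(T) / Delta_i: the single-source oracle term
        "oracle_term": min(sigmas_sq) * gap_sum,
        # sqrt(K * sum_j sigma_j^2): the additive source-identification term
        "source_id_term": math.sqrt(num_arms * sum(sigmas_sq)),
        # same gap sum scaled by sigma_max^2, as for the baselines mentioned above
        "sigma_max_term": max(sigmas_sq) * gap_sum,
    }


terms = regret_bound_terms(sigmas_sq=[0.01, 1.0, 4.0], gaps=[0.1] * 4, horizon=10_000)
for name, value in terms.items():
    print(f"{name}: {value:.2f}")
```

In this hypothetical instance one source is far cleaner than the others, so the sum of the oracle and source-identification terms is much smaller than the $\sigma_{\max}^2$-scaled term.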

Abstract

We study the $K$-armed multi-armed bandit (MAB) problem with $M$ heterogeneous data sources, each exhibiting an unknown and distinct noise variance, $\{\sigma_j^2\}_{j=1}^M$. The learner's objective is standard MAB regret minimization, with the additional complexity of adaptively selecting which data source to query at each round. We propose Source-Optimistic Adaptive Regret minimization (SOAR), a novel algorithm that quickly prunes high-variance sources using sharp variance-concentration bounds, followed by a "balanced min-max LCB-UCB approach" that seamlessly integrates the parallel tasks of identifying the best arm and the optimal (minimum-variance) data source. Our analysis shows SOAR achieves an instance-dependent regret bound of $\tilde{O}\left({\sigma^*}^2\sum_{i=2}^K \frac{\log T}{\Delta_i} + \sqrt{K \sum_{j=1}^M \sigma_j^2}\right)$, up to preprocessing costs depending only on problem parameters, where ${\sigma^*}^2 := \min_j \sigma_j^2$ is the minimum source variance and $\Delta_i$ denotes the suboptimality gap of the $i$-th arm. This result is surprising: despite lacking prior knowledge of the minimum-variance source among $M$ alternatives, SOAR attains the optimal instance-dependent regret of standard single-source MAB with variance ${\sigma^*}^2$, while incurring only a small (and unavoidable) additive cost of $\tilde{O}(\sqrt{K \sum_{j=1}^M \sigma_j^2})$ for identifying the optimal (minimum-variance) source. Our theoretical bounds represent a significant improvement over some proposed baselines, e.g. Uniform UCB or Explore-then-Commit UCB, which can suffer regret scaling with $\sigma_{\max}^2$ in place of ${\sigma^*}^2$, a gap that can be arbitrarily large when $\sigma_{\max} \gg \sigma^*$. Experiments on multiple synthetic problem instances and the real-world MovieLens 25M dataset demonstrate the superior performance of SOAR over the baselines.
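The setting and the pruning idea can be illustrated with a small simulation. The sketch below is not SOAR: it covers only a simplified version of the first stage (discarding sources whose variance is confidently larger than the best candidate's) and does not attempt the min-max LCB-UCB stage, whose details are not given here. The Gaussian noise model, the helper names `pull` and `prune_sources`, and the relative confidence radius are all assumptions made for illustration; the radius in particular is a heuristic and not the paper's concentration bound.

```python
import math
import random


def pull(means, noise_stds, arm, source, rng):
    """One round of the protocol: query `arm` through `source`, observe a noisy reward.

    Assumes Gaussian noise whose standard deviation depends only on the source.
    """
    return rng.gauss(means[arm], noise_stds[source])


def prune_sources(means, noise_stds, n_samples, delta, rng, arm=0):
    """Keep sources whose variance interval overlaps the smallest upper bound."""
    # Heuristic relative radius (an assumption, not the paper's bound).
    rel = math.sqrt(8.0 * math.log(2.0 / delta) / n_samples)
    intervals = []
    for source in range(len(noise_stds)):
        # Sampling one fixed arm isolates the source's noise from arm differences.
        xs = [pull(means, noise_stds, arm, source, rng) for _ in range(n_samples)]
        mean = sum(xs) / n_samples
        var_hat = sum((x - mean) ** 2 for x in xs) / (n_samples - 1)
        intervals.append((var_hat * max(0.0, 1.0 - rel), var_hat * (1.0 + rel)))
    best_upper = min(upper for _, upper in intervals)
    # A source is discarded only if its lower bound exceeds the best upper bound.
    return [j for j, (lower, _) in enumerate(intervals) if lower <= best_upper]


rng = random.Random(0)
survivors = prune_sources(
    means=[0.5, 0.4], noise_stds=[0.1, 1.0, 3.0], n_samples=2000, delta=0.05, rng=rng
)
print("surviving sources:", survivors)
```

When the variances are well separated, as in this hypothetical instance, a modest sampling budget is enough to discard the noisy sources. When variances are close, the intervals overlap and several sources survive, which is the regime where a commit-after-exploration rule becomes costly.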

Paper Structure (82 sections, 21 theorems, 117 equations, 12 figures, 1 table, 2 algorithms)

Key Result

Lemma 3.0

Fix $\delta \in (0,1)$ and a sampling budget $\tau_p \in \mathbb{N}$. Assume $\epsilon" < \min\!\left\{ 6 \sigma_j^2,\; \tfrac{18\sigma_j^4}{\bar{\eta}^2} \right\}$ for each source $j \in [M]$, where $\epsilon"$ is the parameter appearing in Bernstein's inequality [blm13, MATH281C_Lecture4]. Then, with

Figures (12)

  • Figure 1: Regret of SOAR with varying number of arms $K \in \{5,15,30\}$
  • Figure 2: Regret of SOAR with varying number of sources $M \in \{5,15,30\}$
  • Figure 3: SOAR vs Baseline-1: WC-1
  • Figure 4: SOAR vs Baseline-2: WC-2
  • Figure 5: SOAR vs. Baseline-1 on MovieLens.
  • ...and 7 more figures

Theorems & Definitions (40)

  • Lemma 3.0: Variance Concentration
  • proof
  • Remark 3.1
  • Lemma 3.1: Source Variance Concentration
  • proof
  • Corollary 3.1: Variance Sandwiching
  • proof
  • Lemma 3.1: Mean Reward Concentration
  • proof
  • Theorem 4.1: Stopping Condition of
  • ...and 30 more