Sparsity-Agnostic Linear Bandits with Adaptive Adversaries

Tianyuan Jin; Kyoungseok Jang; Nicolò Cesa-Bianchi

TL;DR

This work addresses sparsity-agnostic stochastic linear bandits under adaptive adversaries by introducing SparseLinUCB, a multi-level confidence-set algorithm that achieves $\tilde{O}(S\sqrt{dT})$ regret without prior knowledge of the sparsity level or strong assumptions on the action sets. It leverages online-to-confidence-set conversions with a hierarchy of radii and a base sparse online learner (SeqSEW) to obtain robust guarantees, plus an instance-dependent bound tying regret to the suboptimality gap $\Delta$. The paper also proposes AdaLinUCB, which uses Exp3 to adaptively weight confidence-set radii, achieving $\tilde{O}(\max\{\sqrt{dq},\sqrt{S/q}\}\sqrt{dT})$ and offering practical empirical improvements over OFUL. Through theoretical and empirical results, the work demonstrates robust, sparsity-aware learning in adversarial environments and highlights directions for tightening bounds and relaxing noise assumptions.

Abstract

We study stochastic linear bandits where, in each round, the learner receives a set of actions (i.e., feature vectors), from which it chooses an element and obtains a stochastic reward. The expected reward is a fixed but unknown linear function of the chosen action. We study sparse regret bounds, which depend on the number $S$ of non-zero coefficients in the linear reward function. Previous works focused on the case where $S$ is known or the action sets satisfy additional assumptions. In this work, we obtain the first sparse regret bounds that hold when $S$ is unknown and the action sets are adversarially generated. Our techniques combine online-to-confidence-set conversions with a novel randomized model selection approach over a hierarchy of nested confidence sets. When $S$ is known, our analysis recovers state-of-the-art bounds for adversarial action sets. We also show that a variant of our approach, using Exp3 to dynamically select the confidence sets, can be used to improve the empirical performance of stochastic linear bandits while enjoying a regret bound with optimal dependence on the time horizon.
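The idea of selecting among nested confidence sets with Exp3 can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's AdaLinUCB: the geometric radius grid, the learning rate `eta`, the exploration rate `gamma`, the reward rescaling, and the synthetic action sets are all hypothetical choices made for illustration.

```python
import numpy as np


def ucb_action(actions, theta_hat, V_inv, radius_sq):
    """Index of the action maximising <a, theta_hat> + sqrt(radius_sq) * ||a||_{V^{-1}}."""
    means = actions @ theta_hat
    widths = np.sqrt(np.einsum("ij,jk,ik->i", actions, V_inv, actions))
    return int(np.argmax(means + np.sqrt(radius_sq) * widths))


def run(T=500, d=8, S=2, K=10, n_levels=5, eta=0.05, gamma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(d)
    theta[:S] = 1.0 / np.sqrt(S)          # S-sparse parameter with unit norm
    radii_sq = 2.0 ** np.arange(n_levels)  # hypothetical hierarchy: 1, 2, 4, ...
    log_w = np.zeros(n_levels)             # Exp3 log-weights, one per radius
    V, b = np.eye(d), np.zeros(d)          # ridge-regression statistics
    regret = 0.0
    for _ in range(T):
        # Synthetic action set: K random unit vectors (an assumption of this sketch).
        actions = rng.normal(size=(K, d))
        actions /= np.linalg.norm(actions, axis=1, keepdims=True)

        # Exp3 distribution over radii, mixed with uniform exploration.
        soft = np.exp(log_w - log_w.max())
        probs = (1.0 - gamma) * soft / soft.sum() + gamma / n_levels
        level = rng.choice(n_levels, p=probs)

        V_inv = np.linalg.inv(V)
        idx = ucb_action(actions, V_inv @ b, V_inv, radii_sq[level])
        a = actions[idx]
        reward = a @ theta + 0.1 * rng.normal()

        # Importance-weighted Exp3 update on a reward rescaled into [0, 1].
        r01 = np.clip((reward + 1.5) / 3.0, 0.0, 1.0)
        log_w[level] += eta * r01 / probs[level]

        V += np.outer(a, a)
        b += reward * a
        expected = actions @ theta
        regret += expected.max() - expected[idx]
    return regret, probs


if __name__ == "__main__":
    total_regret, final_probs = run()
    print("cumulative regret:", total_regret)
    print("final distribution over radii:", final_probs)
```

Smaller radii make the learner more aggressive and larger radii make it more conservative; the Exp3 weights shift probability toward whichever level has been collecting higher rewards in this toy setup.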

Paper Structure (16 sections, 17 theorems, 94 equations, 1 figure, 2 tables, 2 algorithms)

Introduction
Additional related work
Problem definition
Online to confidence set conversions
A multi-level sparse linear bandit algorithm
Adaptive model selection for stochastic linear bandits
Model selection experiments
Limitations and open problems
Notation
Analysis of SparseLinUCB
Analysis of AdaLinUCB
Supporting lemmas
Experimental details
Settings common to all algorithms
AdaLinUCB details
...and 1 more sections

Key Result

Lemma 2.1

Let $\delta\in(0,1/4]$ and $\|\theta_*\|_2\leq 1$. Assume a sequence $\{(A_t,X_t)\}_{t\in [T]}$, where $X_t$ satisfies the linear reward model (eq:linmodel) for all $t \in [T]$, is fed to an online linear regression algorithm $\mathcal{B}$ generating predictions $\{\widehat{X}_t\}_{t\in [T]}$. Then $\mathbb{P}(\exists t\in [\,\ldots$ […] and $B_T$ is an upper bound on the regret $\rho_T(\theta_*)$ of $\mathcal{B}$.
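As a rough illustration of how an online regression algorithm's predictions can be turned into a confidence set, the sketch below follows the general online-to-confidence-set construction from the cited work (abbasi2012online): the set collects every $\theta$ whose regularised squared distance to the predictions $\widehat{X}_s$ is at most some radius. The function names and the radius `beta` are hypothetical placeholders; in particular, `beta` is supplied by the caller and is not the constant from the lemma.

```python
import numpy as np


def confidence_ellipsoid(actions, predictions):
    """Centre and shape matrix of the ellipsoid
    {theta : ||theta||^2 + sum_s (pred_s - <theta, a_s>)^2 <= beta}.

    The centre is the ridge-regression fit to the online learner's
    predictions (not to the observed rewards).
    """
    A = np.asarray(actions, dtype=float)
    x_hat = np.asarray(predictions, dtype=float)
    V = np.eye(A.shape[1]) + A.T @ A
    centre = np.linalg.solve(V, A.T @ x_hat)
    return centre, V


def in_confidence_set(theta, actions, predictions, beta):
    """Membership test for the set above; `beta` is a caller-chosen radius."""
    A = np.asarray(actions, dtype=float)
    x_hat = np.asarray(predictions, dtype=float)
    theta = np.asarray(theta, dtype=float)
    value = theta @ theta + np.sum((x_hat - A @ theta) ** 2)
    return bool(value <= beta)


if __name__ == "__main__":
    acts = np.eye(2)            # two rounds, standard basis actions
    preds = np.array([1.0, 0.0])  # predictions made by the online learner
    centre, V = confidence_ellipsoid(acts, preds)
    print(centre, in_confidence_set(centre, acts, preds, beta=1.0))
```

A UCB-style learner would then maximise $\langle a, \theta\rangle$ over actions and over $\theta$ in this set; how large `beta` must be for the set to contain $\theta_*$ with high probability is exactly what the lemma quantifies through $B_T$.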

Figures (1)

Figure 1: Experimental results with different sparsity levels $S\in\{1,2,4,8,16\}$. In each plot, the x-axis shows time steps in $[1,10^4]$ and the y-axis shows cumulative regret. AL stands for AdaLinUCB and SL stands for SparseLinUCB.

Theorems & Definitions (18)

Lemma 2.1: abbasi2012online
Lemma 2.2: lattimore2018bandit
Lemma 3.0
Theorem 3.1
Corollary 3.2
Theorem 3.3
Corollary 3.4
Remark 3.5
Theorem 4.1
Theorem B.1
...and 8 more
