Lasso Bandit with Compatibility Condition on Optimal Arm

Harin Lee, Taehyun Hwang, Min-hwan Oh

TL;DR

This work studies high-dimensional sparse linear contextual bandits under a milder compatibility condition: regularity of the optimal arm's context alone suffices to obtain regret that is poly-logarithmic in $d$ and $T$ under a margin condition. The authors propose FS-WLasso, an algorithm that runs a forced-sampling stage followed by a greedy stage with a weighted Lasso estimator, keeps updating the parameter estimate throughout the greedy stage, and achieves $O(\mathrm{poly}\log dT)$ regret. They also show that the stronger context-diversity assumptions used in prior work imply the proposed condition, but not vice versa. Experiments corroborate the theory, showing superior performance even when the greedy diversity assumptions fail and when the sparsity level is unknown. The work thus broadens the applicability of Lasso-based bandit methods by requiring the weakest context-regularity condition known to yield poly-logarithmic regret, and the authors provide code and experiments for reproducibility.

Abstract

We consider a stochastic sparse linear bandit problem where only a sparse subset of context features affects the expected reward function, i.e., the unknown reward parameter has a sparse structure. In the existing Lasso bandit literature, compatibility conditions, together with additional diversity conditions on the context features, are imposed to achieve regret bounds that depend only logarithmically on the ambient dimension $d$. In this paper, we demonstrate that even without the additional diversity assumptions, the compatibility condition on the optimal arm is sufficient to derive a regret bound that depends logarithmically on $d$, and our assumption is strictly weaker than those used in the Lasso bandit literature under the single-parameter setting. We propose an algorithm that adapts the forced-sampling technique and prove that the proposed algorithm achieves $O(\text{poly}\log dT)$ regret under the margin condition. To our knowledge, the proposed algorithm requires the weakest assumptions among Lasso bandit algorithms under the single-parameter setting that achieve $O(\text{poly}\log dT)$ regret. Through numerical experiments, we confirm the superior performance of our proposed algorithm.
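The forced-sampling-then-greedy idea described above can be illustrated with a rough sketch. This is not the authors' implementation: the function names, the ISTA solver, the placement of the weight `w` on forced samples, the regularization schedule `lam0 * sqrt(t * log(d * t))`, and the Gaussian contexts are all assumptions chosen only to make the example self-contained.

```python
import numpy as np

def soft_threshold(z, tau):
    """Elementwise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def weighted_lasso(X, y, weights, lam, n_iter=200):
    """Minimise sum_i weights[i] * (x_i @ beta - y_i)**2 + lam * ||beta||_1 with ISTA."""
    beta = np.zeros(X.shape[1])
    Xw = X * weights[:, None]
    # Lipschitz constant of the gradient of the weighted squared loss.
    lipschitz = 2.0 * np.linalg.norm(X.T @ Xw, 2) + 1e-12
    for _ in range(n_iter):
        grad = 2.0 * Xw.T @ (X @ beta - y)
        beta = soft_threshold(beta - grad / lipschitz, lam / lipschitz)
    return beta

def run_fs_wlasso_sketch(T=200, K=5, d=30, s0=3, M0=50, w=1.0, lam0=0.2, seed=0):
    """Forced sampling for M0 rounds, then greedy play on a weighted Lasso estimate."""
    rng = np.random.default_rng(seed)
    beta_star = np.zeros(d)
    beta_star[:s0] = 1.0                      # hypothetical sparse reward parameter
    X_hist, y_hist, w_hist = [], [], []
    beta_hat = np.zeros(d)
    regret = 0.0
    for t in range(1, T + 1):
        contexts = rng.normal(size=(K, d))    # assumed i.i.d. Gaussian contexts
        means = contexts @ beta_star
        if t <= M0:
            arm, weight = int(rng.integers(K)), w      # forced-sampling stage
        else:
            arm, weight = int(np.argmax(contexts @ beta_hat)), 1.0  # greedy stage
        reward = means[arm] + 0.1 * rng.normal()
        regret += means.max() - means[arm]
        X_hist.append(contexts[arm])
        y_hist.append(reward)
        w_hist.append(weight)
        if t >= M0:
            # Assumed schedule; the estimate keeps being refreshed in the greedy stage.
            lam = lam0 * np.sqrt(t * np.log(d * t))
            beta_hat = weighted_lasso(
                np.array(X_hist), np.array(y_hist), np.array(w_hist), lam
            )
    return beta_hat, beta_star, regret

if __name__ == "__main__":
    beta_hat, beta_star, regret = run_fs_wlasso_sketch()
    print("estimation error:", np.linalg.norm(beta_hat - beta_star))
    print("cumulative regret:", regret)
```

The sketch refits the estimator from scratch every round for clarity; a practical version would warm-start the solver from the previous estimate.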

Paper Structure (53 sections, 36 theorems, 218 equations, 4 figures, 1 table, 2 algorithms)


Key Result

Theorem 1

The compatibility condition on the optimal arm (Assumption \ref{assm:compatibility for optimal arm}) is strictly weaker than the assumptions made in previous Lasso bandit works under the single-parameter setting \cite{oh2021sparsity, li2021regret, ariu2022thresholded, chakraborty2023thompson}, as illustrated in Figure 1.
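For orientation, the compatibility condition in the Lasso literature is usually stated in the form below; the optimal-arm version presumably applies it to the second-moment matrix of the optimal arm's context only. The symbols $S_0$ (support of the true parameter), $s_0 = |S_0|$, $\phi_*$, $\Sigma^*$, and the cone constant 3 are assumptions borrowed from the standard formulation, not taken from the text above.

```latex
\Sigma^{*} := \mathbb{E}\!\left[ x_{t,a_t^{*}} x_{t,a_t^{*}}^{\top} \right],
\qquad
\phi_{*}^{2}\, \lVert \beta_{S_0} \rVert_1^{2} \;\le\; s_0\, \beta^{\top} \Sigma^{*} \beta
\quad \text{for all } \beta \in \mathbb{R}^{d}
\text{ with } \lVert \beta_{S_0^{c}} \rVert_1 \le 3\, \lVert \beta_{S_0} \rVert_1 .
```

Under this reading, nothing is required of the contexts of sub-optimal arms, which is consistent with the counter-example in Figure 1 where all sub-optimal arms are fixed.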

Figures (4)

  • Figure 1: Illustration of relationships among distributional assumptions on context used in the sparse linear contextual bandit literature. The blue arrows represent implication relationships while the red arrows represent infeasible implication relationships. The conditions written in blue with the check bullet ✓ in the figure imply the compatibility condition on the optimal arm (Assumption \ref{assm:compatibility for optimal arm}), serving as sufficient conditions, while the conditions written in orange indicate additional assumptions necessary to achieve the existing methods' regret guarantees, but not needed in our analysis. The case where all sub-optimal arms are fixed serves as a counter-example for the infeasible implication relationships. We provide the proofs of the implication relationships in Appendix \ref{appx:compatibility cond. for optimal arm}, which may be of independent interest.
  • Figure 2: Evaluations of Lasso bandit algorithms. Figure \ref{fig:subfig1} shows results where all context feature vectors are sampled from a correlated Gaussian distribution. Figure \ref{fig:subfig2} shows results where the context feature vectors of sub-optimal arms are fixed throughout time, and only the feature vector of the optimal arm has randomness. We plot the mean and standard deviation of cumulative regret across 100 runs for each algorithm.
  • Figure 3: Illustration of the results of Lemma \ref{lemma:multiple-param in single-param implies ours} and Lemma \ref{lma:multiple-param in multiple-param implies ours}. Let $\mathcal{C}$ be a conversion mapping that converts a single-parameter bandit instance into a multiple-parameter one by $Kd$-dimensional context vector construction, $\mathcal{M}_1$ a set of multiple-parameter instances converted by $\mathcal{C}$ satisfying Assumptions \ref{assm:arm optimality} and \ref{assm:compatibility for multi-parameter}, $\mathcal{M}_2$ a set of multiple-parameter instances converted by $\mathcal{C}$ satisfying Assumption \ref{assm:compatibility for optimal feature}, and ${\mathcal{S}}$ a set of single-parameter instances satisfying Assumption \ref{assm:compatibility for optimal arm}. By definition, $\mathcal{C}({\mathcal{S}})$ denotes the image of ${\mathcal{S}}$ under $\mathcal{C}$, which is the set of multiple-parameter instances converted from ${\mathcal{S}}$ by $\mathcal{C}$. Similarly, $\mathcal{C}^{-1}(\mathcal{M}_1)$ is the inverse image of $\mathcal{M}_1$ under $\mathcal{C}$, which is the set of single-parameter instances that map to a member of $\mathcal{M}_1$. By Lemma \ref{lemma:multiple-param in single-param implies ours}, we ensure that $\mathcal{C}^{-1}(\mathcal{M}_1) \subset {\mathcal{S}}$, which means that our compatibility condition on the optimal arm (Assumption \ref{assm:compatibility for optimal arm}) is weaker than those of \cite{bastani2020online, wang2018minimax} through the conversion mapping $\mathcal{C}$. On the other hand, Lemma \ref{lma:multiple-param in multiple-param implies ours} ensures that $\mathcal{M}_1 \subset \mathcal{M}_2$.
  • Figure 4: Evaluations of $\texttt{FS-WLasso}$ with various lengths of forced-sampling stage under the setting of Experiment 2

Theorems & Definitions (45)

  • Theorem 1
  • Remark 1
  • Remark 2
  • Remark 3
  • Definition 1: Compatibility constant ratio
  • Remark 4
  • Theorem 2: Regret Bound of $\texttt{FS-WLasso}$
  • Remark 5
  • Theorem 3
  • Definition 2: Greedy diversity
  • ...and 35 more