Geometry-Aware Approaches for Balancing Performance and Theoretical Guarantees in Linear Bandits
Yuwei Luo, Mohsen Bayati
TL;DR
This work addresses the gap between the strong empirical performance and the weak frequentist guarantees of linear bandit algorithms such as Thompson sampling and Greedy. It develops a data-driven, geometry-aware framework (POFUL) that leverages the full $d$-dimensional confidence ellipsoid around the unknown parameter $\theta^*$ to derive practical regret bounds and to enable course-correction. By introducing a data-driven regret proxy and an adaptive meta-algorithm (TS-MR/Greedy-MR), the paper achieves the minimax-optimal frequentist regret $\tilde{\mathcal{O}}(d\sqrt{T})$ while preserving the empirical strengths of the base algorithms. Experiments on synthetic and real-world data show robust empirical performance and practical applicability, helping bridge theory and practice in linear bandits.
Abstract
This paper is motivated by recent research in the $d$-dimensional stochastic linear bandit literature, which has revealed an unsettling discrepancy: algorithms like Thompson sampling and Greedy demonstrate promising empirical performance, yet their theoretical regret bounds are pessimistic. The discrepancy arises because, although these algorithms can perform poorly on certain problem instances, they generally excel on typical ones. To address this, we propose a new data-driven technique that tracks the geometric properties of the uncertainty ellipsoid around the unknown problem parameter. This methodology enables us to formulate a data-driven frequentist regret bound, which incorporates the geometric information, for a broad class of base algorithms, including Greedy, OFUL, and Thompson sampling. This result allows us to identify and "course-correct" problem instances in which the base algorithms perform poorly. The course-corrected algorithms achieve the minimax optimal regret of order $\tilde{\mathcal{O}}(d\sqrt{T})$ for a $T$-period decision-making scenario while maintaining the desirable attributes of the base algorithms, including their empirical efficacy. We validate our findings with simulations on synthetic and real data.
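The abstract does not spell out the regret proxy or the switching rule, so the following is only a minimal sketch of the general idea under our own assumptions, not the paper's POFUL bound or its TS-MR/Greedy-MR rules. We assume a ridge-regression confidence ellipsoid $\{\theta : \|\theta - \hat\theta\|_V \le \beta\}$ with a user-supplied radius `beta`. On the event that $\theta^*$ lies in this ellipsoid, Cauchy-Schwarz gives $|x^\top(\hat\theta - \theta^*)| \le \beta\|x\|_{V^{-1}}$, so the regret of any action $x_k$ is at most $\max_x \mathrm{UCB}(x) - \mathrm{LCB}(x_k)$, a quantity computable from data. The hypothetical `geometry_proxy` normalizes this bound by $2\beta\|x_k\|_{V^{-1}}$, so it equals 1 for the optimistic (OFUL) action. The hypothetical `course_correct` keeps the base action unless the proxy exceeds a chosen threshold, in which case it falls back to the optimistic action.

```python
import numpy as np


def ridge_state(X, y, lam=1.0):
    """Regularized design matrix V = lam*I + X^T X and ridge estimate theta_hat."""
    d = X.shape[1]
    V = lam * np.eye(d) + X.T @ X
    theta_hat = np.linalg.solve(V, X.T @ y)
    return V, theta_hat


def widths(V, A):
    """Ellipsoid width ||x||_{V^{-1}} for every row x of the action matrix A."""
    Z = np.linalg.solve(V, A.T)              # columns are V^{-1} x
    return np.sqrt(np.sum(A.T * Z, axis=0))  # x^T V^{-1} x, then square root


def geometry_proxy(theta_hat, V, A, beta, k):
    """HYPOTHETICAL proxy: (max_x UCB(x) - LCB(x_k)) / (2 * beta * ||x_k||_{V^{-1}}).

    The numerator bounds the regret of action k whenever ||theta_hat - theta*||_V <= beta;
    the proxy equals 1 for the optimistic (OFUL) action.
    """
    mean, w = A @ theta_hat, widths(V, A)
    ucb_max = np.max(mean + beta * w)
    return (ucb_max - (mean[k] - beta * w[k])) / (2 * beta * w[k])


def course_correct(theta_hat, V, A, beta, base_k, threshold):
    """HYPOTHETICAL rule: keep the base action unless its proxy exceeds threshold."""
    mean, w = A @ theta_hat, widths(V, A)
    oful_k = int(np.argmax(mean + beta * w))  # optimistic (OFUL-style) action
    proxy = geometry_proxy(theta_hat, V, A, beta, base_k)
    return (base_k if proxy <= threshold else oful_k), proxy


# Toy instance: past data only explored the first coordinate.
X = np.tile([1.0, 0.0], (9, 1))
y = np.ones(9)
V, theta_hat = ridge_state(X, y, lam=1.0)    # V = diag(10, 1), theta_hat = [0.9, 0]
A = np.eye(2)                                # two actions: e1 and e2
greedy_k = int(np.argmax(A @ theta_hat))     # Greedy picks e1 (index 0)
action, proxy = course_correct(theta_hat, V, A, beta=2.0, base_k=greedy_k, threshold=1.2)
print(greedy_k, action, round(proxy, 3))     # → 0 1 1.37
```

In this toy instance Greedy keeps choosing the well-explored direction $e_1$, whose ellipsoid width is small, and the proxy (about 1.37) exceeds the illustrative threshold 1.2, so the rule switches to the optimistic action $e_2$. If the proxy stays bounded over time, summing the per-step bounds $2\beta\,\mathrm{proxy}_t\|x_t\|_{V^{-1}}$ lets the usual elliptical-potential argument apply to the actions actually played. The threshold and `beta` values here are arbitrary choices for illustration.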
