Representative Action Selection for Large Action Space Bandit Families
Quan Zhou, Mark Kozdoba, Shie Mannor
TL;DR
The paper studies how to learn efficiently across a family of bandits that share a large action space by extracting a small, representative subset of actions. It introduces a simple sampling-based subset selection algorithm that builds this representative set without requiring explicit knowledge of the correlations between actions. It also provides regret bounds stated in terms of $\epsilon$-nets and partitions of the action space. The analysis models rewards with Gaussian processes (and RKHS functions) and develops both geometric and measure-theoretic notions of nets to bound regret. The key sampling-correction term decays exponentially in the sample budget $K$. Empirically, the method outperforms standard baselines such as Thompson Sampling and CUCB and is robust to varying correlation structures. It also scales well because it avoids exhaustive inner optimization over the full action space. The work offers a practical way to reduce exploration and computation in large-action-space bandits while retaining performance across a family of related tasks.
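The following minimal Python sketch illustrates the sampling-based selection idea under stated assumptions. It is not the paper's exact procedure. For illustration only, it assumes three things: actions are points in $[0,1]^2$; each bandit in the family has mean rewards drawn from a zero-mean Gaussian process with an RBF kernel; and the representative set is formed by keeping the best action of each of $K$ sampled bandits (`budget` in the code). This is one natural reading of "sampling-based subset selection." All function names (`sample_family`, `select_representative_actions`, `subset_gap`) and parameters (`lengthscale`, `jitter`) are hypothetical.

```python
import numpy as np

# Assumption (illustration only): the bandit family is modeled as a zero-mean
# GP over action features; the GP is used purely as a simulator of bandits.


def rbf_kernel(X, lengthscale=0.2):
    """Squared-exponential Gram matrix over action features X of shape (N, d)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * lengthscale**2))


def sample_family(X, n_samples, rng, lengthscale=0.2, jitter=1e-6):
    """Draw n_samples bandits; row k holds the mean rewards f_k ~ N(0, gram)."""
    gram = rbf_kernel(X, lengthscale) + jitter * np.eye(len(X))
    L = np.linalg.cholesky(gram)  # lower-triangular, L @ L.T == gram
    z = rng.standard_normal((len(X), n_samples))
    return (L @ z).T  # shape (n_samples, N)


def select_representative_actions(sampled_rewards):
    """Hypothetical rule: keep the optimal action of each sampled bandit.

    Only the sampled reward vectors are used; the kernel is never read here.
    """
    return np.unique(np.argmax(sampled_rewards, axis=1))


def subset_gap(rewards, subset):
    """Average loss of the best action in `subset` versus the best action overall."""
    return float(np.mean(rewards.max(axis=1) - rewards[:, subset].max(axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(500, 2))  # 500 actions embedded in [0, 1]^2
    for budget in (5, 20, 80):
        S = select_representative_actions(sample_family(X, budget, rng))
        held_out = sample_family(X, 200, rng)  # fresh bandits from the same family
        print(budget, len(S), subset_gap(held_out, S))
```

In this sketch, `select_representative_actions` receives only sampled reward vectors and never sees the kernel. That design choice mirrors the claim that a representative set can be built without explicit correlation knowledge: any simulator of the family could replace `sample_family`.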
Abstract
We study the problem of selecting a subset of a large action space shared by a family of bandits, with the goal of achieving performance that nearly matches that of using the full action space. This is often possible because, in many natural situations, the nominal set of actions is large but the rewards of different actions are significantly correlated. In this paper we propose an algorithm that can substantially reduce the action space when such correlations are present, without needing to know the correlation structure a priori. We provide theoretical guarantees on the performance of the algorithm and demonstrate its practical effectiveness through empirical comparisons with Thompson Sampling and Upper Confidence Bound methods.
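One way to make "nearly matching the full action space" precise is sketched below. This is an assumed formalization, not necessarily the paper's exact statement. Let $\mathcal{A}$ be the full action space, $S \subseteq \mathcal{A}$ the selected subset, and $f$ the mean-reward function of a bandit drawn from the family. Let $a_t \in S$ be the action chosen at round $t$ by an algorithm restricted to $S$. Adding and subtracting $T \max_{a \in S} f(a)$ splits the regret against the full action space exactly into a subset gap plus the regret within $S$:

```latex
\mathbb{E}\big[\mathrm{Reg}_{\mathcal{A}}(T)\big]
  = T\,\Delta(S) + \mathbb{E}\big[\mathrm{Reg}_{S}(T)\big],
\qquad
\Delta(S) = \mathbb{E}_f\Big[\max_{a \in \mathcal{A}} f(a) - \max_{a \in S} f(a)\Big],

\text{where}\quad
\mathrm{Reg}_{\mathcal{A}}(T) = T \max_{a \in \mathcal{A}} f(a) - \sum_{t=1}^{T} f(a_t),
\qquad
\mathrm{Reg}_{S}(T) = T \max_{a \in S} f(a) - \sum_{t=1}^{T} f(a_t).
```

The first term is the price of discarding actions. The second is the ordinary regret of learning on $S$, which typically grows with $|S|$. A good subset therefore keeps $\Delta(S)$ small while keeping $|S|$ small.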
