Matroid Semi-Bandits in Sublinear Time
Ruo-Chun Tzeng, Naoto Ohsaka, Kaito Ariu
TL;DR
This work introduces FasterCUCB, the first matroid semi-bandit algorithm whose per-round time is sublinear in the number of arms $K$, addressing a key computational bottleneck in large-scale settings. The core idea is to dynamically maintain an approximate maximum-weight basis under inner-product weights, using two techniques: feature rounding, which limits the number of distinct weights, and a minimum hitting set over line-arrangement cells, which lets a small set of queries cover many. The per-round cost is $O(D\text{ polylog}(K)\text{ polylog}(T))$ for uniform, partition, and graphical matroids and $O(D\sqrt{K}\text{ polylog}(T))$ for transversal matroids, where $D$ is the maximum size of a feasible set and $T$ is the horizon. The regret guarantee is preserved: it asymptotically matches the gap-dependent lower bound of Kveton et al. (2014a). This makes combinatorial bandits with matroid constraints computationally feasible for large action spaces. The results also suggest extending sublinear-time techniques to related bandit settings and exploring alternative weight representations in optimistic learning frameworks.
Abstract
We study the matroid semi-bandits problem, where at each round the learner plays a subset of $K$ arms from a feasible set, and the goal is to maximize the expected cumulative linear reward. Existing algorithms have per-round time complexity $\Omega(K)$, which becomes expensive when $K$ is large. To address this computational issue, we propose FasterCUCB, whose sampling rule takes time sublinear in $K$ for common classes of matroids: $O(D\text{ polylog}(K)\text{ polylog}(T))$ for uniform matroids, partition matroids, and graphical matroids, and $O(D\sqrt{K}\text{ polylog}(T))$ for transversal matroids. Here, $D$ is the maximum number of elements in any feasible subset of arms, and $T$ is the horizon. Our technique is based on dynamic maintenance of an approximate maximum-weight basis over inner-product weights. Although the use of an approximate maximum-weight basis complicates the regret analysis, we can still guarantee a regret upper bound as tight as that of CUCB, in the sense that it asymptotically matches the gap-dependent lower bound of Kveton et al. (2014a).
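For context, the $\Omega(K)$ per-round cost of existing algorithms can be seen in a minimal Python sketch of a CUCB-style sampling rule on a uniform matroid of rank $D$. It is illustrative only and is not FasterCUCB: the function names, the constant `c`, and the UCB1-style bonus $\sqrt{c \ln t / N_i}$ are assumptions rather than details taken from the paper. The sketch also writes each index as an inner product between a per-arm feature and a round-dependent query, which is one plausible reading of "inner-product weights".

```python
import heapq
import math


def ucb_indices(means, counts, t, c=1.5):
    """Return one optimistic index per arm (hypothetical UCB1-style form).

    Each index is the inner product of a per-arm feature
    (mean_i, 1 / sqrt(N_i)) with a query (1, sqrt(c * ln t)) that is
    shared by all arms in round t. Every count must be at least 1.
    """
    query = (1.0, math.sqrt(c * math.log(t)))
    indices = []
    for mean, n in zip(means, counts):
        feature = (mean, 1.0 / math.sqrt(n))
        indices.append(feature[0] * query[0] + feature[1] * query[1])
    return indices


def select_arms(means, counts, t, D):
    """Baseline sampling rule for a uniform matroid of rank D.

    Any D arms are feasible, so the maximum-weight basis is simply the
    D arms with the largest indices. Recomputing every index touches
    all K arms, which is the linear per-round cost discussed above.
    """
    weights = ucb_indices(means, counts, t)
    top = heapq.nlargest(D, range(len(weights)), key=weights.__getitem__)
    return sorted(top)


if __name__ == "__main__":
    # With equal play counts the bonus is identical, so the means decide.
    print(select_arms([0.9, 0.1, 0.5, 0.7], [10, 10, 10, 10], t=100, D=2))  # [0, 3]
```

In this form only the query changes from round to round, while an arm's feature changes only when that arm is played; a dynamic data structure can exploit this instead of rescanning all $K$ arms.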
