Matroid Semi-Bandits in Sublinear Time

Ruo-Chun Tzeng; Naoto Ohsaka; Kaito Ariu

TL;DR

This work introduces FasterCUCB, the first matroid semi-bandit algorithm whose per-round time is sublinear in the number of arms $K$, addressing a key computational bottleneck in large-scale settings. The core idea is a dynamic routine that maintains an approximate maximum-weight base under inner-product weights. It rests on two techniques: feature rounding, which limits the number of distinct weights, and a minimum hitting set over the cells of a line arrangement, which lets a small set of representative queries cover many queries at once. The sampling rule runs in $O(D\,\text{polylog}(K)\,\text{polylog}(T))$ time for uniform, partition, and graphical matroids and in $O(D\sqrt{K}\,\text{polylog}(T))$ time for transversal matroids, while preserving a regret guarantee that asymptotically matches the gap-dependent lower bound of Kveton et al. (2014a). This makes combinatorial bandits with matroid constraints practical for large action spaces. The results suggest extending sublinear-time techniques to related bandit settings and exploring alternative weight representations in optimistic learning frameworks.

Abstract

We study the matroid semi-bandit problem, where at each round the learner plays a subset of $K$ arms from a feasible set, and the goal is to maximize the expected cumulative linear reward. Existing algorithms have per-round time complexity at least $\Omega(K)$, which becomes expensive when $K$ is large. To address this computational issue, we propose FasterCUCB, whose sampling rule takes time sublinear in $K$ for common classes of matroids: $O(D\,\text{polylog}(K)\,\text{polylog}(T))$ for uniform matroids, partition matroids, and graphical matroids, and $O(D\sqrt{K}\,\text{polylog}(T))$ for transversal matroids. Here, $D$ is the maximum number of elements in any feasible subset of arms, and $T$ is the horizon. Our technique is based on dynamic maintenance of an approximate maximum-weight basis over inner-product weights. Although the introduction of an approximate maximum-weight basis presents a challenge in regret analysis, we can still guarantee an upper bound on regret as tight as that of CUCB, in the sense that it asymptotically matches the gap-dependent lower bound by Kveton et al. (2014a).
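To make the $\Omega(K)$ bottleneck concrete, the following is a minimal sketch of one round of a CUCB-style sampling rule: compute an optimistic index for every arm, then run the standard greedy algorithm for a maximum-weight base. It is not the paper's FasterCUCB. The function names, the partition-matroid oracle, and the exploration constant `1.5` are illustrative assumptions. Both the index computation and the sort touch all $K$ arms, which is the per-round cost the abstract refers to.

```python
import math
from typing import Callable, Dict, List, Sequence, Set


def ucb_indices(means: Sequence[float], counts: Sequence[int], t: int) -> List[float]:
    """Optimistic index per arm: empirical mean plus a confidence radius.

    The constant 1.5 is an assumption chosen for illustration.
    """
    return [
        float("inf") if n == 0 else m + math.sqrt(1.5 * math.log(t) / n)
        for m, n in zip(means, counts)
    ]


def greedy_max_weight_base(
    weights: Sequence[float], is_independent: Callable[[Set[int]], bool]
) -> Set[int]:
    """Scan arms by decreasing weight; keep each arm that preserves independence."""
    base: Set[int] = set()
    for k in sorted(range(len(weights)), key=lambda k: -weights[k]):  # touches all K arms
        if is_independent(base | {k}):
            base.add(k)
    return base


def partition_oracle(groups: Sequence[int], capacity: Dict[int, int]) -> Callable[[Set[int]], bool]:
    """Independence oracle of a partition matroid: at most capacity[g] arms from group g."""
    def is_independent(s: Set[int]) -> bool:
        used: Dict[int, int] = {}
        for k in s:
            g = groups[k]
            used[g] = used.get(g, 0) + 1
            if used[g] > capacity[g]:
                return False
        return True
    return is_independent


# Hypothetical instance: 5 arms in 2 groups, so a base has D = 3 arms.
groups = [0, 0, 0, 1, 1]
capacity = {0: 2, 1: 1}
means = [0.9, 0.2, 0.5, 0.4, 0.7]
counts = [10, 10, 10, 10, 10]

weights = ucb_indices(means, counts, t=50)
base = greedy_max_weight_base(weights, partition_oracle(groups, capacity))
```

Because every arm has been pulled equally often here, the confidence radii coincide and the greedy scan follows the empirical means: it takes arms 0 and 2 from the first group and arm 4 from the second.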

Paper Structure (40 sections, 13 theorems, 71 equations, 2 figures, 2 tables, 6 algorithms)


Introduction
Preliminaries
Matroid.
Matroid semi-bandits.
Common classes of matroids.
CUCB.
Related Works
Semi-bandits and sublinear-time bandits.
Dynamic maintenance of maximum-weight base of a matroid
Dynamic Maintenance of Maximum-weight Base over Inner Product Weight
Problem Setting and Technical Result
Rounding Arm Features
Handling Multiple Queries
From weighting to permutation.
Characterizing representable permutations.
...and 25 more sections

Key Result

Theorem 4.4

There exist implementations of Initialize, Find-Base, and Update-Feature such that the following are satisfied: Find-Base always returns a $(1+\epsilon)$-approximate maximum-weight base of a matroid $\mathcal{M}$ with arm $k$'s weight defined as $\langle \boldsymbol{f}_k, \boldsymbol{q} \rangle$ […]
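One way to see why rounding features is compatible with a $(1+\epsilon)$-approximation is the sketch below. It is an illustration under stated assumptions, not the paper's construction: features and queries are taken to be nonnegative and two-dimensional, every feature coordinate is at least a hypothetical lower bound `lo`, and each coordinate is rounded up to a geometric grid `lo * (1 + eps)**i`. The names `round_up_to_grid` and `dominating_point` are invented for this example. Under these assumptions each inner product grows by a factor of at most $1+\epsilon$, while the number of distinct rounded features is bounded by the square of the grid size.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def inner(a: Vec, b: Vec) -> float:
    """Inner product of two 2-dimensional vectors."""
    return a[0] * b[0] + a[1] * b[1]


def round_up_to_grid(x: float, eps: float, lo: float) -> float:
    """Round x >= lo up to the nearest grid value lo * (1 + eps)**i."""
    if x <= lo:
        return lo
    i = math.ceil(math.log(x / lo) / math.log1p(eps))
    return lo * (1.0 + eps) ** i


def dominating_point(f: Vec, eps: float, lo: float) -> Vec:
    """Coordinate-wise round-up, so the result dominates f in both coordinates."""
    return (round_up_to_grid(f[0], eps, lo), round_up_to_grid(f[1], eps, lo))


# Hypothetical features and query; all values are made up for illustration.
eps, lo = 0.1, 0.01
features: List[Vec] = [(0.3, 0.8), (0.05, 0.6), (1.0, 0.12)]
q: Vec = (0.7, 0.4)

rounded = [dominating_point(f, eps, lo) for f in features]
for f, d in zip(features, rounded):
    exact, approx = inner(f, q), inner(d, q)
    # For nonnegative q: exact <= approx <= (1 + eps) * exact, up to float error.
    print(f"exact={exact:.4f} rounded={approx:.4f} ratio={approx / exact:.4f}")
```

A maximum-weight base computed on the rounded weights is then within a factor $1+\epsilon$ of the optimum on the exact weights, since every base's total weight changes by at most that factor.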

Figures (2)

Figure 1: Illustration of feature rounding. There are $|\mathbb{W}|^2$ bins, and features are assumed not to be in (the interior of) the shaded area. Each feature $\boldsymbol{f}_k$ is rounded to its dominating point $\mathrm{dom}(\boldsymbol{f}_k)$, which is specified by a curved arrow.
Figure 2: Illustration of characterization of representable permutations. There are three features $\boldsymbol{f}_1, \boldsymbol{f}_2, \boldsymbol{f}_3$ on $\mathbb{R}^2$. Each dashed line denotes $\overleftrightarrow{\boldsymbol{f}_{i} \boldsymbol{f}_{j}}$ for some $i \neq j$; each black bold line is orthogonal to some dashed line and intersects the origin. Such black bold lines generate six regions, each corresponding to a distinct permutation. For example, for any query $\boldsymbol{q}$ in the hatched area, it holds that $\langle \boldsymbol{f}_1, \boldsymbol{q} \rangle > \langle \boldsymbol{f}_2, \boldsymbol{q} \rangle > \langle \boldsymbol{f}_3, \boldsymbol{q} \rangle$; i.e., $\boldsymbol{q}$ represents a permutation $\pi$ such that $(\pi(1), \pi(2), \pi(3)) = (1,2,3)$.
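The idea behind Figure 2 can be checked numerically with a small sketch. The three feature vectors and the queries below are made up for illustration, and the helper names are hypothetical. A query $\boldsymbol{q}$ orders the arms by $\langle \boldsymbol{f}_k, \boldsymbol{q} \rangle$, and that order depends only on the signs of $\langle \boldsymbol{f}_i - \boldsymbol{f}_j, \boldsymbol{q} \rangle$, that is, on which side of each line through the origin orthogonal to $\boldsymbol{f}_i - \boldsymbol{f}_j$ the query lies. Two queries in the same region therefore induce the same permutation.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def inner(a: Vec, b: Vec) -> float:
    """Inner product of two 2-dimensional vectors."""
    return a[0] * b[0] + a[1] * b[1]


def induced_permutation(features: List[Vec], q: Vec) -> Tuple[int, ...]:
    """1-based arm indices sorted by decreasing inner product with q."""
    order = sorted(range(len(features)), key=lambda k: -inner(features[k], q))
    return tuple(k + 1 for k in order)


def sign_pattern(features: List[Vec], q: Vec) -> Tuple[bool, ...]:
    """Side of q relative to each boundary line {x : <f_i - f_j, x> = 0}, for i < j."""
    return tuple(
        inner((fi[0] - fj[0], fi[1] - fj[1]), q) > 0
        for i, fi in enumerate(features)
        for fj in features[i + 1:]
    )


# Hypothetical features f_1, f_2, f_3 and three queries.
features: List[Vec] = [(3.0, 1.0), (2.0, 2.0), (1.0, 0.5)]
q_a: Vec = (1.0, 0.2)
q_b: Vec = (2.0, 0.5)   # same region as q_a
q_c: Vec = (0.1, 1.0)   # across the boundary between f_1 and f_2

perm_a = induced_permutation(features, q_a)
perm_b = induced_permutation(features, q_b)
perm_c = induced_permutation(features, q_c)
```

Here `q_a` and `q_b` share a sign pattern and both yield the order (1, 2, 3), while `q_c` lies on the other side of the line orthogonal to $\boldsymbol{f}_1 - \boldsymbol{f}_2$ and yields (2, 1, 3). A data structure only needs one representative query per region to answer every query that falls in it.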

Theorems & Definitions (20)

Remark 4.3
Theorem 4.4: $*$
Remark 4.5
Lemma 4.6: $*$
Lemma 4.7: $*$
Lemma 4.8: $*$
Lemma 4.9: $*$
Corollary 4.10: $*$
Theorem 5.1
Lemma 5.2
...and 10 more
