Near-Optimal Regret for Efficient Stochastic Combinatorial Semi-Bandits

Zichun Ye; Runqi Wang; Xutong Liu; Shuai Li

TL;DR

This work tackles stochastic combinatorial multi-armed bandits (CMAB) under semi-bandit and cascading feedback, addressing the tension between minimax-optimal regret and computational efficiency. The authors introduce CMOSS, a MOSS-inspired algorithm that selects feasible actions using optimistic per-arm estimates with a refined confidence radius, thereby removing the $\log T$ factor that burdens CUCB-type methods. Under semi-bandit feedback, CMOSS achieves instance-independent regret bounds of $O\big((\log k)\sqrt{kmT}\big)$ for $k\le \frac{m}{2}$ and $O\big((m-k)\sqrt{\log k\log(m-k)T}\big)$ for $k>\frac{m}{2}$; under cascading feedback, the bound carries an additional multiplicative factor of $1/p^*$. Empirical results on synthetic and real-world data show that CMOSS consistently outperforms baselines in regret while maintaining competitive runtimes, supporting its practical potential for large-scale CMAB tasks. The authors suggest two future directions: tightening the dependence on observation probabilities and integrating newer UCB techniques.
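To make the idea of "optimistic per-arm estimates fed to a feasible-action selector" concrete, here is a minimal Python sketch. It is not the authors' algorithm: the source does not state the refined CMOSS confidence radius, so the sketch substitutes the classical single-arm MOSS index, $\hat\mu_i + \sqrt{\max(\log(T/(m n_i)), 0)/n_i}$. It also assumes that every subset of exactly $k$ arms is feasible, so the oracle is a simple top-$k$ selection, and that rewards are Bernoulli. The names `moss_index`, `top_k_oracle`, and `run_semi_bandit` are hypothetical.

```python
import math
import random


def moss_index(mean, pulls, horizon, num_arms):
    """Classical MOSS index (assumption: stands in for the CMOSS radius)."""
    if pulls == 0:
        return math.inf  # force every arm to be observed at least once
    # The max(..., 0) clipping is what avoids a log T term for well-sampled arms.
    bonus = math.sqrt(max(math.log(horizon / (num_arms * pulls)), 0.0) / pulls)
    return mean + bonus


def top_k_oracle(indices, k):
    """Assumed feasible set: any k arms. Returns the k largest indices."""
    order = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)
    return order[:k]


def run_semi_bandit(true_means, k, horizon, seed=0):
    """Simulate Bernoulli arms with semi-bandit feedback; return pseudo-regret."""
    rng = random.Random(seed)
    m = len(true_means)
    pulls = [0] * m
    means = [0.0] * m
    best = sum(sorted(true_means, reverse=True)[:k])
    regret = 0.0
    for _ in range(horizon):
        indices = [moss_index(means[i], pulls[i], horizon, m) for i in range(m)]
        action = top_k_oracle(indices, k)
        for i in action:
            # Semi-bandit feedback: the reward of every chosen arm is observed.
            reward = 1.0 if rng.random() < true_means[i] else 0.0
            pulls[i] += 1
            means[i] += (reward - means[i]) / pulls[i]  # running average
        regret += best - sum(true_means[i] for i in action)
    return regret, pulls


if __name__ == "__main__":
    total_regret, counts = run_semi_bandit([0.9, 0.8, 0.5, 0.4, 0.3], k=2, horizon=2000)
    print(total_regret, counts)
```

The exploration bonus vanishes once an arm has been observed more than $T/m$ times, which is the mechanism behind MOSS-style bounds without a $\log T$ factor; how CMOSS adapts the radius to the combinatorial setting is specified in the paper, not here.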

Abstract

The combinatorial multi-armed bandit (CMAB) is a cornerstone framework for sequential decision-making, dominated by two algorithmic families: UCB-based methods and adversarial methods such as follow the regularized leader (FTRL) and online mirror descent (OMD). However, prominent UCB-based approaches like CUCB suffer from an additional regret factor of $\log T$ that is detrimental over long horizons, while adversarial methods such as EXP3.M and HYBRID impose significant computational overhead. To resolve this trade-off, we introduce the Combinatorial Minimax Optimal Strategy in the Stochastic setting (CMOSS). CMOSS is a computationally efficient algorithm that achieves an instance-independent regret of $O\big((\log k)\sqrt{kmT}\big)$ when $k\leq \frac{m}{2}$ and $O\big((m-k)\sqrt{\log k\log(m-k)T}\big)$ when $k>\frac{m}{2}$ under semi-bandit feedback, where $m$ is the number of arms and $k$ is the maximum cardinality of a feasible action. Crucially, this result eliminates the dependency on $\log T$ and matches the established lower bounds of $\Omega\big(\sqrt{kmT}\big)$ when $k\leq \frac{m}{2}$ and $\Omega\big((m-k)\sqrt{\log (\frac{m}{m-k}) T}\big)$ when $k>\frac{m}{2}$, up to logarithmic factors in $k$ and $m$. We then extend our analysis to show that CMOSS is also applicable to cascading feedback. Experiments on synthetic and real-world datasets validate that CMOSS consistently outperforms benchmark algorithms in both regret and runtime efficiency.
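As a reading aid for "up to logarithmic factors", dividing each stated upper bound by the corresponding lower bound gives the size of the remaining gap. This is a back-of-the-envelope calculation that ignores the constants hidden in $O$ and $\Omega$; it is not a statement taken from the paper.

$$
k \le \tfrac{m}{2}: \quad \frac{(\log k)\sqrt{kmT}}{\sqrt{kmT}} = \log k,
\qquad
k > \tfrac{m}{2}: \quad \frac{(m-k)\sqrt{\log k \,\log(m-k)\, T}}{(m-k)\sqrt{\log\!\big(\tfrac{m}{m-k}\big)\, T}} = \sqrt{\frac{\log k \,\log(m-k)}{\log\!\big(\tfrac{m}{m-k}\big)}}.
$$

In both regimes the ratio depends only on $k$ and $m$, not on $T$, which is the sense in which the horizon dependence is already tight.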

