Nearly Minimax Optimal Submodular Maximization with Bandit Feedback

Artin Tajdini; Lalit Jain; Kevin Jamieson

TL;DR

We study maximizing an unknown monotone submodular function $f$ under a cardinality constraint with bandit feedback. The paper proves minimax lower bounds on the robust greedy regret $R_{gr}$ and introduces Sub-UCB, an algorithm that interpolates between greedy exploration and full-set UCB. The main results are a lower bound $R_{gr} = \tilde{\Omega}(\min_{0\le L\le k}(L^{1/3}n^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T}))$ for general algorithms and an upper bound $R_{gr} = \tilde{\mathcal{O}}(\min_{0\le L\le k}(L n^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T}))$ for Sub-UCB. The upper bound matches, up to logarithmic factors, the stronger lower bound that the paper proves for a slightly restricted algorithm class; against the general lower bound, a gap between $L^{1/3}$ and $L$ remains in the first term. This work gives the first minimax lower bound for submodular maximization with bandit feedback and indicates how to balance partial greedy growth with full-set exploration.
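To make the trade-off in the bounds concrete, the following minimal sketch evaluates the two terms $L n^{1/3}T^{2/3}$ and $\sqrt{{n \choose k - L}T}$ for each $L \in \{0,\dots,k\}$ and returns the minimizing $L$. It assumes all constants and logarithmic factors are dropped, so the numbers only illustrate the shape of the bound. The function names `tradeoff_terms` and `best_stop_cardinality` are hypothetical and do not come from the paper.

```python
from math import comb, sqrt


def tradeoff_terms(n, k, T, L):
    """Return the two terms of L * n^(1/3) * T^(2/3) + sqrt(C(n, k - L) * T).

    Constants and log factors are dropped (an assumption of this sketch).
    """
    greedy_term = L * n ** (1 / 3) * T ** (2 / 3)
    ucb_term = sqrt(comb(n, k - L) * T)
    return greedy_term, ucb_term


def best_stop_cardinality(n, k, T):
    """Return the L in {0, ..., k} that minimizes the sum of the two terms."""
    return min(range(k + 1), key=lambda L: sum(tradeoff_terms(n, k, T, L)))


if __name__ == "__main__":
    for T in (10**3, 10**6, 10**12):
        L = best_stop_cardinality(20, 5, T)
        print(T, L, tradeoff_terms(20, 5, T, L))
```

With these simplifications, $L = k$ corresponds to a purely greedy bound and $L = 0$ to treating every size-$k$ set as an arm. For $n = 20$, $k = 5$, the minimizer is $L = 4$ at $T = 10^3$ and $L = 0$ at $T = 10^{12}$, reflecting that the $T^{2/3}$ term eventually dominates the $T^{1/2}$ term.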

Abstract

We consider maximizing an unknown monotonic, submodular set function $f: 2^{[n]} \rightarrow [0,1]$ with cardinality constraint under stochastic bandit feedback. At each time $t=1,\dots,T$ the learner chooses a set $S_t \subset [n]$ with $|S_t| \leq k$ and receives reward $f(S_t) + \eta_t$ where $\eta_t$ is mean-zero sub-Gaussian noise. The objective is to minimize the learner's regret with respect to an approximation of the maximum $f(S_*)$ with $|S_*| = k$, obtained through robust greedy maximization of $f$. To date, the best regret bound in the literature scales as $k n^{1/3} T^{2/3}$. By trivially treating every set as a distinct arm, one deduces that $\sqrt{ {n \choose k} T }$ is also achievable using standard multi-armed bandit algorithms. In this work, we establish the first minimax lower bound for this setting that scales like $\tilde{\Omega}(\min_{L \le k}(L^{1/3}n^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T}))$. For a slightly restricted algorithm class, we prove a stronger regret lower bound of $\tilde{\Omega}(\min_{L \le k}(Ln^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T}))$. Moreover, we propose an algorithm, Sub-UCB, that achieves regret $\tilde{\mathcal{O}}(\min_{L \le k}(Ln^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T}))$, matching the lower bound on regret for the restricted class up to logarithmic factors.
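The feedback model in the abstract can be sketched in a few lines. The code below is an illustration under stated assumptions, not the paper's Sub-UCB: it builds a hypothetical toy coverage objective with values in $[0,1]$, returns $f(S_t) + \eta_t$ with Gaussian noise (one instance of sub-Gaussian noise), and runs a simple explore-then-commit greedy baseline that samples every candidate extension `m` times per step. The names `make_coverage`, `pull`, and `noisy_greedy`, and the parameters `m` and `sigma`, are all assumptions made for this example.

```python
import random


def make_coverage(universe_size, n, seed=0):
    """Hypothetical toy objective: fraction of a universe covered by S."""
    rng = random.Random(seed)
    covers = [frozenset(rng.sample(range(universe_size), 3)) for _ in range(n)]

    def f(S):
        covered = set()
        for i in S:
            covered |= covers[i]
        return len(covered) / universe_size

    return f


def pull(f, S, rng, sigma=0.1):
    """Bandit feedback: f(S) plus mean-zero Gaussian noise."""
    return f(S) + rng.gauss(0.0, sigma)


def noisy_greedy(f, n, k, m, rng, sigma=0.1):
    """Grow a set greedily, estimating each candidate from m noisy pulls.

    Returns the chosen set (as a list) and the number of pulls used.
    """
    S, pulls = [], 0
    for _ in range(k):
        best, best_mean = None, float("-inf")
        for a in range(n):
            if a in S:
                continue
            mean = sum(pull(f, S + [a], rng, sigma) for _ in range(m)) / m
            pulls += m
            if mean > best_mean:
                best, best_mean = a, mean
        S.append(best)
    return S, pulls


if __name__ == "__main__":
    f = make_coverage(universe_size=30, n=15, seed=1)
    S, pulls = noisy_greedy(f, n=15, k=4, m=50, rng=random.Random(0))
    print(S, pulls, f(S))
```

A baseline of this kind spends exploration on every greedy step, which is the regime the $k n^{1/3} T^{2/3}$ rate describes; the $\sqrt{{n \choose k} T}$ alternative instead treats each size-$k$ set as its own arm.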

Paper Structure (14 sections, 10 theorems, 54 equations, 2 figures, 1 table, 1 algorithm)


INTRODUCTION
Problem Statement
Related Work
LOWER BOUND
Proof Sketch
UCB UPPER BOUND
Proof Sketch
EXPERIMENTS
CONCLUSION
Lower bound proofs
Proof of Theorem \ref{thm:main}
Proof of Theorem \ref{thm:naet-lower}
Proof of Theorem \ref{thm:regret}
Auxiliary Lemmas

Key Result

Lemma 1.1

(Theorem 6 in streeter_online_2007) For any $\boldsymbol{\epsilon} \ge \mathbf{0} \in \mathbb{R}^k$, and $S^{k, \boldsymbol{\epsilon}}_\textbf{gr}\in \mathcal{S}^{k,\boldsymbol{\epsilon}}$, we have

Figures (2)

Figure 1: Regret comparison for weighted set cover with $n=15$ and $k = 4$
Figure 2: Comparison between all Sub-UCB greedy stop cardinality choices for the unique greedy path function with $n = 20$ and $k = 5$. The worst-case optimal stop cardinality $l = k - i^*$ is highlighted
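
Figure 1 refers to a weighted set cover objective. The exact instance used in the experiments is not given here, so the following is a hypothetical small example of such a function together with a brute-force check that it is monotone and submodular (diminishing returns). The helper names `weighted_cover` and `is_monotone_submodular` are assumptions for illustration; the check enumerates all subset pairs and is only practical for very small $n$.

```python
from itertools import combinations


def weighted_cover(sets, weights):
    """f(S) = total weight of universe items covered by the chosen sets."""
    def f(S):
        covered = set().union(*(sets[i] for i in S))
        return sum(weights[u] for u in covered)

    return f


def is_monotone_submodular(f, n, tol=1e-12):
    """Brute-force check over all A subset of B and all x not in B.

    Requires every marginal gain to be nonnegative (monotone) and the gain
    at A to be at least the gain at B (diminishing returns).
    """
    ground = range(n)
    subsets = [frozenset(c) for r in range(n + 1) for c in combinations(ground, r)]
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            for x in ground:
                if x in B:
                    continue
                gain_A = f(A | {x}) - f(A)
                gain_B = f(B | {x}) - f(B)
                if gain_B < -tol or gain_A < gain_B - tol:
                    return False
    return True


if __name__ == "__main__":
    sets = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]
    weights = [0.4, 0.3, 0.2, 0.1]  # weights sum to 1, so f maps into [0, 1]
    f = weighted_cover(sets, weights)
    print(is_monotone_submodular(f, len(sets)))
```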

Theorems & Definitions (17)

Lemma 1.1
Theorem 2.1
Lemma 2.2
proof
Theorem 2.3
Theorem 3.1
Lemma A.1
proof
Lemma A.2
proof
...and 7 more
