Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit

Julien Zhou, Pierre Gaillard, Thibaud Rahier, Julyan Arbel

TL;DR

The paper tackles online unconstrained submodular maximization under stochastic bandit feedback. It introduces DG-ETC, a Double-Greedy-based Explore-Then-Commit strategy that adaptively allocates exploration per item to counteract noise, achieving a logarithmic, hardness-dependent bound on the $1/2$-approximate pseudo-regret and a complementary worst-case bound of $O(dT^{2/3}\log(dT)^{1/3})$. A new problem-dependent hardness measure, $H_f$, governs the regret behavior and determines how the bounds combine logarithmic and $T^{2/3}$ terms. The results hold with high probability and in expectation, showing that exploiting the looseness of the offline $1/2$-approximation ratio in non-adversarial settings can yield strong online guarantees. This work advances online submodular optimization under bandit feedback and introduces techniques potentially applicable to other approximate maximization problems.
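
For orientation, the $1/2$-approximate pseudo-regret mentioned above is usually written as follows. This is a sketch in assumed notation, not taken from this page: $f$ is the unknown submodular function over $d$ items, $S_t$ is the set played at round $t$, and $S^\star$ is a maximizer of $f$. The paper's own definition may differ in details.

$$
R_T \;=\; \sum_{t=1}^{T} \Big( \tfrac{1}{2}\, f(S^\star) - f(S_t) \Big), \qquad S^\star \in \arg\max_{S \subseteq [d]} f(S).
$$

Under this convention, a learner that matches the offline $1/2$-approximation guarantee at every round has non-positive regret, which is why slack in that ratio can be traded for exploration.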

Abstract

We address the online unconstrained submodular maximization problem (Online USM), in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a non-monotone submodular function taking values in a known bounded interval. This paper proposes Double-Greedy - Explore-then-Commit (DG-ETC), adapting the Double-Greedy approach from the offline and online full-information settings. DG-ETC satisfies a $O(d\log(dT))$ problem-dependent upper bound for the $1/2$-approximate pseudo-regret, as well as a $O(dT^{2/3}\log(dT)^{1/3})$ problem-free one at the same time, outperforming existing approaches. In particular, we introduce a problem-dependent notion of hardness characterizing the transition between the logarithmic and polynomial regimes for the upper bounds.
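
As background for the Double-Greedy approach that DG-ETC adapts, here is a minimal sketch of the offline randomized Double-Greedy procedure with exact (noise-free) access to $f$. It is not the paper's DG-ETC algorithm: there is no noise, no exploration phase, and no commit step. The names `double_greedy` and `cut_value` are hypothetical and introduced only for illustration.

```python
import random


def double_greedy(f, ground_set, rng=None):
    """Randomized Double-Greedy sketch with an exact value oracle f(set) -> float.

    Assumption: f is submodular; ground_set is an ordered list of items.
    """
    rng = rng or random.Random(0)
    X = set()              # lower set, grows from the empty set
    Y = set(ground_set)    # upper set, shrinks from the full ground set
    for u in ground_set:
        a = f(X | {u}) - f(X)   # marginal gain of adding u to X
        b = f(Y - {u}) - f(Y)   # marginal gain of removing u from Y
        a_pos, b_pos = max(a, 0.0), max(b, 0.0)
        # Keep u with probability a+ / (a+ + b+); keep it if both are zero.
        p_add = 1.0 if a_pos + b_pos == 0 else a_pos / (a_pos + b_pos)
        if rng.random() < p_add:
            X.add(u)
        else:
            Y.discard(u)
    return X  # X == Y once every item has been processed


def cut_value(edges):
    """Graph cut value: a standard non-monotone submodular function."""
    def f(S):
        return float(sum(1 for u, v in edges if (u in S) != (v in S)))
    return f


if __name__ == "__main__":
    f = cut_value([(0, 1), (1, 2), (2, 3)])
    S = double_greedy(f, [0, 1, 2, 3], random.Random(1))
    print(sorted(S), f(S))
```

In the bandit setting the marginal gains `a` and `b` are only observed through noisy rewards, so they have to be estimated from repeated samples. Per the abstract, DG-ETC addresses this with an explore-then-commit scheme whose regret depends on a problem-dependent hardness measure.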

Paper Structure

This paper contains 41 sections, 11 theorems, 46 equations, 4 figures, 1 table, 4 algorithms.

Key Result

Theorem 1

Let $\mathcal{D}$ be a finite set. Algorithm DG returns a set $S$ such that

Figures (4)

  • Figure 1: Example of sampling from DG-Sp, for $i=d+1$ and $(K_{j})_{j\in[d]}=(1, 0, 1, 0, \dots, 1)$.
  • Figure 2: $h_{f,i}$ as a function of $\alpha_f(i,X)$ and $\beta_f(i,X)$ for $c=1$.
  • Figure 3: Exploration thresholds for Subroutine UpdExp as a function of $\bar{\alpha}_i$ and $\bar{\beta}_i$ for $c=1$.
  • Figure (unnumbered): Double-Greedy (DG, from Buchbinder2012)

Theorems & Definitions (23)

  • Definition 0: Submodularity
  • Theorem 1: Buchbinder2012, Theorem I.2.
  • Proposition 0
  • Remark 1
  • Remark 2
  • Definition 1: DG-hardness
  • Remark 3
  • Example
  • Theorem 2
  • Remark 4
  • ...and 13 more