Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit

Julien Zhou; Pierre Gaillard; Thibaud Rahier; Julyan Arbel

TL;DR

The paper tackles online unconstrained submodular maximization under stochastic bandit feedback. It introduces DG-ETC, a Double-Greedy-based Explore-Then-Commit strategy that adaptively allocates exploration per item to counteract noise. DG-ETC achieves a logarithmic, hardness-dependent bound on the $1/2$-approximate pseudo-regret, together with a worst-case bound of $O(dT^{2/3}\log(dT)^{1/3})$. A new problem-dependent hardness measure, $H_f$, governs the regret behavior and determines how the bounds mix logarithmic and $T^{2/3}$ terms. The results hold with high probability and in expectation, showing that exploiting the looseness of the offline $1/2$-approximation ratio in non-adversarial settings can yield strong online guarantees. This work advances online submodular optimization under bandit feedback and introduces techniques potentially applicable to other approximate maximization problems.
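For orientation, the $1/2$-approximate pseudo-regret is usually written in the following form. The notation is an assumption made here for illustration and is not taken from the paper: $S_t$ denotes the set played at round $t$, $[d]$ the ground set of $d$ items, and $f$ the expected reward function.

$$
R_T \;=\; \sum_{t=1}^{T} \left( \frac{1}{2} \max_{S \subseteq [d]} f(S) \;-\; f(S_t) \right)
$$

Under this convention, a learner is compared against half of the optimal value rather than the optimum itself, which matches the $1/2$ guarantee of the offline Double-Greedy approach.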

Abstract

We address the online unconstrained submodular maximization problem (Online USM) in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a non-monotone submodular function taking values in a known bounded interval. This paper proposes Double-Greedy - Explore-then-Commit (DG-ETC), adapting the Double-Greedy approach from the offline and online full-information settings. DG-ETC simultaneously satisfies an $O(d\log(dT))$ problem-dependent upper bound on the $1/2$-approximate pseudo-regret and an $O(dT^{2/3}\log(dT)^{1/3})$ problem-free one, outperforming existing approaches. In particular, we introduce a problem-dependent notion of hardness characterizing the transition between the logarithmic and polynomial regimes of the upper bounds.
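As background for the Double-Greedy approach mentioned above, here is a minimal sketch of the classical randomized Double-Greedy procedure for the offline, noise-free problem. It is not DG-ETC: it assumes exact access to the set function, whereas the paper's setting only provides noisy bandit feedback. The names `double_greedy`, `f`, `ground_set`, and `seed` are hypothetical and chosen for illustration.

```python
import random
from typing import Callable, FrozenSet, Hashable, Iterable


def double_greedy(
    f: Callable[[FrozenSet[Hashable]], float],
    ground_set: Iterable[Hashable],
    seed: int = 0,
) -> FrozenSet[Hashable]:
    """Randomized Double-Greedy for offline unconstrained submodular maximization.

    Assumes f can be evaluated exactly (no noise), unlike the bandit setting.
    """
    rng = random.Random(seed)
    items = list(ground_set)
    lower = frozenset()        # grows from the empty set
    upper = frozenset(items)   # shrinks from the full ground set
    for i in items:
        # Marginal gain of adding i to the lower set.
        gain_add = f(lower | {i}) - f(lower)
        # Marginal gain of removing i from the upper set.
        gain_remove = f(upper - {i}) - f(upper)
        a = max(gain_add, 0.0)
        b = max(gain_remove, 0.0)
        # Keep i with probability a / (a + b); keep it when both gains are zero.
        keep = True if a + b == 0 else rng.random() < a / (a + b)
        if keep:
            lower = lower | {i}
        else:
            upper = upper - {i}
    # After one pass over all items, lower and upper coincide.
    return lower


# Usage on a modular (hence submodular) function, where the choice is forced:
weights = {0: 2.0, 1: -1.0, 2: 3.0}
selected = double_greedy(lambda s: sum(weights[i] for i in s), weights.keys())
```

On a modular function such as the one above, every decision is deterministic, so the procedure keeps exactly the items with non-negative weight. With noisy evaluations the two marginal gains would have to be estimated from samples, which is the difficulty the per-item exploration in DG-ETC is described as addressing.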
