Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit
Julien Zhou, Pierre Gaillard, Thibaud Rahier, Julyan Arbel
TL;DR
The paper tackles online unconstrained submodular maximization under stochastic bandit feedback. It introduces DG-ETC, a Double-Greedy-based Explore-Then-Commit strategy that adaptively allocates exploration per item to counteract noise. DG-ETC achieves a logarithmic, hardness-dependent bound on the $1/2$-approximate pseudo-regret and a complementary worst-case bound of $O(dT^{2/3}\log(dT)^{1/3})$. A new problem-dependent hardness measure, $H_f$, governs the regret behavior and determines how the bounds mix logarithmic and $T^{2/3}$ terms. The results hold with high probability and in expectation, showing that exploiting the looseness of the offline $1/2$-approximation ratio in non-adversarial settings can yield strong online guarantees. This work advances online submodular optimization under bandit feedback and introduces techniques potentially applicable to other approximate maximization problems.
Abstract
We address the online unconstrained submodular maximization problem (Online USM) in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a non-monotone submodular function taking values in a known bounded interval. This paper proposes Double-Greedy - Explore-then-Commit (DG-ETC), adapting the Double-Greedy approach from the offline and online full-information settings. DG-ETC simultaneously satisfies an $O(d\log(dT))$ problem-dependent upper bound on the $1/2$-approximate pseudo-regret and an $O(dT^{2/3}\log(dT)^{1/3})$ problem-free one, outperforming existing approaches. In particular, we introduce a problem-dependent notion of hardness characterizing the transition between the logarithmic and polynomial regimes of the upper bounds.
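For readers unfamiliar with the offline building block, the following is a minimal Python sketch of the classical randomized Double-Greedy procedure with an exact, noise-free value oracle. It is not DG-ETC: the bandit algorithm described above only observes noisy rewards and must estimate the marginal gains through exploration, which this sketch does not attempt. The names `double_greedy` and `cut_value`, and the toy path graph, are illustrative assumptions and do not come from the paper.

```python
import random
from typing import Callable, FrozenSet, Optional


def double_greedy(
    f: Callable[[FrozenSet[int]], float],
    d: int,
    rng: Optional[random.Random] = None,
) -> FrozenSet[int]:
    """Offline randomized Double-Greedy over the ground set {0, ..., d-1}.

    Assumes f is submodular and can be evaluated exactly (no noise).
    """
    rng = rng or random.Random(0)
    lower = set()            # grows from the empty set
    upper = set(range(d))    # shrinks from the full ground set
    for i in range(d):
        # Marginal gain of adding item i to the lower set.
        gain_add = f(frozenset(lower | {i})) - f(frozenset(lower))
        # Marginal gain of removing item i from the upper set.
        gain_remove = f(frozenset(upper - {i})) - f(frozenset(upper))
        a = max(gain_add, 0.0)
        b = max(gain_remove, 0.0)
        # Keep item i with probability a / (a + b); keep it if both are zero.
        p_keep = 1.0 if a + b == 0 else a / (a + b)
        if rng.random() < p_keep:
            lower.add(i)
        else:
            upper.discard(i)
    # Every item has been decided, so lower and upper coincide here.
    return frozenset(lower)


def cut_value(s: FrozenSet[int]) -> float:
    """Hypothetical objective: cut function of the path graph 0 - 1 - 2.

    Cut functions are a standard example of non-monotone submodular functions.
    """
    edges = [(0, 1), (1, 2)]
    return float(sum((u in s) != (v in s) for u, v in edges))


if __name__ == "__main__":
    chosen = double_greedy(cut_value, d=3, rng=random.Random(42))
    print(sorted(chosen), cut_value(chosen))
```

The sketch keeps two sets, one growing from the empty set and one shrinking from the full ground set, and decides each item once by comparing the two clipped marginal gains. In the bandit setting these gains are not directly available, which is why the per-item exploration budget matters.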
