Stochastic $k$-Submodular Bandits with Full Bandit Feedback

Guanyu Nie, Vaneet Aggarwal, Christopher John Quinn

TL;DR

The paper tackles the online stochastic combinatorial multi-armed bandit (CMAB) problem with full-bandit feedback where the expected rewards are $k$-submodular, and gives the first sublinear $\alpha$-regret bounds in this setting. It builds on the offline-to-online framework of Nie et al. (2023a): the authors analyze the robustness of offline $k$-submodular maximization algorithms and convert them into online CMAB algorithms via a Combinatorial Explore-Then-Commit (C-ETC) approach. They establish sublinear $\alpha$-regret for several scenarios, covering non-monotone and monotone objectives with and without constraints (individual size, matroid, and total size), all under full-bandit feedback. The methods are evaluated experimentally on online influence maximization with $k=3$ topics.

Abstract

In this paper, we present the first sublinear $\alpha$-regret bounds for online $k$-submodular optimization problems with full-bandit feedback, where $\alpha$ is a corresponding offline approximation ratio. Specifically, we propose online algorithms for multiple $k$-submodular stochastic combinatorial multi-armed bandit problems, including (i) monotone functions and individual size constraints, (ii) monotone functions with matroid constraints, (iii) non-monotone functions with matroid constraints, (iv) non-monotone functions without constraints, and (v) monotone functions without constraints. We transform approximation algorithms for offline $k$-submodular maximization problems into online algorithms through the offline-to-online framework proposed by Nie et al. (2023a). A key contribution of our work is analyzing the robustness of the offline algorithms.
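The offline-to-online idea can be illustrated with a minimal explore-then-commit sketch. It assumes the usual reading of C-ETC: every value query made by the offline algorithm is answered by playing that action `m` times and averaging the observed rewards, and the action the offline algorithm returns is then played for the remaining rounds. Everything named here (`c_etc`, `greedy_offline`, `play`, `WEIGHTS`, the noise model, and the values of `m` and `horizon`) is hypothetical and chosen for illustration; the greedy routine is a toy stand-in, not one of the paper's algorithms, and the paper's choice of `m` is not reproduced.

```python
import random
from typing import Callable, List, Tuple

# action[e] == 0 means item e is unused; a value i in 1..k places item e in part i.
Action = Tuple[int, ...]


def greedy_offline(oracle: Callable[[Action], float], n: int, k: int) -> Action:
    """Toy offline routine: give each item the part with the largest queried value."""
    x = [0] * n
    for e in range(n):
        values = []
        for i in range(1, k + 1):
            x[e] = i
            values.append(oracle(tuple(x)))
        x[e] = 1 + max(range(k), key=values.__getitem__)
    return tuple(x)


def c_etc(offline_alg, play: Callable[[Action], float], horizon: int, m: int):
    """Explore-then-commit wrapper around an offline algorithm (illustrative sketch)."""
    rewards: List[float] = []

    def averaged_oracle(action: Action) -> float:
        # Exploration: each offline query costs m rounds of full-bandit feedback.
        samples = [play(action) for _ in range(m)]
        rewards.extend(samples)
        return sum(samples) / m

    committed = offline_alg(averaged_oracle)
    # Exploitation: commit to the returned action for the remaining rounds.
    while len(rewards) < horizon:
        rewards.append(play(committed))
    return committed, rewards[:horizon]


# Hypothetical toy environment: 3 items, k = 3 parts, additive mean reward.
WEIGHTS = [[0.2, 0.8, 0.5], [0.9, 0.1, 0.4], [0.3, 0.2, 0.7]]
rng = random.Random(0)


def play(action: Action) -> float:
    # Only the noisy total reward of the joint action is observed.
    mean = sum(WEIGHTS[e][i - 1] for e, i in enumerate(action) if i > 0)
    return mean + rng.uniform(-0.1, 0.1)


committed, rewards = c_etc(
    lambda oracle: greedy_offline(oracle, n=3, k=3), play, horizon=1000, m=30
)
```

The wrapper never inspects per-item rewards, which is what distinguishes full-bandit feedback from semi-bandit feedback. Because the offline routine only sees averaged, noisy values, its guarantee has to survive bounded oracle error, which is why the robustness analysis matters.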

Paper Structure

This paper contains 41 sections, 19 theorems, 76 equations, 1 figure, 1 table, 5 algorithms.

Key Result

Theorem 2.1

(Ward and Živný, 2014) A function $f : (k + 1)^V \rightarrow \mathbb{R}$ is $k$-submodular if and only if $f$ satisfies the following two conditions: Orthant submodularity: $\Delta_{e, i} f(\boldsymbol{x}) \geq \Delta_{e, i} f(\boldsymbol{y})$ for any $\boldsymbol{x}, \boldsymbol{y} \in (k+1)^V$ with $
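As a rough illustration of the orthant submodularity condition, the sketch below brute-forces it on tiny instances. The excerpt above is cut off before the side conditions are stated, so the code assumes the commonly used ones: $\boldsymbol{x} \preceq \boldsymbol{y}$ (every nonzero coordinate of $\boldsymbol{x}$ agrees with $\boldsymbol{y}$), $e$ unassigned in $\boldsymbol{y}$, and $i \in \{1, \dots, k\}$. That reading is an assumption, as are all names (`marginal`, `preceq`, `is_orthant_submodular`, `COVER`). The second condition of the theorem is not checked because it does not appear in the excerpt.

```python
from itertools import product


def marginal(f, x, e, i):
    """Delta_{e,i} f(x): gain from assigning the unused item e to part i."""
    y = list(x)
    y[e] = i
    return f(tuple(y)) - f(x)


def preceq(x, y):
    """Assumed partial order: every nonzero coordinate of x agrees with y."""
    return all(xe == 0 or xe == ye for xe, ye in zip(x, y))


def is_orthant_submodular(f, n, k, tol=1e-12):
    """Brute-force check over all pairs; only feasible for very small n and k."""
    points = list(product(range(k + 1), repeat=n))
    for x in points:
        for y in points:
            if not preceq(x, y):
                continue
            for e in range(n):
                if y[e] != 0:  # e must be unassigned in y (hence also in x)
                    continue
                for i in range(1, k + 1):
                    if marginal(f, x, e, i) < marginal(f, y, e, i) - tol:
                        return False
    return True


# Hypothetical coverage instance: item e placed in part i covers COVER[e][i - 1].
COVER = [[{0, 1}, {1, 2}], [{1}, {2, 3}]]


def coverage(x):
    covered = set().union(*(COVER[e][i - 1] for e, i in enumerate(x) if i > 0))
    return len(covered)


def squared_support(x):
    # Marginal gains grow with the support size, so this should fail the check.
    return sum(1 for v in x if v != 0) ** 2
```

Coverage-style objectives pass the check because a new item can only cover less once more items are already placed; `squared_support` is included as a contrasting function with increasing marginal gains.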

Figures (1)

  • Figure 1: Instantaneous Rewards on Influence Maximization experiments.

Theorems & Definitions (33)

  • Remark 1.1
  • Theorem 2.1
  • Definition 2.2: $(\alpha, \delta, N)$-Robust Approximation (Nie et al., 2023a)
  • Theorem 2.3
  • Remark 2.4
  • Remark 2.5: Offline vs. Online Algorithms
  • Proposition 3.1
  • Lemma 3.2
  • Proposition 3.3
  • Proof: Part of the proof of Proposition \ref{thm:nmuc:robust}
  • ...and 23 more