Stochastic $k$-Submodular Bandits with Full Bandit Feedback

Guanyu Nie; Vaneet Aggarwal; Christopher John Quinn

TL;DR

The paper studies online stochastic combinatorial multi-armed bandits (CMAB) with full-bandit feedback in which the expected reward is $k$-submodular, and gives the first sublinear $α$-regret bounds for this setting. It builds on the offline-to-online framework of Nie et al. (2023a): the authors analyze the robustness of offline $k$-submodular maximization algorithms and convert them into online CMAB algorithms via a Combinatorial Explore-Then-Commit (C-ETC) approach. Sublinear $α$-regret is established for several scenarios, covering non-monotone and monotone objectives with and without constraints (individual size, matroid, and total size), all under full-bandit feedback. The methods are evaluated experimentally on online influence maximization with $k=3$ topics, illustrating the practical value of robust offline-to-online adaptation in constrained decision problems.
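For orientation, $α$-regret is commonly defined in this line of work as below. The notation is an assumption introduced here, not taken from this page: $f$ is the expected reward function, $S_t$ the action played in round $t$, $S^*$ an optimal feasible action, and $T$ the horizon.

$$
\mathbb{E}[R_\alpha(T)] \;=\; \alpha \, T \, f(S^*) \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} f(S_t)\right]
$$

Under this definition, "sublinear $α$-regret" means $\mathbb{E}[R_\alpha(T)] = o(T)$: the learner's average reward approaches at least an $α$ fraction of the optimum as $T$ grows.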

Abstract

In this paper, we present the first sublinear $α$-regret bounds for online $k$-submodular optimization problems with full-bandit feedback, where $α$ is a corresponding offline approximation ratio. Specifically, we propose online algorithms for multiple $k$-submodular stochastic combinatorial multi-armed bandit problems, including (i) monotone functions and individual size constraints, (ii) monotone functions with matroid constraints, (iii) non-monotone functions with matroid constraints, (iv) non-monotone functions without constraints, and (v) monotone functions without constraints. We transform approximation algorithms for offline $k$-submodular maximization problems into online algorithms through the offline-to-online framework proposed by Nie et al. (2023a). A key contribution of our work is analyzing the robustness of the offline algorithms.
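The explore-then-commit idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the greedy rule for a total size budget, the toy coverage reward, and all names (`make_coverage_reward`, `make_noisy_play`, `cetc_greedy`, the repetition count `m`) are hypothetical. Each value query that an offline greedy algorithm would make is replaced by the empirical mean of `m` noisy plays, and the resulting solution is then played for the remaining rounds.

```python
import random


def make_coverage_reward(covers, n_users, k):
    """Toy expected reward (assumption): fraction of (type, user) pairs reached."""
    def f(assignment):
        # assignment maps each chosen item to one type in {1, ..., k}
        reached = {(t, u) for e, t in assignment.items() for u in covers[e][t]}
        return len(reached) / (k * n_users)
    return f


def make_noisy_play(f, noise_std, seed=0):
    """Full-bandit feedback: only a noisy scalar reward of the whole action is seen."""
    rng = random.Random(seed)

    def play(assignment):
        return f(assignment) + rng.gauss(0.0, noise_std)
    return play


def cetc_greedy(play, items, k, budget, m, horizon):
    """Explore with a greedy rule on empirical means, then commit."""
    chosen = {}
    plays_used = 0
    total_reward = 0.0
    for _ in range(budget):
        best, best_est = None, float("-inf")
        for e in items:
            if e in chosen:
                continue
            for t in range(1, k + 1):
                candidate = dict(chosen)
                candidate[e] = t
                # Estimate the candidate's value from m repeated plays.
                rewards = [play(candidate) for _ in range(m)]
                plays_used += m
                total_reward += sum(rewards)
                est = sum(rewards) / m
                # All candidates extend the same base, so comparing estimated
                # values is the same as comparing estimated marginal gains.
                if est > best_est:
                    best, best_est = (e, t), est
        chosen[best[0]] = best[1]
    # Commit phase: play the selected action for any remaining rounds.
    for _ in range(horizon - plays_used):
        total_reward += play(chosen)
    return chosen, plays_used, total_reward


# Hypothetical instance: 4 items, k = 2 types, 6 users.
covers = {
    "a": {1: {0, 1, 2}, 2: {0}},
    "b": {1: {2}, 2: {3, 4}},
    "c": {1: {1}, 2: {5}},
    "d": {1: set(), 2: {4}},
}
f = make_coverage_reward(covers, n_users=6, k=2)
play = make_noisy_play(f, noise_std=0.05)
solution, used, reward = cetc_greedy(play, list(covers), k=2, budget=2, m=5, horizon=200)
print(solution, used)
```

In this sketch `m` controls the trade-off: more repetitions give more accurate estimates but spend more of the horizon on exploration. A robustness analysis of the offline algorithm, as the abstract describes, is what would bound how much estimation error the final solution can tolerate.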
