Stochastic Submodular Bandits with Delayed Composite Anonymous Bandit Feedback

Mohammad Pedramfar, Vaneet Aggarwal

TL;DR

The study addresses stochastic combinatorial multi-armed bandits (CMAB) with submodular, monotone rewards under delayed composite anonymous feedback, introducing three delay models and an ETCG-based strategy. Using upper-tail bounds on the delay distributions and Bernstein-type concentration with a careful event decomposition, it derives regret bounds in which delay enters only as an additive term, giving a unified bound of $\tilde{O}(T^{2/3} + T^{1/3} \nu)$ across the three models. The results show, both theoretically and empirically, that ETCG is robust to delayed feedback, and they extend to broader CMAB classes via robustness-based reductions. The work advances the theory of learning with delayed, aggregated feedback in combinatorial settings and is relevant to applications such as influence maximization and recommendation, where delayed signals are intrinsic.

Abstract

This paper investigates the problem of combinatorial multiarmed bandits with stochastic submodular (in expectation) rewards and full-bandit delayed feedback, where the delayed feedback is assumed to be composite and anonymous. In other words, the delayed feedback is composed of components of rewards from past actions, with unknown division among the sub-components. Three models of delayed feedback are studied: bounded adversarial, stochastic independent, and stochastic conditionally independent. Regret bounds are derived for each of the delay models. Ignoring the problem-dependent parameters, we show that the regret bound for all the delay models is $\tilde{O}(T^{2/3} + T^{1/3} \nu)$ for time horizon $T$, where $\nu$ is a delay parameter defined differently in the three cases, thus demonstrating an additive term in regret with delay in all three delay models. The considered algorithm is demonstrated to outperform other full-bandit approaches with delayed composite anonymous feedback.
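To make "composite and anonymous" concrete, here is a minimal simulation sketch. It is not the paper's model or algorithm: the function name, the fixed `delay_weights` split, and the toy reward sequence are all assumptions chosen for illustration (in the paper the division among components is unknown to the learner and may vary by round). Each round's reward is split into components that arrive over later rounds, and the learner sees only the per-round sum.

```python
def simulate_composite_anonymous_feedback(rewards, delay_weights):
    """Return the aggregated feedback the learner observes at each round.

    rewards[t] is the hidden total reward generated by the action at round t.
    delay_weights[d] is the (hypothetical, fixed) fraction of a reward that
    arrives d rounds later; the weights are assumed to sum to 1.
    Components that would arrive after the horizon are never observed.
    """
    horizon = len(rewards)
    observed = [0.0] * horizon
    for t, reward in enumerate(rewards):
        for d, weight in enumerate(delay_weights):
            if t + d < horizon:
                # Anonymous: only the sum is recorded, not which past
                # action each component came from.
                observed[t + d] += reward * weight
    return observed


# Toy example: rewards generated at rounds 0 and 2 only.
rewards = [1.0, 0.0, 2.0, 0.0, 0.0]
observed = simulate_composite_anonymous_feedback(rewards, [0.5, 0.25, 0.25])
print(observed)  # [0.5, 0.25, 1.25, 0.5, 0.5]
```

At round 2 the learner observes 1.25, which mixes a late component of the round-0 reward with the first component of the round-2 reward. From the observation alone the two contributions cannot be separated.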

Paper Structure (20 sections, 23 theorems, 104 equations, 2 figures, 2 algorithms)

This paper contains 20 sections, 23 theorems, 104 equations, 2 figures, and 2 algorithms.

Key Result

Lemma 1

Let $(\delta_i)_{i \in I}$ be a family of probability distributions over the set of non-negative integers. Then this family is tight if and only if it has an upper tail bound.
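For reference, the two notions in the lemma can be written out as follows. The first line is the standard definition of tightness specialized to the non-negative integers. The second is only an assumed reading of "upper tail bound" (a single distribution $\tau$ whose tail dominates every $\delta_i$), since the paper's own definition is not reproduced here. The paper's Definition 1 is authoritative.

```latex
\[
\text{tight:}\quad
\forall \varepsilon > 0 \;\; \exists j_\varepsilon \geq 0 \;\; \forall i \in I:\quad
\delta_i(\{x \geq j_\varepsilon\}) \leq \varepsilon
\]
\[
\text{upper tail bound } \tau \text{ (assumed form):}\quad
\forall i \in I \;\; \forall j \geq 0:\quad
\delta_i(\{x \geq j\}) \leq \tau(\{x \geq j\})
\]
```

Under this reading, one direction is immediate: if such a $\tau$ exists, then $\tau(\{x \geq j\}) \to 0$ as $j \to \infty$ because $\tau$ is a probability distribution, so a single $j_\varepsilon$ works for every $i$.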

Figures (2)

  • Figure 1: This plot shows the average cumulative 1-regret over the horizon for each setting on a log-log scale. The dashed lines are $y = a T^{2/3}$ for $a \in \{ 0.1, 1, 10 \}$. Note that (F1) is a linear function and (D1) is the setting with no delay. Moreover, (D2) corresponds to a delay setting where delay distributions are concentrated near zero and decay exponentially.
  • Figure 2: This plot shows the average added cumulative regret over the horizon for each setting on a symlog-log scale, averaged over 10 runs. The scale of the y-axis is linear for $|y| \leq 100$ and logarithmic for $|y| > 100$. The gray dashed lines are $y = a T^{1/3}$ for $a \in \{ 10, 100, 300 \}$. The cyan dashed lines are $y = \nu T^{1/3}$, where $\nu$ is the corresponding delay coefficient appearing in the regret-bound theorems for the bounded adversarial, stochastic independent, and stochastic conditionally independent delay models.

Theorems & Definitions (44)

  • Example 1
  • Remark 1
  • Definition 1
  • Lemma 1
  • Remark 2
  • Theorem 1
  • Theorem 2: Bounded Adversarial Delay
  • proof
  • Theorem 3: Stochastic Independent Delay
  • proof
  • ...and 34 more