Multi-Agent Combinatorial-Multi-Armed-Bandit framework for the Submodular Welfare Problem under Bandit Feedback

Subham Pokhriyal, Shweta Jain, Vaneet Aggarwal

TL;DR

This work proposes an explore-then-commit strategy with randomized assignments that achieves $\tilde{\mathcal{O}}(T^{2/3})$ regret against a $(1-1/e)$ benchmark, the first such guarantee for the partition-based submodular welfare problem under bandit feedback.

Abstract

We study the Submodular Welfare Problem (SWP), in which items are partitioned among agents with monotone submodular utilities so as to maximize total welfare, under bandit feedback. The classical SWP assumes full value-oracle access, under which continuous-greedy algorithms achieve a $(1-1/e)$ approximation. We extend this to a multi-agent combinatorial bandit framework (MA-CMAB), where actions are partitions and non-communicating agents receive full-bandit feedback. Unlike prior single-agent or separable multi-agent CMAB models, our setting couples agents through shared allocation constraints. We propose an explore-then-commit strategy with randomized assignments that achieves $\tilde{\mathcal{O}}(T^{2/3})$ regret against a $(1-1/e)$ benchmark, the first such guarantee for the partition-based submodular welfare problem under bandit feedback.
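
To make the explore-then-commit idea concrete, the following is a minimal sketch under stated assumptions, not the algorithm analyzed in the paper (which builds on continuous greedy). It samples random partitions, estimates each one's welfare from noisy full-bandit feedback, and commits to the empirically best candidate. The coverage utilities, the Gaussian noise model, and all names (`coverage_utility`, `welfare`, `explore_then_commit`, `n_candidates`, `pulls_per_candidate`) are hypothetical choices made for illustration, and this naive variant is not claimed to attain the stated regret bound.

```python
import math
import random
from itertools import product


def coverage_utility(covers):
    """Build a monotone submodular utility: number of distinct elements covered."""
    def utility(items):
        covered = set()
        for i in items:
            covered |= covers[i]
        return len(covered)
    return utility


def welfare(assignment, utilities):
    """Total welfare of a partition; assignment[i] is the agent that gets item i."""
    bundles = [[] for _ in utilities]
    for item, agent in enumerate(assignment):
        bundles[agent].append(item)
    return sum(u(b) for u, b in zip(utilities, bundles))


def explore_then_commit(noisy_welfare, n_items, n_agents, horizon,
                        n_candidates, pulls_per_candidate, rng):
    """Naive ETC over randomly sampled partitions (illustrative assumption)."""
    if n_candidates * pulls_per_candidate > horizon:
        raise ValueError("exploration budget exceeds the horizon")
    # Exploration: draw random assignments and average their noisy rewards.
    candidates = [tuple(rng.randrange(n_agents) for _ in range(n_items))
                  for _ in range(n_candidates)]
    played, means = [], []
    for cand in candidates:
        total = 0.0
        for _ in range(pulls_per_candidate):
            total += noisy_welfare(cand)
            played.append(cand)
        means.append(total / pulls_per_candidate)
    # Commitment: play the empirically best candidate for all remaining rounds.
    best = candidates[max(range(n_candidates), key=means.__getitem__)]
    played.extend([best] * (horizon - len(played)))
    return best, played


# Toy instance: 2 agents, 4 items, hypothetical coverage sets per agent.
rng = random.Random(0)
utilities = [
    coverage_utility([{0, 1}, {1, 2}, {3}, {0, 3}]),
    coverage_utility([{0}, {0, 1}, {2, 3}, {3}]),
]


def noisy_welfare(assignment):
    # Full-bandit feedback: only a noisy scalar for the whole partition.
    return welfare(assignment, utilities) + rng.gauss(0, 0.5)


horizon = 2000
best, played = explore_then_commit(noisy_welfare, n_items=4, n_agents=2,
                                   horizon=horizon, n_candidates=16,
                                   pulls_per_candidate=20, rng=rng)

# Brute-force optimum (feasible only for tiny instances) and the regret
# measured against the (1 - 1/e) benchmark using noise-free welfare.
opt = max(welfare(a, utilities) for a in product(range(2), repeat=4))
alpha_regret = (1 - 1 / math.e) * opt * horizon - sum(
    welfare(a, utilities) for a in played)
```

Comparing against $(1-1/e)\cdot\mathrm{OPT}$ rather than $\mathrm{OPT}$ mirrors the benchmark in the abstract: the scaled optimum is what value-oracle algorithms can guarantee, so the scaled regret can be negative on easy instances such as this one.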

Paper Structure

This paper contains 34 sections, 22 theorems, 104 equations, 1 table, 2 algorithms.

Key Result

Theorem 4.2

Under inexact utility evaluations $|\hat{w}(S) - w(S)| \leq \epsilon$, for $\epsilon \le \frac{1}{(MN)^2}$, continuous greedy is an $(\alpha, \delta, \eta)$-resilient approximation algorithm for the Submodular Welfare (discrete partition) problem, where:

Theorems & Definitions (48)

  • Definition 4.1: $(\alpha, \delta, \eta)$-Resilient Approximation
  • Remark 1
  • Remark 2
  • Remark 3
  • Theorem 4.2: Continuous Greedy Resilience
  • Lemma 4.3
  • Proof sketch
  • Lemma 4.4: Noisy resilient bound for Continuous Greedy
  • Proof sketch
  • Remark 4
  • ...and 38 more