Multi-Agent Combinatorial-Multi-Armed-Bandit framework for the Submodular Welfare Problem under Bandit Feedback

Subham Pokhriyal; Shweta Jain; Vaneet Aggarwal

TL;DR

This work proposes an explore-then-commit strategy with randomized assignments that achieves $\tilde{\mathcal{O}}(T^{2/3})$ regret against a $(1-1/e)$ benchmark, the first such guarantee for the partition-based submodular welfare problem under bandit feedback.

Abstract

We study the \emph{Submodular Welfare Problem} (SWP) under \emph{bandit feedback}, where items are partitioned among agents with monotone submodular utilities so as to maximize total welfare. Classical work on SWP assumes full value-oracle access and achieves a $(1-1/e)$ approximation via continuous-greedy algorithms. We extend this to a \emph{multi-agent combinatorial bandit} framework (\textsc{MA-CMAB}), where actions are partitions, feedback is full-bandit, and agents do not communicate. Unlike prior single-agent or separable multi-agent CMAB models, our setting couples agents through shared allocation constraints. We propose an explore-then-commit strategy with randomized assignments that achieves $\tilde{\mathcal{O}}(T^{2/3})$ regret against a $(1-1/e)$ benchmark, the first such guarantee for the partition-based submodular welfare problem under bandit feedback.
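
To make the explore-then-commit idea concrete, the following is a minimal toy sketch, not the paper's algorithm. It assumes a hypothetical coverage-style utility (`coverage_welfare`), a small pool of uniformly random candidate partitions, and a hypothetical `pull` callback that returns only the noisy total welfare of the played partition (full-bandit feedback). The paper's method builds on continuous greedy with noisy evaluations; this sketch replaces that step with a plain "best empirical candidate" rule purely for illustration.

```python
import random
from statistics import mean


def coverage_welfare(partition, prefs):
    """Toy monotone submodular welfare (an assumption for illustration):
    each agent's utility is the number of distinct categories covered
    by the items assigned to it; welfare is the sum over agents."""
    total = 0
    for agent, items in enumerate(partition):
        covered = set()
        for item in items:
            covered |= prefs[agent][item]
        total += len(covered)
    return total


def random_partition(n_items, n_agents, rng):
    """Assign every item to one agent chosen uniformly at random."""
    partition = [[] for _ in range(n_agents)]
    for item in range(n_items):
        partition[rng.randrange(n_agents)].append(item)
    return partition


def explore_then_commit(pull, n_items, n_agents, horizon,
                        n_candidates, pulls_each, rng):
    """Explore: play each random candidate partition `pulls_each` times.
    Commit: play the candidate with the best empirical mean afterwards."""
    budget = n_candidates * pulls_each
    if budget > horizon:
        raise ValueError("exploration budget exceeds the horizon")

    candidates = [random_partition(n_items, n_agents, rng)
                  for _ in range(n_candidates)]
    rewards, estimates = [], []
    for cand in candidates:
        samples = [pull(cand) for _ in range(pulls_each)]
        rewards.extend(samples)
        estimates.append(mean(samples))

    best_idx = max(range(n_candidates), key=estimates.__getitem__)
    best = candidates[best_idx]
    for _ in range(horizon - budget):
        rewards.append(pull(best))
    return best, rewards


# Small demo with hypothetical data: 2 agents, 6 items, 4 categories.
demo_rng = random.Random(0)
demo_prefs = [[{demo_rng.randrange(4)} for _ in range(6)] for _ in range(2)]
demo_pull = lambda p: coverage_welfare(p, demo_prefs) + demo_rng.gauss(0, 0.1)
demo_best, demo_rewards = explore_then_commit(
    demo_pull, n_items=6, n_agents=2, horizon=300,
    n_candidates=8, pulls_each=10, rng=demo_rng)
```

In schemes of this kind, an exploration budget on the order of $T^{2/3}$ rounds is the usual way to balance the cost of exploring against the estimation error carried into the commit phase, which is consistent with the $\tilde{\mathcal{O}}(T^{2/3})$ rate stated above.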

Paper Structure (34 sections, 22 theorems, 104 equations, 1 table, 2 algorithms)

Introduction
Related Work
Submodular Welfare Problem
Multi-agent Multi-armed Bandits
Single/Multi-agent CMAB Framework
Problem Statement
Model.
Offline Benchmark.
Online Learning Protocol.
Regret for Submodular Welfare Problem.
Resilience Guarantee for the Submodular Welfare Problem
Continuous Greedy for Submodular Welfare.
Oracle complexity.
Offline Resilient Approximation
Resilience Guarantees for Continuous Greedy Algorithm under Noisy Oracle Access
...and 19 more sections

Key Result

Theorem 4.2

Under inexact utility evaluations $|\hat{w}(S) - w(S)| \leq \epsilon$, for $\epsilon \le \frac{1}{(MN)^2}$, continuous greedy is an $(\alpha, \delta, \eta)$-resilient approximation algorithm for the Submodular Welfare (discrete partition) problem, where:

Theorems & Definitions (48)

Definition 4.1: $(\alpha, \delta, \eta)$-Resilient Approximation
Remark 1
Remark 2
Remark 3
Theorem 4.2: Continuous Greedy Resilience
Lemma 4.3
Proof (sketch)
Lemma 4.4: Noisy resilient bound for Continuous Greedy
Proof (sketch)
Remark 4
...and 38 more
