Sustainable Multi-Agent Crowdsourcing via Physics-Informed Bandits

Chayan Banerjee

TL;DR

The paper introduces FORGE, a physics-grounded $K+1$ multi-agent simulator in which each contractor is a rational agent that declares its own load-acceptance threshold based on its fatigue state, converting the standard passive Restless Multi-Armed Bandit (RMAB) into a genuine Stackelberg game.

Abstract

Crowdsourcing platforms face a four-way tension between allocation quality, workforce sustainability, operational feasibility, and strategic contractor behaviour, a dilemma we formalise as the Cold-Start, Burnout, Utilisation, and Strategic Agency Dilemma. Existing methods resolve at most two of these tensions simultaneously: greedy heuristics and multi-criteria decision making (MCDM) methods achieve Day-1 quality but cause catastrophic burnout, while bandit algorithms eliminate burnout only through operationally infeasible 100% workforce utilisation. To address this, we introduce FORGE, a physics-grounded $K+1$ multi-agent simulator in which each contractor is a rational agent that declares its own load-acceptance threshold based on its fatigue state, converting the standard passive Restless Multi-Armed Bandit (RMAB) into a genuine Stackelberg game. Operating within FORGE, we propose a Neural-Linear UCB allocator that fuses a Two-Tower embedding network with a Physics-Informed Covariance Prior derived from offline simulator interactions. The prior simultaneously warm-starts the skill-cluster geometry and the UCB exploration landscape, providing a geometry-aware belief state from episode 1 that measurably reduces cold-start regret. Over $T = 200$ cold-start episodes, the proposed method achieves the highest reward of all non-oracle methods ($\text{LRew} = 0.555 \pm 0.041$) at only 7.6% workforce utilisation, a combination no conventional baseline achieves, while maintaining robustness to workforce turnover of up to 50% and observation noise of up to $\sigma = 0.20$.
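The warm-start idea in the abstract can be illustrated with a minimal linear UCB sketch. This is not the paper's Neural-Linear allocator: the Two-Tower embedding is omitted, the class name `WarmStartLinUCB` and the parameters `theta0`, `A0_inv`, and `alpha` are hypothetical, and the prior here is set by hand rather than derived from simulator data. It only shows how an initial weight vector and inverse covariance could shape exploration from the first episode, assuming unit observation noise and a rank-one (Sherman-Morrison) update.

```python
import math


def matvec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


class WarmStartLinUCB:
    """Linear UCB whose belief starts from a supplied prior (illustrative only)."""

    def __init__(self, theta0, A0_inv, alpha=1.0):
        # theta0 and A0_inv stand in for a prior mean and inverse covariance;
        # how the paper computes them offline is not reproduced here.
        self.theta = list(theta0)
        self.A_inv = [list(row) for row in A0_inv]
        self.alpha = alpha  # exploration weight (assumed hyperparameter)

    def score(self, x):
        # Estimated reward plus an exploration bonus from the current covariance.
        Ax = matvec(self.A_inv, x)
        bonus = math.sqrt(max(dot(x, Ax), 0.0))
        return dot(x, self.theta) + self.alpha * bonus

    def select(self, contexts):
        # Pick the arm (contractor index) with the highest UCB score.
        return max(range(len(contexts)), key=lambda k: self.score(contexts[k]))

    def update(self, x, reward):
        # Recursive least-squares step: a rank-one update of A_inv and theta,
        # which avoids re-inverting the matrix after every observation.
        Ax = matvec(self.A_inv, x)
        denom = 1.0 + dot(x, Ax)
        err = reward - dot(x, self.theta)
        self.theta = [t + a * err / denom for t, a in zip(self.theta, Ax)]
        n = len(x)
        self.A_inv = [
            [self.A_inv[i][j] - Ax[i] * Ax[j] / denom for j in range(n)]
            for i in range(n)
        ]
```

With an identity `A0_inv` and zero `theta0` this reduces to a standard cold-start linear UCB; a more informative prior would shrink the bonus along directions the offline data already covers.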

Paper Structure (38 sections, 16 equations, 1 figure, 4 tables, 1 algorithm)

Introduction
Motivation
Scope and High-Level Approach
Related Work
Contextual Bandits and UCB-Based Exploration
Restless Multi-Armed Bandits and Workforce Scheduling
Crowdsourcing Platform Allocation and MCDM
Offline-to-Online Transfer and Warm-Starting
Neural Representation for Matching and Recommendation
Positioning Summary
Problem Formulation
The Multi-Agent Marketplace and POMDP Tuple
Restless State Dynamics and Economic Feedback
Contractor Agency: The Strategic Availability Decision
Optimization Objective: The Four-Way Trade-off
...and 23 more sections

Figures (1)

Figure 1: FORGE Simulator --- $K{+}1$ Multi-Agent System. The offline phase pre-computes a Physics-Informed Prior from dataset $\mathcal{D}_{\text{sim}}$, injecting initial weights $\boldsymbol{\theta}_0$ and covariance $\mathbf{A}_0^{-1}$ into the Allocator at $t=1$. During live allocation, each contractor independently declares availability $a^{c}_{t,k}$ via a fatigue-threshold policy before the Allocator acts. Observable state variables and the availability signal are concatenated with the task query into a 493-dimensional context vector $\mathbf{x}_{t,k}$, which drives the Allocator's selection $a_t$. The Marketplace Environment processes this decision through three parallel dynamics—success probability, fatigue (RMAB), and surge pricing—feeding reward $r_t$ back to the Allocator and updated states back into the context vector each episode.
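The contractor-side loop described in the caption can be sketched as follows. All three function names, the fatigue update rule, and the constants `work_cost` and `recovery` are assumptions made for illustration; the caption does not specify the fatigue dynamics or the ordering of fields inside the 493-dimensional context vector, so this is a toy stand-in rather than the simulator's actual logic.

```python
def declare_availability(fatigue, threshold):
    """Contractor-side policy: accept work only while fatigue is below a threshold."""
    return 1 if fatigue < threshold else 0


def step_fatigue(fatigue, assigned, work_cost=0.2, recovery=0.1):
    """Hypothetical restless dynamics: fatigue changes whether or not the arm is chosen.

    Assigned contractors accumulate fatigue; idle ones recover. Values are
    clipped to [0, 1]. The real simulator's update rule may differ.
    """
    if assigned:
        return min(1.0, fatigue + work_cost)
    return max(0.0, fatigue - recovery)


def build_context(task_query, observable_state, availability):
    """Concatenate task features, observable state, and the availability signal.

    The concatenation order is an assumption; only the fact of concatenation
    comes from the caption.
    """
    return list(task_query) + list(observable_state) + [float(availability)]


# Example: a rested contractor accepts, works one episode, and becomes more fatigued.
fatigue = 0.3
signal = declare_availability(fatigue, threshold=0.5)
context = build_context([0.1, 0.2], [fatigue], signal)
fatigue = step_fatigue(fatigue, assigned=True)
```

Because availability is declared before the allocator acts, the allocator sees the signal as part of its input rather than discovering a refusal afterwards, which is the leader-follower ordering the caption describes.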
