Multi-Robot Multi-Queue Control via Exhaustive Assignment Actor-Critic Learning

Mohammad Merati, H. M. Sabbir Ahmad, Wenchao Li, David Castañón

Abstract

We study online task allocation for multi-robot, multi-queue systems with asymmetric stochastic arrivals and switching delays. We formulate the problem in discrete time: each location can host at most one robot per slot, servicing a task consumes one slot, switching between locations incurs a one-slot travel delay, and arrivals at locations are independent Bernoulli processes with heterogeneous rates. Building on our previous structural result that optimal policies are of exhaustive type, we formulate a discounted-cost Markov decision process and develop an exhaustive-assignment actor-critic policy architecture that enforces exhaustive service by construction and learns only the next-queue allocation for idle robots. Unlike the exhaustive-serve-longest (ESL) queue rule, whose optimality is known only under symmetry, the proposed policy adapts to asymmetry in arrival rates. Across different server-location ratios, loads, and asymmetric arrival profiles, the proposed policy consistently achieves lower discounted holding cost and smaller mean queue length than the ESL baseline, while remaining near-optimal on instances where an optimal benchmark is available. These results show that structure-aware actor-critic methods provide an effective approach for real-time multi-robot scheduling.
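To make the discrete-time model in the abstract concrete, the following is a minimal simulation sketch of the exhaustive-serve-longest (ESL) baseline under the stated dynamics: independent Bernoulli arrivals per queue, at most one robot per location per slot, one task served per slot, and a one-slot switching delay. All function and variable names here are illustrative assumptions, not the authors' implementation, and the routing rule is only the ESL heuristic, not the learned EA-AC policy.

```python
import random

def simulate_esl(rates, n_robots, horizon=10000, seed=0):
    """Simulate the discrete-time multi-robot multi-queue model under the
    exhaustive-serve-longest (ESL) rule: each robot serves its current queue
    until empty, then switches (one-slot travel delay) to the longest
    unattended nonempty queue. Returns the time-averaged total queue length.
    Illustrative sketch only; names and structure are assumptions."""
    n = len(rates)
    assert n_robots <= n, "at most one robot per location"
    rng = random.Random(seed)
    q = [0] * n                       # queue lengths
    loc = list(range(n_robots))       # robot locations (or destinations)
    travel = [0] * n_robots           # remaining travel slots per robot
    total = 0
    for _ in range(horizon):
        # Independent Bernoulli arrivals with heterogeneous rates.
        for i in range(n):
            if rng.random() < rates[i]:
                q[i] += 1
        occupied = set(loc)           # locations held or reserved by robots
        for r in range(n_robots):
            if travel[r] > 0:         # still switching; arrives next slot
                travel[r] -= 1
                continue
            i = loc[r]
            if q[i] > 0:
                q[i] -= 1             # exhaustive service: one task per slot
            else:
                # Idle: move to the longest nonempty unattended queue.
                cands = [j for j in range(n)
                         if j not in occupied and q[j] > 0]
                if cands:
                    j = max(cands, key=lambda k: q[k])
                    occupied.discard(i)
                    occupied.add(j)
                    loc[r] = j
                    travel[r] = 1     # one-slot switching delay
        total += sum(q)
    return total / horizon
```

The EA-AC policy described in the paper would replace only the idle-robot routing step (the `cands`/`max` line) with a learned next-queue assignment, while the exhaustive-service structure above is kept by construction.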

Paper Structure

This paper contains 18 sections, 21 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure C1: Block diagram of the proposed EA-AC architecture. The policy network encodes robot and queue states, models robot-queue interactions, and generates sequential assignment decisions for idle robots. The critic network uses centralized state information to estimate the value function during training.
  • Figure D1: Distribution of arrival rates across the six asymmetric scenarios. Each bar indicates the number of queues whose arrival rate falls at a given value on the $0.05$ grid.
  • Figure D2: Empirical convergence scaling of the proposed EA-AC policy with respect to the number of robots. Over the tested scenarios, the number of training iterations required for convergence grows approximately linearly with system size.
  • Figure F1: Supplementary comparison of mean queue length for EA-AC and ESL across multiple robot–queue configurations. Each panel corresponds to a fixed number of robots, and the three bars within each panel show performance at different queue counts.