Multi-agent assignment via state augmented reinforcement learning
Leopoldo Agorio, Sean Van Alen, Miguel Calvo-Fullana, Santiago Paternain, Juan Andres Bazerque
TL;DR
The paper tackles multi-agent assignment under conflicting regional visitation constraints by replacing standard regularization with a state-augmented MDP in which Lagrange multipliers become part of the state and oscillate to induce alternating feasible policies. Coordination across agents is achieved via a gossip-based distributed dual update that shares multiplier information without sharing agent states, enabling fully distributed online execution. The approach combines offline policy training conditioned on multipliers with online, networked dual updates, and provides almost-sure feasibility guarantees under reasonable assumptions. Numerical experiments, including intermittent communications and realistic robot navigation (Gazebo), validate that all constraints are satisfied by the team, illustrating practical impact for scalable, constraint-aware multi-agent systems.
Abstract
We address the conflicting requirements of a multi-agent assignment problem through constrained reinforcement learning, emphasizing the inadequacy of standard regularization techniques for this purpose. Instead, we resort to a state augmentation approach in which the oscillation of dual variables is exploited by agents to alternate between tasks. In addition, we coordinate the actions of the multiple agents acting on their local states through these multipliers, which are gossiped through a communication network, eliminating the need to access other agents' states. By these means, we propose a distributed multi-agent assignment protocol with theoretical feasibility guarantees that we corroborate in a numerical monitoring experiment.
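To make the mechanism concrete, the following is a minimal toy sketch of gossiped dual updates, not the paper's algorithm. Everything specific in it is an assumption for illustration: the two-agent, three-region setup, the required visitation rates, the mixing matrix `W`, the step size, and above all the `argmax` rule, which merely stands in for a policy trained offline and conditioned on the multipliers. Each agent keeps its own copy of the multipliers, averages it with its neighbor's copy, and raises the multiplier of any region it is under-visiting, so the multipliers oscillate and the agents alternate between regions.

```python
import numpy as np


def simulate(num_steps=3000, step_size=0.1):
    """Toy gossip-based dual ascent for a team visitation constraint."""
    # Hypothetical requirement: the team (summed over agents) must visit
    # each region in at least half of the time steps.
    required = np.array([0.5, 0.5, 0.5])
    num_agents, num_regions = 2, required.size
    # Assumed doubly stochastic gossip weights for a two-agent network.
    W = np.array([[0.5, 0.5], [0.5, 0.5]])
    # Each agent holds a local copy of the multipliers; start them apart.
    lam = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    counts = np.zeros(num_regions)
    for _ in range(num_steps):
        # Stand-in for the multiplier-conditioned policy: each agent goes
        # to the region whose local multiplier is currently largest.
        choice = lam.argmax(axis=1)
        visits = np.zeros((num_agents, num_regions))
        visits[np.arange(num_agents), choice] = 1.0
        counts += visits.sum(axis=0)
        # Gossip (mix with neighbors), take a local dual step on the
        # agent's share of the constraint, and project onto lam >= 0.
        slack = visits - required / num_agents
        lam = np.maximum(0.0, W @ lam - step_size * slack)
    return counts / num_steps, lam


team_freq, final_lam = simulate()
print("team visitation frequency per region:", team_freq)
print("final multipliers per agent:\n", final_lam)
```

In this toy, no agent ever reads another agent's position or visit history; only the multiplier copies cross the network, which is the property the abstract highlights. Swapping `W` for a sparser or time-varying doubly stochastic matrix would be one way to experiment with intermittent communication.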
