Decentralized Reinforcement Learning for Multi-Agent Multi-Resource Allocation via Dynamic Cluster Agreements
Antonio Marino, Esteban Restrepo, Claudio Pacchierotti, Paolo Robuffo Giordano
TL;DR
This work tackles decentralized resource allocation among multiple agents with heterogeneous demands by formulating the problem as a decentralized partially observable Markov decision process (Dec-POMDP) and introducing LGTC-IPPO, a decentralized reinforcement learning approach that uses dynamic cluster consensus to form adaptive sub-teams. The method combines a hybrid global-local reward design, goal alignment guided by a mixed-integer quadratic program (MIQP), and a DeepSets/graph-filtered neural architecture inspired by LGTC dynamics, trained via independent PPO with contractivity regularization. Key contributions include a decentralized training framework with a cluster-value function, a reward shaping strategy that balances global objectives with local cooperation, and validation in both simulation and hardware for dynamic reallocation scenarios. Results show improved reward stability and coordination over strong MARL baselines, with dynamic clustering enabling efficient reallocation under changing demands, though centralized methods still excel when complete information is available. The work targets scalable, robust decentralized coordination for complex multi-resource allocation tasks in real-world settings.
Abstract
This paper addresses the challenge of allocating heterogeneous resources among multiple agents in a decentralized manner. Our proposed method, LGTC-IPPO, builds upon Independent Proximal Policy Optimization (IPPO) by integrating dynamic cluster consensus, a mechanism that allows agents to form and adapt local sub-teams based on resource demands. This decentralized coordination strategy reduces reliance on global information and enhances scalability. We evaluate LGTC-IPPO against standard multi-agent reinforcement learning baselines and a centralized expert solution across a range of team sizes and resource distributions. Experimental results demonstrate that LGTC-IPPO achieves more stable rewards, better coordination, and robust performance even as the number of agents or resource types increases. Additionally, we illustrate how dynamic clustering lets agents reallocate resources efficiently in scenarios with discharging resources.
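To make the idea of cluster consensus concrete, the following is a minimal, generic sketch and not the LGTC-IPPO implementation. It assumes each agent holds a scalar value and a cluster label, and that agents average only with neighbors sharing their label; the function names, the step size, and the four-agent data are all hypothetical. In this fully connected example, each sub-team settles on its own average, and changing the labels (a stand-in for changing resource demands) makes the regrouped agents agree on a new value.

```python
def cluster_consensus_step(values, labels, neighbors, step=0.5):
    """One synchronous consensus update restricted to same-cluster neighbors.

    values, labels, and neighbors are illustrative inputs (assumptions),
    not quantities defined in the paper.
    """
    updated = []
    for i, x in enumerate(values):
        # Only neighbors with the same cluster label influence agent i.
        peers = [j for j in neighbors[i] if labels[j] == labels[i]]
        if not peers:
            updated.append(x)  # an isolated agent keeps its value
            continue
        peer_mean = sum(values[j] for j in peers) / len(peers)
        updated.append(x + step * (peer_mean - x))
    return updated


def run_consensus(values, labels, neighbors, iterations=50):
    """Repeat the update for a fixed number of iterations."""
    for _ in range(iterations):
        values = cluster_consensus_step(values, labels, neighbors)
    return values


# Four agents on a fully connected communication graph (hypothetical data).
neighbors = {i: [j for j in range(4) if j != i] for i in range(4)}
values = [0.0, 2.0, 10.0, 14.0]

# Initial sub-teams: agents {0, 1} and agents {2, 3}.
values = run_consensus(values, [0, 0, 1, 1], neighbors)

# Demand changes: agent 1 leaves the first sub-team and joins the second.
regrouped = run_consensus(values, [0, 1, 1, 1], neighbors)
```

The labels are passed in by hand here; how agents choose and update their cluster membership is the part the learned policy would handle, and this sketch makes no claim about it.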
