Decentralized Reinforcement Learning for Multi-Agent Multi-Resource Allocation via Dynamic Cluster Agreements

Antonio Marino, Esteban Restrepo, Claudio Pacchierotti, Paolo Robuffo Giordano

TL;DR

This work tackles decentralized resource allocation among multiple agents with heterogeneous demands by formulating the problem as a Dec-POMDP and introducing LGTC-IPPO, a decentralized reinforcement learning approach that leverages dynamic cluster consensus to form adaptive sub-teams. The method combines a hybrid global-local reward design, MIQP-guided goal alignment, and a DeepSets/graph-filtered neural architecture inspired by LGTC dynamics, trained via independent PPO with contractivity regularization. Key contributions include a decentralized training framework with a cluster-value function, a reward shaping strategy that balances global objectives with local cooperation, and validation in both simulation and hardware for dynamic reallocation scenarios. Results show improved reward stability and coordination over strong MARL baselines, with dynamic clustering enabling efficient reallocation under changing demands, though centralized methods still excel when complete information is available. The work advances scalable, robust decentralized coordination for complex multi-resource allocation tasks in real-world settings.
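The DeepSets component mentioned above rests on a simple idea: encode each neighbor's observation separately, pool the encodings with a symmetric operation, then decode the pooled vector, so the output does not depend on neighbor ordering. The following is a minimal sketch of that pattern only. The functions `phi` and `rho`, their fixed weights, and the two-dimensional observations are hypothetical placeholders chosen for illustration; they are not the learned networks or the graph filter described in the paper.

```python
import math
from typing import List, Sequence


def phi(x: Sequence[float]) -> List[float]:
    """Per-element encoder (hypothetical fixed map standing in for a learned network)."""
    return [math.tanh(x[0] + 0.5 * x[1]), math.tanh(x[0] - x[1])]


def rho(z: Sequence[float]) -> float:
    """Decoder applied to the pooled encoding (hypothetical fixed map)."""
    return math.tanh(0.3 * z[0] - 0.7 * z[1])


def deep_sets(neighbors: Sequence[Sequence[float]]) -> float:
    """Compute rho(sum_i phi(x_i)); sum pooling makes the result order-independent."""
    pooled = [0.0, 0.0]
    for x in neighbors:
        encoded = phi(x)
        pooled[0] += encoded[0]
        pooled[1] += encoded[1]
    return rho(pooled)


if __name__ == "__main__":
    observations = [(0.1, 0.2), (0.5, -0.3), (1.0, 0.4)]
    shuffled = list(reversed(observations))
    # Both calls agree up to floating-point rounding.
    print(deep_sets(observations), deep_sets(shuffled))
```

Sum pooling also lets the same function accept a varying number of neighbors, which is one reason this kind of architecture suits teams whose size changes.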

Abstract

This paper addresses the challenge of allocating heterogeneous resources among multiple agents in a decentralized manner. Our proposed method, LGTC-IPPO, builds upon Independent Proximal Policy Optimization (IPPO) by integrating dynamic cluster consensus, a mechanism that allows agents to form and adapt local sub-teams based on resource demands. This decentralized coordination strategy reduces reliance on global information and enhances scalability. We evaluate LGTC-IPPO against standard multi-agent reinforcement learning baselines and a centralized expert solution across a range of team sizes and resource distributions. Experimental results demonstrate that LGTC-IPPO achieves more stable rewards, better coordination, and robust performance even as the number of agents or resource types increases. Additionally, we illustrate how dynamic clustering enables agents to reallocate resources efficiently, including in scenarios with discharging resources.
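To make the notion of cluster consensus concrete, the sketch below groups agents into sub-teams and lets each agent's scalar value drift toward its own sub-team's mean, so agreement is reached per cluster rather than across the whole team. It is an illustrative assumption, not the paper's mechanism: the nearest-demand assignment rule, the helper names `assign_clusters` and `cluster_consensus_step`, and the step size are all hypothetical, and the learned policy, communication graph, and reward terms are omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def assign_clusters(agents: Sequence[Point], demands: Sequence[Point]) -> List[int]:
    """Assign each agent to the index of its nearest demand point (hypothetical rule)."""
    return [
        min(range(len(demands)), key=lambda j: math.dist(a, demands[j]))
        for a in agents
    ]


def cluster_consensus_step(
    values: Sequence[float], clusters: Sequence[int], step: float = 0.5
) -> List[float]:
    """Move each agent's value toward the mean of its own cluster."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for v, c in zip(values, clusters):
        sums[c] = sums.get(c, 0.0) + v
        counts[c] = counts.get(c, 0) + 1
    means = {c: sums[c] / counts[c] for c in sums}
    return [v + step * (means[c] - v) for v, c in zip(values, clusters)]


if __name__ == "__main__":
    agents = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    demands = [(0.0, 0.0), (10.0, 0.0)]
    values = [1.0, 3.0, 10.0, 20.0]

    clusters = assign_clusters(agents, demands)
    for _ in range(50):
        values = cluster_consensus_step(values, clusters)
    print(clusters, values)
```

Recomputing `assign_clusters` whenever agents or demands move is what would make the clusters dynamic in this toy setting: agents that switch sub-teams start agreeing with a different group on the next step.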

Paper Structure

This paper contains 9 sections, 2 theorems, 24 equations, 7 figures, 1 table.

Key Result

Theorem 1

Under Assumptions 1 and 2, with $x(0) \in \mathcal{X}$, the system defined by the model dynamics is infinitesimally contractive and the state is bounded in the range $[-1,1]$ if the following constraints are satisfied

Figures (7)

  • Figure 1: An assignment example involving a group of heterogeneous robots transporting diverse resources. The robots are allocated to optimally fulfill consumer demands, which are represented by the red dots.
  • Figure 2: Neural network model architecture for the value and policy estimation
  • Figure 3: The agents' rewards cluster as the agents (green dots) spatially cluster to satisfy the multi-resource demand (red dots)
  • Figure 4: Mean and standard deviation of accumulated rewards over four training runs with random seeds, reported for our approach and four state-of-the-art MARL methods.
  • Figure 5: Mean and standard deviation of accumulated rewards over a variable number of agents and consumers, reported for a centralized expert and our approach (LGTC-IPPO).
  • ...and 2 more figures

Theorems & Definitions (6)

  • Theorem 1
  • proof
  • Remark 1
  • Theorem 2
  • proof
  • Remark 2