Learning Emergence of Interaction Patterns across Independent RL Agents in Multi-Agent Environments

Vasanth Reddy Baddam; Suat Gumussoy; Almuatazbellah Boker; Hoda Eldardiry

TL;DR

The Bottom Up Network (BUN) treats the collective of agents as a single entity. A specialized weight initialization promotes independent learning, and connections among agents are established dynamically from gradient information. This enables coordination when necessary while keeping the connections limited and sparse, which keeps the computational budget under control.

Abstract

Many real-world problems, such as controlling swarms of drones and urban traffic, naturally lend themselves to modeling as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods often scale poorly, primarily because of the communication they introduce among agents. Consequently, a key challenge lies in adapting the success of deep learning in single-agent RL to the multi-agent setting. In response, we propose an approach that fundamentally reimagines multi-agent environments. Unlike conventional methods that model each agent individually with a separate network, our approach, the Bottom Up Network (BUN), treats the collective of agents as a unified entity while employing a specialized weight initialization strategy that promotes independent learning. Furthermore, we dynamically establish connections among agents using gradient information, enabling coordination when necessary while keeping these connections limited and sparse to manage the computational budget. Our empirical evaluations across a variety of cooperative multi-agent scenarios, including cooperative navigation and traffic control, consistently show that BUN outperforms baseline methods at substantially reduced computational cost.
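The two ideas in the abstract, agent-local weight initialization and gradient-driven growth of cross-agent connections under a budget, can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a single linear layer over the concatenated observations, and the names `block_diagonal_mask`, `grow_connections`, and `budget` are hypothetical. How newly grown weights are initialized and how often growth happens are left out, since the text above does not specify them.

```python
# Illustrative sketch only; function names, shapes, and the single-layer
# setting are assumptions, not taken from the paper's code.

def block_diagonal_mask(n_agents, obs_dim, act_dim):
    """Return a 0/1 mask of shape (n_agents*act_dim, n_agents*obs_dim).

    An entry is 1 only where a weight maps agent i's observation to
    agent i's own action, so agents start out fully independent.
    """
    rows, cols = n_agents * act_dim, n_agents * obs_dim
    return [
        [1 if r // act_dim == c // obs_dim else 0 for c in range(cols)]
        for r in range(rows)
    ]


def grow_connections(mask, grad, budget):
    """Activate up to `budget` inactive weights with the largest |gradient|.

    Returns a new mask; the input mask is left unchanged.
    """
    candidates = [
        (abs(grad[r][c]), r, c)
        for r in range(len(mask))
        for c in range(len(mask[0]))
        if mask[r][c] == 0
    ]
    # Largest gradient magnitude first; the sort is stable, so ties keep
    # row-major order.
    candidates.sort(key=lambda item: item[0], reverse=True)
    new_mask = [row[:] for row in mask]
    for _, r, c in candidates[:budget]:
        new_mask[r][c] = 1
    return new_mask


# Two agents, each with a 2-dimensional observation and a 1-dimensional action.
mask = block_diagonal_mask(n_agents=2, obs_dim=2, act_dim=1)
# Made-up gradient values, used purely to exercise the selection rule.
grad = [[0.1, 0.2, 0.9, -0.05],
        [-0.7, 0.3, 0.0, 0.0]]
print(mask)                                  # [[1, 1, 0, 0], [0, 0, 1, 1]]
print(grow_connections(mask, grad, budget=1))  # [[1, 1, 1, 0], [0, 0, 1, 1]]
```

With a budget of 1, the only connection that emerges is the cross-agent weight with the largest gradient magnitude (0.9), linking the second agent's first observation component to the first agent's action. All other cross-agent weights stay inactive, which is the sense in which the connections remain sparse.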

Paper Structure (20 sections, 6 equations, 7 figures, 6 tables, 1 algorithm)

Introduction
Related work
Background
Reinforcement Learning
Single Agent
Multi-Agent Reinforcement Learning
DQN
BUN: Bottom Up Network
Experiments
Benchmark Algorithms
Environments
Implementation Details
Results
Robustness
Conclusion
...and 5 more sections

Figures (7)

Figure 1: The BUN approach involves two steps. 1. Weight Initialization: Weights are initialized so that the $i^{th}$ agent's observation $o_i$ is mapped directly to its action $a_i$, with no dependence on the other agents' observations. 2. Weight Emergence: Weights are then grown across agents according to the highest-magnitude gradient signal. The green dotted line represents the newly emerged weights/connections.
Figure 2: Learning curves during training on the Cooperative Navigation environments. Agents on SS and SS+CC are trained for 20000 time-steps, while agents on SS+C are trained for 500000 time-steps. The plots show the mean episode reward over 10 random seeds.
Figure 3: Comparison between BUN (left) and RigL (right) on the Simple Spread with Communication (SS+C) and Simple Spread with Cross Communication (SS+CC) environments, at t = 0, 6, and 10 and at t = 0, 10, and 25, respectively. Small circles indicate landmarks and large circles indicate agents. In SS+C, the white agent is assigned the white landmark and the black agent the black landmark. In SS+CC, the white agent is penalized twice as much as the black agent in reaching the black landmark, while the red and black agents are assigned to the red landmark. The black agent moves aggressively toward the red landmark because it is penalized twice as much as the red agent in reaching it. In both environments, agents trained with BUN pick up information about their target landmarks from their fellow agents and reach those landmarks. Agents trained with RigL, by contrast, struggle to establish connections between agents. In SS+C, the black agent reaches the black landmark, indicating that it learned only local behaviour and did not establish a connection with the white agent. See the video in the supplementary material for complete trajectories.
Figure 4: Comparison of the training approaches of BUN and RigL in the SS+CC environment. The panels show how the neural network weights evolve under each method. In BUN, training starts from local weight initialization (a), where agents operate independently. Agent observations follow a fixed order, with the black and white agents preceding the red agent. The aim is to establish connections between agents (highlighted in red boxes), with no weights emerging between the red and white agents (green boxes). The weights in BUN emerge within a fixed budget (b = 30), as shown in (b). RigL, by contrast, shows a different pattern of weight emergence (c): unlike BUN, it introduces random weight connections. These structural patterns help explain the results in the accompanying table and the agent trajectories in Figure \ref{SSTable}, highlighting each approach's distinct communication and coordination strategies.
Figure 5: Learning curves during training on the Grid 2$\times$ environment. The plots show the episode waiting time and the average vehicle waiting time in the grid network.
...and 2 more figures
