Relational Weight Optimization for Enhancing Team Performance in Multi-Agent Multi-Armed Bandits

Monish Reddy Kotturu, Saniya Vahedian Movahed, Paul Robinette, Kshitij Jerath, Amanda Redlich, Reza Azadeh

Abstract

We introduce an approach to improve team performance in a Multi-Agent Multi-Armed Bandit (MAMAB) framework using the Fastest Mixing Markov Chain (FMMC) and Fastest Distributed Linear Averaging (FDLA) optimization algorithms. The multi-agent team is represented by a fixed relational network and simulated with the Coop-UCB2 algorithm. The edge weights of the communication network directly affect the time the team takes to reach distributed consensus. Our goal is to shorten the timescale on which consensus converges, in order to achieve optimal team performance and maximize reward. Our experiments show that convergence to team consensus is slightly faster in large constrained networks.
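To make the link between edge weights and consensus time concrete, the following is a minimal sketch and not the authors' implementation. It assumes a small hypothetical 5-agent graph and compares two standard heuristic weightings (max-degree and Metropolis-Hastings) instead of solving the FMMC or FDLA optimization problems. For a symmetric weight matrix W whose rows sum to one, the linear averaging iteration x(t+1) = W x(t) converges at a rate set by the second-largest eigenvalue modulus (SLEM) of W, so a smaller SLEM generally means faster consensus. All function names here are illustrative.

```python
import numpy as np

def max_degree_weights(adj):
    """Heuristic weights W = I - L / d_max, with L the graph Laplacian."""
    deg = adj.sum(axis=1)
    laplacian = np.diag(deg) - adj
    return np.eye(len(adj)) - laplacian / deg.max()

def metropolis_weights(adj):
    """Heuristic weights w_ij = 1 / max(d_i, d_j) on each edge."""
    deg = adj.sum(axis=1)
    n = len(adj)
    w = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j]:
                w[i, j] = 1.0 / max(deg[i], deg[j])
    # Self-weights make every row sum to one.
    w[np.diag_indices(n)] = 1.0 - w.sum(axis=1)
    return w

def slem(w):
    """Second-largest eigenvalue modulus of a symmetric weight matrix."""
    return np.sort(np.abs(np.linalg.eigvalsh(w)))[-2]

def steps_to_consensus(w, x0, tol=1e-6, max_steps=10_000):
    """Count iterations of x <- W x until all entries are within tol of the mean."""
    x, target = x0.astype(float), x0.mean()
    for t in range(max_steps):
        if np.max(np.abs(x - target)) < tol:
            return t
        x = w @ x
    return max_steps

# Hypothetical network: a triangle (0, 1, 2) with a two-agent tail (2-3-4).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
adj = np.zeros((5, 5))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

# Only agent 0 starts with a nonzero estimate; consensus is the average.
x0 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])

for name, build in [("max-degree", max_degree_weights),
                    ("metropolis", metropolis_weights)]:
    w = build(adj)
    print(f"{name}: SLEM = {slem(w):.4f}, steps = {steps_to_consensus(w, x0)}")
```

FMMC and FDLA go further than these heuristics by choosing the weights that minimize the SLEM (or the corresponding spectral norm) subject to the network's edge constraints, which is typically posed as a convex optimization problem.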

Paper Structure

This paper contains 17 sections, 12 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Graphical representation of the networks
  • Figure 2: Comparison of team average of the errors between the estimated and true means of the best arm in different networks. The vertical dashed lines represent the time taken by the network to reach 5% of the final value of the largest error among all algorithms.