Cooperation and Fairness in Multi-Agent Reinforcement Learning

Jasmine Jerry Aloor, Siddharth Nayak, Sydney Dolan, Hamsa Balakrishnan

TL;DR

Training agents with min-max fair distance goal assignments, together with a reward term that incentivizes fairness as they move toward their goals, lets the agents learn a fair assignment of goals and achieve almost perfect goal coverage in navigation scenarios using only local observations.

Abstract

Multi-agent systems are trained to maximize shared cost objectives, which typically reflect system-level efficiency. However, in the resource-constrained environments of mobility and transportation systems, efficiency may be achieved at the expense of fairness -- certain agents may incur significantly greater costs or lower rewards compared to others. Tasks could be distributed inequitably, leading to some agents receiving an unfair advantage while others incur disproportionately high costs. It is important to consider the tradeoffs between efficiency and fairness. We consider the problem of fair multi-agent navigation for a group of decentralized agents using multi-agent reinforcement learning (MARL). We consider the reciprocal of the coefficient of variation of the distances traveled by different agents as a measure of fairness and investigate whether agents can learn to be fair without significantly sacrificing efficiency (i.e., increasing the total distance traveled). We find that by training agents using min-max fair distance goal assignments along with a reward term that incentivizes fairness as they move towards their goals, the agents (1) learn a fair assignment of goals and (2) achieve almost perfect goal coverage in navigation scenarios using only local observations. For goal coverage scenarios, we find that, on average, our model yields a 14% improvement in efficiency and a 5% improvement in fairness over a baseline trained using random assignments. Furthermore, an average of 21% improvement in fairness can be achieved compared to a model trained on optimally efficient assignments; this increase in fairness comes at the expense of only a 7% decrease in efficiency. Finally, we extend our method to environments in which agents must complete coverage tasks in prescribed formations and show that it is possible to do so without tailoring the models to specific formation shapes.
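To make the two central quantities in the abstract concrete, the following minimal Python sketch computes the fairness measure (the reciprocal of the coefficient of variation of distances traveled) and compares a min-max fair goal assignment with a minimum-total-distance one. It is an illustration under stated assumptions, not the authors' implementation: the function names, the use of the population standard deviation, the small `eps` guard against division by zero, and the brute-force search over permutations are all choices made here for clarity.

```python
import itertools
import math
import statistics


def fairness(distances, eps=1e-9):
    """Reciprocal of the coefficient of variation: mean / std.

    Assumption: population standard deviation, plus a small eps so that
    perfectly equal distances give a large finite value instead of an error.
    """
    mean = statistics.fmean(distances)
    std = statistics.pstdev(distances)
    return mean / (std + eps)


def pairwise_distances(agents, goals):
    """cost[i][j] = Euclidean distance from agent i to goal j."""
    return [[math.dist(a, g) for g in goals] for a in agents]


def _best_assignment(cost, key):
    """Brute-force search over all one-to-one assignments (O(n!)).

    Only suitable for a handful of agents; used here purely to illustrate
    the two objectives.
    """
    n = len(cost)
    best = min(
        itertools.permutations(range(n)),
        key=lambda perm: key([cost[i][perm[i]] for i in range(n)]),
    )
    return list(best)


def min_max_fair_assignment(cost):
    """Assignment minimizing the largest single agent-to-goal distance."""
    return _best_assignment(cost, key=max)


def min_total_assignment(cost):
    """Assignment minimizing the total distance over all agents."""
    return _best_assignment(cost, key=sum)


if __name__ == "__main__":
    # Hypothetical 2-agent cost matrix chosen so the two objectives disagree.
    cost = [[0.0, 3.0],
            [3.0, 5.0]]
    for name, solver in [("min-total", min_total_assignment),
                         ("min-max fair", min_max_fair_assignment)]:
        assignment = solver(cost)
        dists = [cost[i][g] for i, g in enumerate(assignment)]
        print(name, assignment, "total:", sum(dists),
              "fairness:", round(fairness(dists), 3))
```

In this toy cost matrix the minimum-total assignment gives distances of 0 and 5 (total 5), while the min-max fair assignment gives 3 and 3 (total 6): a small loss in efficiency for a large gain in fairness, which is the kind of tradeoff the abstract describes.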

Paper Structure

This paper contains 28 sections, 4 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Overview of the training: In the navigation scenario, we track the path of the agents as an episode progresses. Frame A: The episode starts with agents and goals initialized to random positions. For ease of representation, we have ordered them along two lines. The observation vector of agent 1, $o^{(1)}$ is shown in the green box along with its observation radius highlighted in the blue shaded circle. Frames B and C: At every time step, for each agent, the fairness metric $\mathcal{F}_t$ is computed along with each agent's rewards. The agents are assigned goals based on an optimal or fair distance cost. Frame D: Once an agent reaches the assigned goal, it is given a goal reward $\mathcal{R}_\mathrm{goal}$ and is flagged "done" for that episode.
  • Figure 2: Visualization of behaviors of the four navigation models with and without fairly assigned goals and fairness metric rewards. The agents start from the upper half of the environment and navigate to goals located on the bottom left part of the environment. The darker shades indicate newer states in the trajectories traveled by each agent, and the lighter circles indicate earlier states.
  • Figure 3: The violin plots show the distribution of fairness ($\mathcal{F}$) and the total distance traveled by all agents ($D$) over 100 test episodes for the four trained model variants discussed in Section \ref{ssss:models}: 1) Random goal assignments (RA); 2) Optimal distance cost goal assignments (OA); 3) Fair goal assignments (FA); and 4) Fair goal assignments and a fairness reward (FA+FR). A white circle and tick denote the medians, a plain tick represents the means, and the vertical black lines indicate the 90-10 percentile range. We also show the tradeoffs between fairness and efficiency exhibited by the different models in the rightmost subplot.
  • Figure 4: Congestion in the environment: The figure on the left shows an environment with 3 agents along with 3 obstacles and 2 walls. The figure on the right shows the environment with 7 agents and 3 obstacles. The environment is crowded with the increased number of agents, which decreases free space for navigating in straight lines.
  • Figure 5: Congestion: The violin plots show the distribution of fairness ($\mathcal{F}$), total distance traveled ($D$) and success rates ($S\%$) over 100 test episodes for four trained model variants: 1) Random goal assignments (RA); 2) Optimal distance cost goal assignments (OA); 3) Fair goal assignments (FA); and 4) Fair goal assignments and a fairness reward (FA+FR). A white circle and tick denote the medians, a plain tick represents the means, and the vertical black lines indicate the 90-10 percentile range. We also show the tradeoffs between fairness and efficiency exhibited by the different models in the rightmost subplot.
  • ...and 2 more figures