Multi-agent Reinforcement Learning with Deep Networks for Diverse Q-Vectors

Zhenglong Luo, Zhiyong Chen, James Welsh

TL;DR

In MARL, each agent receives its own reward, so the state-action value is a vector of Q-values rather than a single scalar, which complicates policy optimization. The authors define three notions of an optimal Q-vector (Max, Nash, and Maximin) and develop a DQN-based framework that learns $Q_\phi(s,a)$ and derives actions via the corresponding operators ${\mathcal{P}}_{\max}$, ${\mathcal{P}}_{\operatorname{nash}}$, and ${\mathcal{P}}_{\operatorname{mm}}$; a dueling DQN variant is also provided. They formalize the Q-vector objectives $Q^*_{\max}$, $Q^*_{\operatorname{nash}}$, and $Q^*_{\operatorname{mm}}$, and validate learning on a physically grounded two-arm lifting task in robosuite/MuJoCo, showing that the method recovers the expected joint policies under balanced and unbalanced action costs. The work demonstrates that game-theoretic optimality can be incorporated into deep MARL for coordinated robotics and lays groundwork for scaling to more agents and additional strategies.

Abstract

Multi-agent reinforcement learning (MARL) has become a significant research topic due to its ability to facilitate learning in complex environments. In multi-agent tasks, the state-action value, commonly referred to as the Q-value, can vary among agents because of their individual rewards, resulting in a Q-vector. Determining an optimal policy is challenging, as it involves more than just maximizing a single Q-value. Various optimal policies, such as a Nash equilibrium, have been studied in this context. Algorithms like Nash Q-learning and Nash Actor-Critic have shown effectiveness in these scenarios. This paper extends this research by proposing a deep Q-network (DQN) algorithm capable of learning various Q-vectors using Max, Nash, and Maximin strategies. The effectiveness of this approach is demonstrated in an environment where dual robotic arms collaborate to lift a pot.
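
To make the difference between the three strategies concrete, the following is a minimal sketch for two agents at a single fixed state. It is not the authors' implementation: it uses small hand-written Q-tables instead of a network output $Q_\phi(s,a)$, the function names are hypothetical, and the selection rules are assumed readings of the strategy names (Max as the joint action with the largest summed Q-value, Nash as a pure-strategy equilibrium of the stage game, Maximin as each agent maximizing its own worst case). The paper's operators may be defined differently.

```python
import numpy as np

# Hypothetical Q-vector at one state for two agents with two actions each.
# q1[a1, a2] is agent 1's Q-value and q2[a1, a2] is agent 2's Q-value for
# the joint action (a1, a2). These numbers are invented for illustration.
q1 = np.array([[4.0, 0.0],
               [3.0, 2.0]])
q2 = np.array([[4.0, 3.0],
               [0.0, 2.0]])


def max_action(q1, q2):
    """Assumed 'Max' rule: joint action with the largest summed Q-value."""
    total = q1 + q2
    flat_index = np.argmax(total)
    return tuple(int(i) for i in np.unravel_index(flat_index, total.shape))


def pure_nash_actions(q1, q2):
    """All pure-strategy Nash equilibria of the stage game (q1, q2)."""
    # Agent 1 picks the row: best responses are column-wise maxima of q1.
    best1 = q1 == q1.max(axis=0, keepdims=True)
    # Agent 2 picks the column: best responses are row-wise maxima of q2.
    best2 = q2 == q2.max(axis=1, keepdims=True)
    # A joint action is an equilibrium when both are best responses.
    return [(int(i), int(j)) for i, j in np.argwhere(best1 & best2)]


def maximin_action(q1, q2):
    """Assumed 'Maximin' rule: each agent maximizes its own worst case."""
    a1 = int(np.argmax(q1.min(axis=1)))  # worst case over agent 2's actions
    a2 = int(np.argmax(q2.min(axis=0)))  # worst case over agent 1's actions
    return (a1, a2)


print("max:    ", max_action(q1, q2))
print("nash:   ", pure_nash_actions(q1, q2))
print("maximin:", maximin_action(q1, q2))
```

In this invented example the three rules disagree: the summed-value rule selects joint action (0, 0), the stage game has two pure equilibria, (0, 0) and (1, 1), and the worst-case rule selects (1, 1). A pure-strategy equilibrium need not exist in general, so a full treatment would also have to handle an empty list or mixed strategies.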

Paper Structure

This paper contains 9 sections, 4 equations, 10 figures, and 3 tables.

Figures (10)

  • Figure 1: Schematic diagram of UR5e joints.
  • Figure 2: Initial position of the robot arms lifting task.
  • Figure 4: Profile of Max Q-vectors in Case 1: no arm lifted.
  • Figure 5: Profile of Nash Q-vectors in Case 1: right arm lifted.
  • Figure 6: Profile of Nash Q-vectors in Case 1: right arm lifted and then left arm lifted.
  • ...and 5 more figures