Multi-agent Reinforcement Learning with Deep Networks for Diverse Q-Vectors

Zhenglong Luo; Zhiyong Chen; James Welsh

TL;DR

In multi-agent reinforcement learning (MARL), each agent receives its own reward, so the state-action value is a vector of Q-values rather than a single scalar, which complicates policy optimization. The authors define three notions of an optimal Q-vector (Max, Nash, and Maximin), formalized as $Q^*_{\max}$, $Q^*_{\operatorname{nash}}$, and $Q^*_{\operatorname{mm}}$, and develop a DQN-based framework that learns $Q_\phi(s,a)$ and derives actions via the corresponding operators ${\mathcal{P}}_{\max}$, ${\mathcal{P}}_{\operatorname{nash}}$, and ${\mathcal{P}}_{\operatorname{mm}}$, with a dueling DQN variant. They validate learning on a physically grounded two-arm lifting task in robosuite/MuJoCo, showing that the method recovers the expected joint policies under balanced and unbalanced action costs. The work demonstrates the feasibility of incorporating game-theoretic optimality into deep MARL for coordinated robotics and lays groundwork for scaling to more agents and additional strategies.

Abstract

Multi-agent reinforcement learning (MARL) has become a significant research topic due to its ability to facilitate learning in complex environments. In multi-agent tasks, the state-action value, commonly referred to as the Q-value, can vary among agents because of their individual rewards, resulting in a Q-vector. Determining an optimal policy is challenging, as it involves more than just maximizing a single Q-value. Various optimal policies, such as a Nash equilibrium, have been studied in this context. Algorithms like Nash Q-learning and Nash Actor-Critic have shown effectiveness in these scenarios. This paper extends this research by proposing a deep Q-networks (DQN) algorithm capable of learning various Q-vectors using Max, Nash, and Maximin strategies. The effectiveness of this approach is demonstrated in an environment where dual robotic arms collaborate to lift a pot.
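To make the idea of choosing a joint action from a Q-vector concrete, the following is a minimal sketch for two agents and one fixed state. The summary does not give the exact definitions of the operators, so every function name and every selection rule here is an assumption: "Max" is read as maximizing the sum of the two components, "Nash" as a pure-strategy Nash equilibrium, and "Maximin" as each agent maximizing its own worst-case value. The Q-values are made-up toy numbers, not results from the paper.

```python
import numpy as np

# Toy Q-vector for one state: q1[a1, a2] and q2[a1, a2] are the Q-values of
# agent 1 and agent 2 for the joint action (a1, a2). Values are illustrative.
q1 = np.array([[3.0, 0.0],
               [5.0, 1.0]])
q2 = np.array([[3.0, 5.0],
               [0.0, 1.0]])


def select_max(q1, q2):
    """Assumed 'Max' rule: joint action with the largest summed Q-value."""
    total = q1 + q2
    idx = np.unravel_index(np.argmax(total), total.shape)
    return tuple(int(i) for i in idx)


def select_nash(q1, q2):
    """Assumed 'Nash' rule: all pure-strategy Nash equilibria.

    (a1, a2) qualifies when a1 is a best response to a2 under q1
    and a2 is a best response to a1 under q2.
    """
    equilibria = []
    for a1 in range(q1.shape[0]):
        for a2 in range(q1.shape[1]):
            best_for_1 = q1[a1, a2] >= q1[:, a2].max()
            best_for_2 = q2[a1, a2] >= q2[a1, :].max()
            if best_for_1 and best_for_2:
                equilibria.append((a1, a2))
    return equilibria


def select_maximin(q1, q2):
    """Assumed 'Maximin' rule: each agent maximizes its worst-case Q-value."""
    a1 = int(np.argmax(q1.min(axis=1)))  # worst case over agent 2's actions
    a2 = int(np.argmax(q2.min(axis=0)))  # worst case over agent 1's actions
    return a1, a2


print(select_max(q1, q2))      # (0, 0)
print(select_nash(q1, q2))     # [(1, 1)]
print(select_maximin(q1, q2))  # (1, 1)
```

On this toy example the three rules disagree: the summed-value rule picks the cooperative joint action, while the Nash and maximin rules both pick the mutually defensive one. This illustrates why the choice of optimality notion matters once Q-values differ across agents. A pure-strategy equilibrium need not exist in general, in which case `select_nash` returns an empty list.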

Paper Structure (9 sections, 4 equations, 10 figures, 3 tables)

Introduction
Optimal Q-Vectors
Algorithms
Experiments
Experimental Environment
Results and Evaluation
Case 1: Balanced Action Costs
Case 2: Unbalanced Action Costs
Conclusion

Figures (10)

Figure 1: Schematic diagram of UR5e joints.
Figure 2: Initial position of the robot arms lifting task.
Figure 4: Profile of Max Q-vectors in Case 1: no arm lifted.
Figure 5: Profile of Nash Q-vectors in Case 1: right arm lifted.
Figure 6: Profile of Nash Q-vectors in Case 1: right arm lifted and then left arm lifted.
...and 5 more figures
