PairVDN - Pair-wise Decomposed Value Functions

Zak Buzzard

TL;DR

The paper tackles cooperative multi-agent reinforcement learning by addressing the expressivity limits of monotonic value decompositions such as VDN and QMIX. It introduces PairVDN, a non-monotonic pairwise decomposition of the joint Q-function $Q((o_1,\dots,o_n),(a_1,\dots,a_n))$, expressed as $\sum_{i=1}^n \tilde{Q}_{i,i+1}((o_i,o_{i+1}),(a_i,a_{i+1}))$, together with a dynamic programming solver that performs the maximisation over joint actions in $O(n|\mathcal{A}|^3)$ time. Each pairwise network $\tilde{Q}_{i,i+1}$ takes two observations and outputs $|\mathcal{A}|^2$ values, allowing richer inter-agent interactions while keeping optimisation tractable. Experiments on Box Jump with many agents show improved coordination over baselines, especially at longer horizons, though more complex environments like Cooking Zoo remain challenging for the simple DQN baseline. Overall, PairVDN broadens the expressiveness of multi-agent value decompositions and demonstrates tangible gains in coordination in large-scale cooperative MARL, with open-source code for reproduction. The approach highlights the limitations of monotonic decompositions and suggests directions for extending the framework to more flexible interaction graphs.
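To make the $O(n|\mathcal{A}|^3)$ maximisation concrete, the following is a minimal sketch of one way such a dynamic program could look. It is not taken from the authors' repository. It assumes the index $i+1$ wraps around, so that agent $n$ is paired with agent 1 and the pairs form a cycle, and it assumes each pairwise network's $|\mathcal{A}|^2$ outputs have already been reshaped into an $|\mathcal{A}| \times |\mathcal{A}|$ table. The names `cyclic_pairwise_argmax`, `joint_value` and `brute_force_max` are hypothetical.

```python
import itertools
import numpy as np


def joint_value(tables, actions):
    """Sum of pairwise values for one joint action (cyclic pairing assumed)."""
    n = len(actions)
    return float(sum(tables[i][actions[i], actions[(i + 1) % n]] for i in range(n)))


def cyclic_pairwise_argmax(tables):
    """Maximise sum_i tables[i][a_i, a_{(i+1) % n}] over all joint actions.

    tables has shape (n, A, A). Fixing the first agent's action turns the
    cycle into a chain, which is solved by a Viterbi-style pass costing
    O(n * A^2); repeating this for each of the A first actions gives O(n * A^3).
    """
    tables = np.asarray(tables, dtype=float)
    n, num_actions, _ = tables.shape
    assert n >= 2, "sketch assumes at least two agents"
    best_value, best_actions = -np.inf, None
    for first in range(num_actions):
        # score[b]: best partial sum with a_0 = first and the latest agent playing b
        score = tables[0, first].copy()
        back = []
        for i in range(1, n - 1):
            cand = score[:, None] + tables[i]  # cand[a, b] for a_i = a, a_{i+1} = b
            back.append(cand.argmax(axis=0))
            score = cand.max(axis=0)
        # close the cycle: the last agent is paired with the first agent
        closing = score + tables[n - 1][:, first]
        last = int(closing.argmax())
        if closing[last] > best_value:
            best_value = float(closing[last])
            actions = [last]
            for pointers in reversed(back):
                actions.append(int(pointers[actions[-1]]))
            actions.append(first)
            best_actions = actions[::-1]
    return best_value, best_actions


def brute_force_max(tables):
    """Exhaustive check over all A^n joint actions (only viable for tiny n)."""
    n, num_actions, _ = tables.shape
    return max(
        joint_value(tables, acts)
        for acts in itertools.product(range(num_actions), repeat=n)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = rng.normal(size=(6, 4, 4))  # 6 agents, 4 actions each, random pair tables
    value, acts = cyclic_pairwise_argmax(demo)
    print(acts, np.isclose(value, brute_force_max(demo)))
```

The brute-force helper is included only to sanity-check the dynamic program on small instances; it enumerates all $|\mathcal{A}|^n$ joint actions, which is exactly the cost the decomposition is meant to avoid.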

Abstract

Extending deep Q-learning to cooperative multi-agent settings is challenging due to the exponential growth of the joint action space, the non-stationary environment, and the credit assignment problem. Value decomposition allows deep Q-learning to be applied at the joint agent level, at the cost of reduced expressivity. Building on past work in this direction, our paper proposes PairVDN, a novel method for decomposing the value function into a collection of pair-wise, rather than per-agent, functions, improving expressivity at the cost of requiring a more complex (but still efficient) dynamic programming maximisation algorithm. Our method enables the representation of value functions which cannot be expressed as a monotonic combination of per-agent functions, unlike past approaches such as VDN and QMIX. We implement a novel many-agent cooperative environment, Box Jump, and demonstrate improved performance over these baselines in this setting. We open-source our code and environment at https://github.com/zzbuzzard/PairVDN.
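A small worked example may help illustrate the expressivity claim. It is not taken from the paper: it assumes a stateless two-agent game with two actions per agent, and shows a payoff that no monotonic combination of per-agent values can reproduce, while a single pairwise table stores it directly.

```latex
% Illustrative two-agent coordination game (an assumed example, not from the paper).
\[
Q(a_1, a_2) =
\begin{cases}
1 & \text{if } a_1 = a_2, \\
0 & \text{otherwise,}
\end{cases}
\qquad a_1, a_2 \in \{0, 1\}.
\]
% Suppose Q(a_1, a_2) = f(Q_1(a_1), Q_2(a_2)) with f non-decreasing in each argument.
\begin{align*}
f(Q_1(0), Q_2(0)) = 1 > 0 = f(Q_1(1), Q_2(0)) &\implies Q_1(0) > Q_1(1), \\
f(Q_1(0), Q_2(1)) = 0 < 1 = f(Q_1(1), Q_2(1)) &\implies Q_1(0) < Q_1(1).
\end{align*}
% The two conclusions contradict each other, so no such f, Q_1, Q_2 exist.
% A pairwise term has no such restriction: setting \tilde{Q}_{1,2}(a_1, a_2) = Q(a_1, a_2)
% represents the payoff exactly.
```

VDN is the special case $f(x, y) = x + y$, so the same argument rules it out as well.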

Paper Structure

This paper contains 13 sections, 8 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The three environments we evaluate on. Box Jump is a custom environment we design for this project.
  • Figure 2: Total episode rewards during training for the various trained models, with standard deviation across five episodes shown shaded. Black dashed line gives performance of a random baseline (averaged over 30 runs). PVDN denotes PairVDN.
  • Figure 3: Behaviour of VDN and PairVDN on Box Jump with 16 agents, no rotation, and seed zero in both cases. Notice how PairVDN's agents are grouped together more closely than VDN's at time $t=400$, and more so at time $t=1000$.
  • Figure 4: Example values of the left (a) and above (b) distance observations, where red indicates a value of zero, black a value of one, with a gradient between. Notice how the 'above' distance observation clearly indicates to the agent whether another agent is stacked on top of it (when the value is zero).
  • Figure 5: The same comparison as Figure 3, but for seed 1 rather than seed 0, demonstrating that the behaviour shown there was not cherry-picked but occurs frequently.