PairVDN - Pair-wise Decomposed Value Functions

Zak Buzzard

TL;DR

The paper tackles cooperative multi-agent reinforcement learning, addressing the limited expressivity of monotonic value decompositions such as VDN and QMIX. It introduces PairVDN, a non-monotonic pairwise decomposition that expresses the joint Q-function $Q((o_1,...,o_n),(a_1,...,a_n))$ as $\sum_{i=1}^n \tilde{Q}_{i,i+1}((o_i,o_{i+1}),(a_i,a_{i+1}))$, together with a dynamic programming solver that maximises over joint actions in $O(n|\mathcal{A}|^3)$ time. Each pairwise network $\tilde{Q}_{i,i+1}$ takes two observations and outputs $|\mathcal{A}|^2$ values, allowing richer inter-agent interactions while keeping maximisation tractable. Experiments on Box Jump with many agents show improved coordination over baselines, especially at longer horizons, though more complex environments like Cooking Zoo remain challenging for the simple DQN baseline. Overall, PairVDN broadens the expressiveness of multi-agent value decompositions and demonstrates coordination gains in a many-agent cooperative setting, with open-source code for reproduction. The approach highlights the limitations of monotonic decompositions and suggests directions for extending the framework to more flexible interaction graphs.
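
A small worked example may help show what the pairwise terms add. It is an illustration constructed here, not one taken from the paper: two agents with binary actions are rewarded only when their actions match.

$$
Q(a_1, a_2) = \mathbb{1}[a_1 = a_2], \qquad a_1, a_2 \in \{0, 1\}.
$$

An additive (VDN-style) form $Q(a_1,a_2) = Q_1(a_1) + Q_2(a_2)$ forces

$$
Q(0,0) + Q(1,1) = Q(0,1) + Q(1,0),
$$

but here the left side is $2$ and the right side is $0$. A monotonic (QMIX-style) mixing $Q = f(Q_1, Q_2)$, with $f$ non-decreasing in each argument, fails as well: if $Q_1(0) \ge Q_1(1)$ then $Q(0,1) \ge Q(1,1)$, i.e. $0 \ge 1$; otherwise $Q(1,0) \ge Q(0,0)$, again $0 \ge 1$. A single pairwise table $\tilde{Q}_{1,2}(a_1,a_2)$ represents this function exactly, since it stores one value per action pair.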

Abstract

Extending deep Q-learning to cooperative multi-agent settings is challenging due to the exponential growth of the joint action space, the non-stationary environment, and the credit assignment problem. Value decomposition allows deep Q-learning to be applied at the joint agent level, at the cost of reduced expressivity. Building on past work in this direction, our paper proposes PairVDN, a novel method for decomposing the value function into a collection of pair-wise, rather than per-agent, functions, improving expressivity at the cost of requiring a more complex (but still efficient) dynamic programming maximisation algorithm. Our method enables the representation of value functions which cannot be expressed as a monotonic combination of per-agent functions, unlike past approaches such as VDN and QMIX. We implement a novel many-agent cooperative environment, Box Jump, and demonstrate improved performance over these baselines in this setting. We open-source our code and environment at https://github.com/zzbuzzard/PairVDN.
