Prism: Spectral Parameter Sharing for Multi-Agent Reinforcement Learning

Kyungbeom Kim; Seungwon Oh; Kyung-Joong Kim

TL;DR

Prism tackles scalability and policy heterogeneity in multi-agent reinforcement learning by learning in the spectral domain. It factorizes the shared weight matrix as $W = U \,\mathrm{diag}(s)\, V^\top$, with $U$ and $V$ common to all agents and agent-specific spectral masks applied to $s$, enabling diverse yet compact policies. The approach adds diversity and orthogonal regularization and achieves results competitive with or superior to baselines on homogeneous (LBF, SMACv2) and heterogeneous (MaMuJoCo) tasks while reducing memory overhead. The work highlights that sharing in spectral space balances expressiveness and efficiency, particularly under resource constraints.

Abstract

Parameter sharing is a key strategy in multi-agent reinforcement learning (MARL) for improving scalability, yet conventional fully shared architectures often collapse into homogeneous behaviors. Recent methods introduce diversity through clustering, pruning, or masking, but typically compromise resource efficiency. We propose Prism, a parameter sharing framework that induces inter-agent diversity by representing shared networks in the spectral domain via singular value decomposition (SVD). All agents share the singular vector directions while learning distinct spectral masks on singular values. This mechanism encourages inter-agent diversity and preserves scalability. Extensive experiments on both homogeneous (LBF, SMACv2) and heterogeneous (MaMuJoCo) benchmarks show that Prism achieves competitive performance with superior resource efficiency.
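
The sharing scheme described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the sigmoid gate used for the mask, the rank `r`, the random initialization, and the names `agent_weight` and `mask_logits` are all assumptions made for illustration. It only shows how shared singular vectors combined with per-agent masks on the singular values yield distinct weight matrices.

```python
import numpy as np

def agent_weight(U, s, V, mask_logits):
    """Build one agent's weight matrix from shared factors and its own mask.

    Assumption: the mask is a sigmoid gate in (0, 1); the paper only states
    that each agent learns a spectral mask on the singular values.
    """
    mask = 1.0 / (1.0 + np.exp(-mask_logits))
    # Scaling the columns of U by (s * mask) equals U @ diag(s * mask).
    return (U * (s * mask)) @ V.T

rng = np.random.default_rng(0)
d, k, r, n_agents = 8, 6, 4, 3  # hypothetical sizes

# Shared factors with orthonormal columns (obtained here via QR).
U, _ = np.linalg.qr(rng.standard_normal((d, r)))
V, _ = np.linalg.qr(rng.standard_normal((k, r)))
s = np.abs(rng.standard_normal(r))  # shared singular values

# Per-agent parameters: one mask vector of length r for each agent.
mask_logits = rng.standard_normal((n_agents, r))
weights = [agent_weight(U, s, V, mask_logits[i]) for i in range(n_agents)]
```

In this sketch each agent adds only `r` parameters on top of the shared `U`, `s`, and `V`, which is the sense in which the policies stay compact while still differing across agents.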

Paper Structure (34 sections, 17 equations, 10 figures, 7 tables, 1 algorithm)

Introduction
Background
Spectral Parameter Sharing Formulation
Orthogonality to Reduce Interference and Redundancy
Singular Value Decomposition as an Orthogonal Basis
Method
SVD Decomposed Weight
Spectral Space Masking
Diversity and Orthogonal Regularization
Diversity Regularization
Orthogonal Regularization
Experiments
Experimental Setups
Performance Evaluation
Homogeneous Setting
...and 19 more sections

Figures (10)

Figure 1: Overview of Prism framework. We decompose the shared weights $W\in\mathbb{R}^{d\times k}$ into $W=U\Sigma V^\top$ and learn them in the SVD-parameterized space. The left/right singular vectors $U$ and $V$ are shared among agents, whereas agent $i$ applies a learnable spectral mask to the singular values to obtain an agent-specific spectrum $\Sigma_i$, enabling diversity while preserving parameter sharing.
Figure 2: Performance evaluation on homogeneous environments (LBF, SMACv2).
Figure 3: Performance evaluation on heterogeneous environment (MaMuJoCo).
Figure 4: Performance under parameter budget constraints on SMACv2 and MaMuJoCo. Methods are evaluated with Full (matching NoPS parameters) and Half (50% of NoPS) budgets by adjusting model width for a fair comparison.
Figure 5: Resource efficiency evaluation with respect to the number of agents. Top: The total number of parameters and the additional resource overhead required by each method as the number of agents increases. Bottom: The normalized resource overhead, defined as the ratio of additional resources to the total model size (resource / (parameters + resource)).
...and 5 more figures
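
The normalized resource overhead defined in the Figure 5 caption can be written as a small helper. This is a sketch only: the function name and the example counts are hypothetical, and it assumes that both quantities are measured in the same unit.

```python
def normalized_overhead(parameters: int, resource: int) -> float:
    """Ratio of additional resources to total model size:
    resource / (parameters + resource), following the Figure 5 caption."""
    total = parameters + resource
    if total <= 0:
        raise ValueError("parameters + resource must be positive")
    return resource / total

# Hypothetical counts, not taken from the paper.
example = normalized_overhead(parameters=900_000, resource=100_000)
print(example)  # 0.1
```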
