Prism: Spectral Parameter Sharing for Multi-Agent Reinforcement Learning

Kyungbeom Kim, Seungwon Oh, Kyung-Joong Kim

TL;DR

Prism tackles scalability and policy heterogeneity in multi-agent reinforcement learning by learning in the spectral domain. It factorizes the shared weight matrix as $W = U \,\mathrm{diag}(s)\, V^\top$, where $U$ and $V$ are common to all agents and each agent applies its own spectral mask to $s$, enabling diverse yet compact policies. The approach adds diversity and orthogonality regularization and achieves results competitive with or superior to baselines on homogeneous (LBF, SMACv2) and heterogeneous (MaMuJoCo) tasks while reducing memory overhead. The work highlights that spectral-space sharing balances expressiveness and efficiency, particularly under resource constraints.

Abstract

Parameter sharing is a key strategy in multi-agent reinforcement learning (MARL) for improving scalability, yet conventional fully shared architectures often collapse into homogeneous behaviors. Recent methods introduce diversity through clustering, pruning, or masking, but typically compromise resource efficiency. We propose Prism, a parameter sharing framework that induces inter-agent diversity by representing shared networks in the spectral domain via singular value decomposition (SVD). All agents share the singular vector directions while learning distinct spectral masks on singular values. This mechanism encourages inter-agent diversity and preserves scalability. Extensive experiments on both homogeneous (LBF, SMACv2) and heterogeneous (MaMuJoCo) benchmarks show that Prism achieves competitive performance with superior resource efficiency.
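To make the mechanism concrete, the following is a minimal NumPy sketch of spectral parameter sharing under stated assumptions; it is not the authors' implementation. The sigmoid gate on the mask logits, the squared Frobenius form of the orthogonality penalty, the layer sizes, and all function names (`init_shared_factors`, `agent_weight`, `orthogonality_penalty`) are hypothetical choices made for illustration only. The summary above states only that the singular vectors are shared and that each agent learns its own mask on the singular values.

```python
import numpy as np


def init_shared_factors(d, k, rank, rng):
    """Return shared U (d x rank), singular values s (rank,), and V (k x rank).

    The factors come from the SVD of a random matrix, so the columns of
    U and V start out orthonormal. (Initialization scheme is an assumption.)
    """
    w0 = rng.standard_normal((d, k)) / np.sqrt(k)
    u, s, vt = np.linalg.svd(w0, full_matrices=False)
    return u[:, :rank], s[:rank], vt[:rank, :].T


def agent_weight(u, s, v, mask_logits):
    """Build one agent's weight matrix W_i = U diag(s * mask_i) V^T.

    The mask is a sigmoid of per-agent logits; this gating function is an
    assumption, not something the summary specifies.
    """
    mask = 1.0 / (1.0 + np.exp(-mask_logits))
    # Scaling the columns of U by (s * mask) equals U @ diag(s * mask).
    return (u * (s * mask)) @ v.T


def orthogonality_penalty(m):
    """One common orthogonality regularizer: ||M^T M - I||_F^2 (assumed form)."""
    r = m.shape[1]
    return float(np.sum((m.T @ m - np.eye(r)) ** 2))


rng = np.random.default_rng(0)
d, k, rank, n_agents = 64, 32, 32, 4  # hypothetical layer sizes

u, s, v = init_shared_factors(d, k, rank, rng)
mask_logits = rng.standard_normal((n_agents, rank))  # one mask vector per agent
weights = [agent_weight(u, s, v, m) for m in mask_logits]

# Parameter counts for this single layer.
shared_params = u.size + v.size + s.size            # U, V, and s are stored once
prism_like_total = shared_params + n_agents * rank  # plus one mask per agent
no_sharing_total = n_agents * d * k                 # a separate W for every agent

print(prism_like_total, no_sharing_total)
```

In this sketch each additional agent costs only `rank` extra parameters (its mask), whereas a fully unshared layer grows by `d * k` per agent, which is the intuition behind the scalability claim.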

Paper Structure (34 sections, 17 equations, 10 figures, 7 tables, 1 algorithm)

Figures (10)

  • Figure 1: Overview of the Prism framework. We decompose the shared weights $W\in\mathbb{R}^{d\times k}$ into $W=U\Sigma V^\top$ and learn them in the SVD-parameterized space. The left and right singular vectors $U$ and $V$ are shared among agents, whereas agent $i$ applies a learnable spectral mask to the singular values to obtain an agent-specific spectrum $\Sigma_i$, enabling diversity while preserving parameter sharing.
  • Figure 2: Performance evaluation on homogeneous environments (LBF, SMACv2).
  • Figure 3: Performance evaluation on heterogeneous environment (MaMuJoCo).
  • Figure 4: Performance under parameter budget constraints on SMACv2 and MaMuJoCo. Methods are evaluated with Full (matching NoPS parameters) and Half (50% of NoPS) budgets by adjusting model width for a fair comparison.
  • Figure 5: Resource efficiency evaluation with respect to the number of agents. Top: The total number of parameters and the additional resource overhead required by each method as the number of agents increases. Bottom: The normalized resource overhead, defined as the ratio of additional resources to the total model size (resource / (parameters + resource)).
  • ...and 5 more figures