Low-Rank Agent-Specific Adaptation (LoRASA) for Multi-Agent Policy Learning

Beining Zhang, Aditya Kapoor, Mingfei Sun

TL;DR

The paper tackles the inefficiency of fully shared policies in heterogeneous MARL by introducing LoRASA, which injects low-rank adapters into a shared policy backbone to enable agent-specific refinements. By treating each agent as a distinct task and applying rank-$r$ updates, LoRASA achieves a balance between the scalability of parameter sharing and the expressiveness of non-parameter sharing. The method comprises a two-phase training procedure—shared policy pretraining followed by LoRA-based fine-tuning—and supports integration with MAPPO and A2PO, yielding near-NPS performance with far lower resource cost. Empirical results on SMAC and MAMuJoCo demonstrate strong performance and notable resource efficiency, with ablations identifying practical guidelines for adapter rank, placement, and timing. This work offers a scalable framework for heterogeneous MARL that preserves coordination while enabling diverse agent behaviors, illustrating a promising direction for large-scale multi-agent systems.

Abstract

Multi-agent reinforcement learning (MARL) often relies on parameter sharing (PS) to scale efficiently. However, purely shared policies can stifle each agent's unique specialization, reducing overall performance in heterogeneous environments. We propose Low-Rank Agent-Specific Adaptation (LoRASA), a novel approach that treats each agent's policy as a specialized "task" fine-tuned from a shared backbone. Drawing inspiration from parameter-efficient transfer methods, LoRASA appends small, low-rank adaptation matrices to each layer of the shared policy, naturally inducing parameter-space sparsity that promotes both specialization and scalability. We evaluate LoRASA on challenging benchmarks including the StarCraft Multi-Agent Challenge (SMAC) and Multi-Agent MuJoCo (MAMuJoCo), implementing it atop widely used algorithms such as MAPPO and A2PO. Across diverse tasks, LoRASA matches or outperforms existing baselines while reducing memory and computational overhead. Ablation studies on adapter rank, placement, and timing validate the method's flexibility and efficiency. Our results suggest LoRASA's potential to establish a new norm for MARL policy parameterization: combining a shared foundation for coordination with low-rank agent-specific refinements for individual specialization.
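
The idea of appending low-rank adaptation matrices to a shared layer can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class name `LoRASharedLinear`, the zero initialization of `B` (borrowed from standard LoRA practice), and the layer sizes are all assumptions made for illustration.

```python
import numpy as np


class LoRASharedLinear:
    """One shared linear layer plus a rank-r adapter per agent (hypothetical sketch)."""

    def __init__(self, in_dim, out_dim, n_agents, rank, seed=0):
        rng = np.random.default_rng(seed)
        # Shared backbone weight; assumed to come from the shared pretraining
        # phase and to stay frozen while the adapters are fine-tuned.
        self.W = rng.standard_normal((out_dim, in_dim)) / np.sqrt(in_dim)
        # Per-agent low-rank factors. B starts at zero (an assumption taken
        # from standard LoRA), so every agent initially matches the shared policy.
        self.A = rng.standard_normal((n_agents, rank, in_dim)) * 0.01
        self.B = np.zeros((n_agents, out_dim, rank))

    def forward(self, x, agent):
        # Agent-specific weight: shared W plus an update of rank at most r.
        delta = self.B[agent] @ self.A[agent]
        return (self.W + delta) @ x


# Usage: four agents sharing a 64x64 layer, each with a rank-8 adapter.
layer = LoRASharedLinear(in_dim=64, out_dim=64, n_agents=4, rank=8)
obs = np.ones(64)
out_agent0 = layer.forward(obs, agent=0)
```

Because `B` is zero at the start, each agent's output is identical to the shared layer's output until its adapter is trained, which is one simple way to let fine-tuning begin from the pretrained shared policy.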

Paper Structure

This paper contains 48 sections, 1 theorem, 2 equations, 19 figures, 13 tables, 3 algorithms.

Key Result

Proposition 2.1

Assume that in a cooperative multi-agent reinforcement learning (MARL) setting, the agent-specific parameter deviations lie within or near an $r$-dimensional affine subspace of the full parameter space. Then, applying a rank-$r$ low-rank adaptation (LoRA) to the shared backbone's weights can approxi
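
For orientation, the rank-$r$ adaptation the proposition refers to can be written in standard LoRA notation. The symbols $W$, $A_i$, $B_i$, $d_{\text{in}}$, $d_{\text{out}}$ and $N$ are assumptions introduced here for illustration and may differ from the paper's own notation.

$$
W_i = W + B_i A_i, \qquad B_i \in \mathbb{R}^{d_{\text{out}} \times r}, \quad A_i \in \mathbb{R}^{r \times d_{\text{in}}}, \qquad \operatorname{rank}(B_i A_i) \le r
$$

Under this parameterization, $N$ agents add $N\, r\,(d_{\text{in}} + d_{\text{out}})$ adapter parameters per layer, compared with $N\, d_{\text{in}}\, d_{\text{out}}$ for fully separate weights. As a worked example with $d_{\text{in}} = d_{\text{out}} = 64$ and $r = 8$, each agent's adapter holds $8 \times 128 = 1024$ parameters against $64 \times 64 = 4096$ for a separate layer, i.e. one quarter.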

Figures (19)

  • Figure 1: Overview of LoRASA framework.
  • Figure 2: Performance comparison of different parameter sharing approaches (PS, NPS, SePS, MTL, PS+LoRA and SePS+LoRA) using A2PO (row 1, A--H) and MAPPO (row 2, I--P) across four MAMuJoCo and four SMAC scenarios: Half Cheetah 2x3, Walker 3x2, Ant 4x2, Humanoid $9|8$, 3s5z, 1c3s5z, 3s5z_vs_3s6z, and MMM2. The graphs plot median episode returns (MAMuJoCo) and evaluation win rates (SMAC) versus environment steps for each approach. Half Cheetah 2x3 and Humanoid $9|8$ have only two agents, so SePS and SePS+LoRA are omitted for them. MAPPO struggles with Humanoid $9|8$ irrespective of the parameter sharing framework.
  • Figure 3: Computational Efficiency of LoRASA Compared to Baselines. (1) Memory footprint across environments: Total trainable parameters for each baseline in MAMuJoCo and SMAC, highlighting LoRASA’s efficiency over NPS. (2) Scalability with agent count: Growth in trainable parameters as the number of agents increases, showing LoRASA scales efficiently while NPS grows linearly. (3) Training and inference speed in MAMuJoCo: LoRASA-based approaches significantly reduce computational time compared to NPS while achieving comparable or superior performance. (4) Training and inference speed in SMAC: Similar trends observed in SMAC, where LoRASA improves computational efficiency without compromising coordination quality.
  • Figure 4: Ablation studies on A2PO and MAPPO algorithms in the Ant 4x2 and MMM2 environments. (A--D) Timing of LoRA Fine-Tuning: Evaluates fine-tuning started from checkpoints at different environment steps versus the Parameter Sharing (PS) baseline. (E--H) LoRA Rank $r$: Assesses the impact of setting $r$ to 4, 8, 16, and 64 (full rank) compared to the PS baseline. (I--L) Layer-Wise LoRA: Compares the effect of applying LoRA selectively to different layers of the policy, including applying LoRA to all layers simultaneously. Each subplot displays median episode returns (MAMuJoCo) or win rates (SMAC) over environment steps, demonstrating LoRASA’s effectiveness in learning heterogeneous behaviors while balancing efficiency and expressivity.
  • Figure 5: Evaluation episode rewards of (A--D) A2PO and (E--H) MAPPO in SMAC scenarios.
  • ...and 14 more figures

Theorems & Definitions (1)

  • Proposition 2.1