HyperMARL: Adaptive Hypernetworks for Multi-Agent RL
Kale-ab Abebe Tessera, Arrasy Rahman, Amos Storkey, Stefano V. Albrecht
TL;DR
HyperMARL uses agent-conditioned hypernetworks to generate per-agent policy and critic weights, explicitly decoupling agent identity from observations to reduce cross-agent gradient interference in parameter-sharing MARL. This gradient decoupling lets shared policies adapt their behaviour, from specialised to homogeneous, without altering learning objectives or requiring manual diversity tuning, and it lowers policy gradient variance while preserving behavioural diversity. Across 22 MARL scenarios with up to 30 agents, HyperMARL is competitive with strong baselines, specialises robustly in heterogeneous tasks, and still handles homogeneous tasks effectively, including in recurrent settings. The work demonstrates that gradient decoupling via hypernetworks is a principled and scalable route to adaptive MARL, with practical implications for large-scale multi-agent systems where behavioural diversity is crucial.
Abstract
Adaptive cooperation in multi-agent reinforcement learning (MARL) requires policies to express homogeneous, specialised, or mixed behaviours, yet achieving this adaptivity remains a critical challenge. While parameter sharing (PS) is standard for efficient learning, it notoriously suppresses the behavioural diversity required for specialisation. This failure is largely due to cross-agent gradient interference, a problem we find is surprisingly exacerbated by the common practice of coupling agent IDs with observations. Existing remedies typically add complexity through altered objectives, manual preset diversity levels, or sequential updates -- raising a fundamental question: can shared policies adapt without these intricacies? We propose a solution built on a key insight: an agent-conditioned hypernetwork can generate agent-specific parameters and decouple observation- and agent-conditioned gradients, directly countering the interference from coupling agent IDs with observations. Our resulting method, HyperMARL, avoids the complexities of prior work and empirically reduces policy gradient variance. Across diverse MARL benchmarks (22 scenarios, up to 30 agents), HyperMARL achieves performance competitive with six key baselines while preserving behavioural diversity comparable to non-parameter sharing methods, establishing it as a versatile and principled approach for adaptive MARL. The code is publicly available at https://github.com/KaleabTessera/HyperMARL.
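To make the core idea concrete, the following is a minimal sketch of an agent-conditioned hypernetwork, not the authors' implementation. It assumes a linear hypernetwork, a linear softmax policy, and randomly initialised agent embeddings; all names and sizes (`agent_embeddings`, `hyper_W`, `generate_policy_params`, and so on) are hypothetical and chosen only for illustration. The point it shows is that the agent identity enters only through the hypernetwork that produces the policy weights, while the observation is fed to the generated policy without an agent ID attached.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
N_AGENTS, OBS_DIM, N_ACTIONS, EMBED_DIM = 3, 4, 2, 8

# One embedding per agent (assumption: these would be learned; random here).
agent_embeddings = rng.normal(size=(N_AGENTS, EMBED_DIM))

# Shared linear hypernetwork: embedding -> flattened policy parameters.
n_policy_params = OBS_DIM * N_ACTIONS + N_ACTIONS
hyper_W = rng.normal(scale=0.1, size=(EMBED_DIM, n_policy_params))
hyper_b = np.zeros(n_policy_params)


def generate_policy_params(agent_id):
    """Generate agent-specific policy weights from the agent embedding only."""
    flat = agent_embeddings[agent_id] @ hyper_W + hyper_b
    W = flat[: OBS_DIM * N_ACTIONS].reshape(OBS_DIM, N_ACTIONS)
    b = flat[OBS_DIM * N_ACTIONS:]
    return W, b


def action_probs(agent_id, obs):
    """Apply the generated policy to an observation that carries no agent ID."""
    W, b = generate_policy_params(agent_id)
    logits = obs @ W + b
    z = np.exp(logits - logits.max())  # numerically stable softmax
    return z / z.sum()


# The same observation yields agent-specific action distributions.
obs = rng.normal(size=OBS_DIM)
probs = [action_probs(i, obs) for i in range(N_AGENTS)]
for i, p in enumerate(probs):
    print(f"agent {i}: {p}")
```

In this sketch the only shared trainable parameters are `hyper_W`, `hyper_b`, and the embeddings. Because the observation multiplies the generated weights rather than being concatenated with an agent ID, the observation-dependent and agent-dependent parts of the computation stay separate, which is the decoupling the abstract describes.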
