Multi-Task Reinforcement Learning Enables Parameter Scaling
Reginald McLean, Evangelos Chatzaroulas, Jordan Terry, Isaac Woungang, Nariman Farsad, Pablo Samuel Castro
TL;DR
This paper investigates whether advances in multi-task reinforcement learning (MTRL) are primarily due to parameter scaling rather than architectural novelty. By benchmarking four MTRL architectures against a simple baseline scaled to equivalent parameter counts, the authors show that the scaled baseline can match or exceed the more complex architectures, with critic scaling delivering the strongest gains. They further demonstrate that increasing the number of tasks mitigates plasticity loss, particularly at larger scales, suggesting a beneficial interaction between model capacity and task diversity. The findings imply that parameter scaling, especially of the critic, may be more impactful than designing new architectures, and they provide an open-source baseline for future MTRL research.
Abstract
Multi-task reinforcement learning (MTRL) aims to endow a single agent with the ability to perform well on multiple tasks. Recent work has focused on developing novel, sophisticated architectures to improve performance, often resulting in larger models; it is unclear, however, whether the performance gains come from the architecture design itself or from the extra parameters. We argue that the gains are mostly due to scale: naively scaling up a simple MTRL baseline to match parameter counts outperforms the more sophisticated architectures, and scaling the critic yields larger gains than scaling the actor. Additionally, we explore the training stability advantages that come with task diversity, demonstrating that increasing the number of tasks can help mitigate plasticity loss. Our findings suggest that MTRL's simultaneous training across multiple tasks provides a natural framework for beneficial parameter scaling in reinforcement learning, challenging the need for complex architectural innovations.
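To make "scaling a simple baseline to match parameter counts" concrete, the following is a minimal sketch of how one might pick the hidden width of a plain MLP so that its parameter count fits a given budget. It assumes a fully connected network with biases and equal-width hidden layers; the function names, dimensions, and budget are hypothetical illustrations, not the authors' implementation or their reported settings.

```python
def mlp_param_count(in_dim, width, depth, out_dim):
    """Count weights and biases of an MLP with `depth` hidden layers of size `width`."""
    dims = [in_dim] + [width] * depth + [out_dim]
    # Each dense layer contributes a (fan_in x fan_out) weight matrix plus fan_out biases.
    return sum(a * b + b for a, b in zip(dims[:-1], dims[1:]))


def width_for_budget(target_params, in_dim, depth, out_dim):
    """Return the largest hidden width whose parameter count does not exceed the budget.

    Falls back to width 1 if even the narrowest network is over budget.
    """
    width = 1
    while mlp_param_count(in_dim, width + 1, depth, out_dim) <= target_params:
        width += 1
    return width


if __name__ == "__main__":
    # Hypothetical values: a 1M-parameter budget and made-up input/output sizes.
    budget = 1_000_000
    critic_width = width_for_budget(budget, in_dim=53, depth=3, out_dim=1)
    actor_width = width_for_budget(budget, in_dim=49, depth=3, out_dim=8)
    print("critic width:", critic_width,
          "params:", mlp_param_count(53, critic_width, 3, 1))
    print("actor width:", actor_width,
          "params:", mlp_param_count(49, actor_width, 3, 8))
```

Sizing the actor and critic through separate calls, as above, is one way to vary the two budgets independently when comparing critic scaling against actor scaling.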
