Multi-Task Reinforcement Learning Enables Parameter Scaling

Reginald McLean, Evangelos Chatzaroulas, Jordan Terry, Isaac Woungang, Nariman Farsad, Pablo Samuel Castro

TL;DR

This paper investigates whether advances in multi-task reinforcement learning (MTRL) are primarily due to parameter scaling rather than architectural novelty. By benchmarking four MTRL architectures against a simple baseline scaled to equivalent parameter counts, the authors show that scaling can match or exceed the complex architectures, with critic scaling delivering the strongest gains. They further demonstrate that increasing the number of tasks mitigates plasticity loss, particularly at larger scales, suggesting a beneficial interaction between model capacity and task diversity. The findings imply that robust parameter scaling and critic-focused training may be more impactful than designing new architectures, and they provide an open-source baseline for future MTRL research.

Abstract

Multi-task reinforcement learning (MTRL) aims to endow a single agent with the ability to perform well on multiple tasks. Recent works have focused on developing novel sophisticated architectures to improve performance, often resulting in larger models; it is unclear, however, whether the performance gains are a consequence of the architecture design itself or the extra parameters. We argue that gains are mostly due to scale by demonstrating that naively scaling up a simple MTRL baseline to match parameter counts outperforms the more sophisticated architectures, and these gains benefit most from scaling the critic over the actor. Additionally, we explore the training stability advantages that come with task diversity, demonstrating that increasing the number of tasks can help mitigate plasticity loss. Our findings suggest that MTRL's simultaneous training across multiple tasks provides a natural framework for beneficial parameter scaling in reinforcement learning, challenging the need for complex architectural innovations.
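
The parameter-matching idea in the abstract can be made concrete with a small sketch. The Python below counts the weights and biases of a plain feed-forward network and searches for the hidden width that fits a given parameter budget. The function names, the layer layout, and the example dimensions are illustrative assumptions, not values or code from the paper.

```python
def mlp_param_count(in_dim, width, depth, out_dim):
    """Count weights and biases of a fully connected network
    with `depth` hidden layers, each of size `width`."""
    sizes = [in_dim] + [width] * depth + [out_dim]
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))


def width_for_budget(target_params, in_dim, depth, out_dim):
    """Largest hidden width whose parameter count stays within
    `target_params`; returns 0 if even width 1 is too large."""
    lo, hi = 0, 1
    # Grow `hi` until it overshoots the budget.
    while mlp_param_count(in_dim, hi, depth, out_dim) <= target_params:
        hi *= 2
    # Binary search: `lo` fits (or is 0), `hi` does not fit.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mlp_param_count(in_dim, mid, depth, out_dim) <= target_params:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Hypothetical budget and dimensions, chosen only for illustration.
    budget = 1_000_000
    width = width_for_budget(budget, in_dim=64, depth=3, out_dim=8)
    print(width, mlp_param_count(64, width, 3, 8))
```

Because the count grows monotonically with width, a binary search is enough; the same helper could be applied separately to an actor and a critic when the two are scaled independently.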

Paper Structure

This paper contains 19 sections, 12 figures, 1 table.

Figures (12)

  • Figure 1: Effects of scaling the number of actor and critic parameters in multi-task, multi-head soft actor-critic (pmlr-v100-yu20a) on the Meta-World (a) ten-task and (b) fifty-task benchmarks, over ten random seeds. Y-axes report the inter-quartile mean. Shaded regions (for curves) and error bars (for single points) denote standard errors. MTRL-specific methods are plotted as single points, while the Simple FF method is the result of scaling a simple baseline to a similar number of parameters.
  • Figure 2: Sample tasks from Meta-World. (a) assembly, where the MTRL agent must grasp a wrench and place it on a peg. (b) door close, where the agent must swing the door from the current position to the green goal to close it. (c) coffee button, where the agent must press a button on the front of the coffee machine. (d) pick place, where the agent must grasp the red object and move it to the blue goal.
  • Figure 3: Visualization of architectures used in this work. (a) MTMHSAC from pmlr-v100-yu20a, (b) Soft-Modularization from yang_multi_task_soft_mod, (c) PaCo from sun2022paco, and (d) MOORE from hendawy2024multitask.
  • Figure 4: Extended Scaling Results. Each line is the average IQM produced by a feed-forward network scaled to a certain parameter count, with shaded regions indicating 95% CIs. Colours indicate which parameter count the model used. (a) MT10 scaling results, (b) MT50 scaling results.
  • Figure 5: Where does scale matter the most? Here we report the IQM through training across various actor and critic size configurations. Our baseline model uses an equally sized actor and critic, each with layer width 1024. We then vary the actor and critic widths to determine which component benefits more from scale. These results show that the critic is more affected by parameter scaling than the actor, in line with results from nauman2024bigger in the single-task setting.
  • ...and 7 more figures
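
The captions above report the inter-quartile mean (IQM). As a minimal sketch of that statistic, the Python below sorts the scores, drops the lowest and highest quarter, and averages the rest. The function name is hypothetical, and truncating the cut count to an integer is one common convention, assumed here rather than taken from the paper.

```python
def interquartile_mean(scores):
    """Mean of the middle 50% of `scores`: sort, drop the lowest
    and highest 25% (cut count truncated to an integer), average."""
    if not scores:
        raise ValueError("scores must be non-empty")
    ordered = sorted(scores)
    cut = int(0.25 * len(ordered))
    middle = ordered[cut:len(ordered) - cut]
    return sum(middle) / len(middle)


if __name__ == "__main__":
    # Made-up per-run scores; the outlier (100) is trimmed away.
    print(interquartile_mean([0, 1, 2, 3, 4, 5, 6, 100]))
```

Compared with a plain mean, the IQM is less sensitive to a few unusually good or bad runs, which helps when aggregating over a small number of random seeds.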