Mixtures of Experts Unlock Parameter Scaling for Deep RL
Johan Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro
TL;DR
This paper demonstrates that integrating Soft Mixtures of Experts (Soft MoEs) into value-based deep RL networks significantly improves parameter scalability: larger models perform better without destabilizing training. By replacing the penultimate layer with a Soft MoE, the authors observe consistent gains for DQN and Rainbow on extensive Atari benchmarks. These gains grow with the number of experts and remain robust at high replay ratios. In-depth analyses of tokenization, gating, and encoder choices show that learned routing, together with the accompanying gating and combining components, drives the improvements. MoEs also stabilize optimization, as evidenced by NTK rank and a reduced number of dormant neurons. Beyond the online setting, the approach shows promise in offline RL and low-data regimes, highlighting MoEs as a practical route toward establishing parameter scaling laws in reinforcement learning.
Abstract
The recent rapid progress in (self-)supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Experts (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
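
As a rough illustration of the Soft MoE layer referred to above, the following NumPy sketch follows the general formulation of Puigcerver et al. (2023): each slot receives a softmax-weighted average of all input tokens, each expert processes its own slots, and each output token is a softmax-weighted average of all slot outputs. It is a minimal sketch under stated assumptions, not the authors' implementation. The function and argument names (`soft_moe`, `phi`, `expert_w1`, `expert_w2`), the two-layer ReLU experts, and all sizes are hypothetical choices made for illustration; how the encoder output is split into tokens is left outside the sketch.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def soft_moe(tokens, phi, expert_w1, expert_w2):
    """Soft MoE forward pass (illustrative sketch).

    tokens:    (n, d)    input tokens
    phi:       (d, e, s) learnable slot parameters, e experts with s slots each
    expert_w1: (e, d, h) first-layer weights of each expert (assumed MLP experts)
    expert_w2: (e, h, d) second-layer weights of each expert
    """
    n, _ = tokens.shape
    _, e, s = phi.shape

    # Token-to-slot affinity logits.
    logits = np.einsum("nd,des->nes", tokens, phi)

    # Dispatch weights: normalise over tokens, separately for every slot.
    dispatch = softmax(logits, axis=0)
    # Combine weights: normalise over all e*s slots, separately for every token.
    combine = softmax(logits.reshape(n, e * s), axis=1).reshape(n, e, s)

    # Each slot is a convex combination of all input tokens.
    slots = np.einsum("nes,nd->esd", dispatch, tokens)

    # Each expert applies its own two-layer ReLU MLP to its slots.
    hidden = np.maximum(np.einsum("esd,edh->esh", slots, expert_w1), 0.0)
    slot_out = np.einsum("esh,ehd->esd", hidden, expert_w2)

    # Each output token is a convex combination of all slot outputs.
    out = np.einsum("nes,esd->nd", combine, slot_out)
    return out, dispatch, combine


# Example call with arbitrary, hypothetical sizes.
rng = np.random.default_rng(0)
n, d, e, s, h = 6, 8, 4, 1, 16
out, dispatch, combine = soft_moe(
    rng.normal(size=(n, d)),
    rng.normal(size=(d, e, s)),
    rng.normal(size=(e, d, h)),
    rng.normal(size=(e, h, d)),
)
print(out.shape)
```

Because both sets of weights are softmax outputs, every token contributes to every slot and the layer stays fully differentiable, in contrast to hard top-k routing where each token is sent to a discrete subset of experts.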
