Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL
Ghada Sokar, Johan Obando-Ceron, Aaron Courville, Hugo Larochelle, Pablo Samuel Castro
TL;DR
The paper investigates why SoftMoEs improve online reinforcement learning and finds that tokenizing the convolutional encoder output, rather than increasing the number of experts, is the dominant factor behind the performance gains. Through a series of controlled experiments, it shows that combined tokenization preserves spatial structure and can match or exceed the benefits of multiple experts, even with a single scaled expert. These findings challenge the default practice of flattening encoder outputs, with broader implications for pixel-based RL architectures and expert utilization strategies. The results hold across multiple agents, encoders, and environments, which points to tokenization as a key design principle for scalable, efficient RL with MoEs and motivates future research on making better use of expert capacity.
Abstract
The use of deep neural networks in reinforcement learning (RL) often suffers from performance degradation as model size increases. While soft mixtures of experts (SoftMoEs) have recently shown promise in mitigating this issue for online RL, the reasons behind their effectiveness remain largely unknown. In this work we provide an in-depth analysis identifying the key factors driving this performance gain. We discover the surprising result that tokenizing the encoder output, rather than the use of multiple experts, is what is behind the efficacy of SoftMoEs. Indeed, we demonstrate that even with an appropriately scaled single expert, we are able to maintain the performance gains, largely thanks to tokenization.
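To make the distinction between flattening and tokenizing concrete, the sketch below contrasts the two on a convolutional feature map. It is an illustration only, not the authors' implementation. The feature-map shape (11, 11, 64), the function names, and the choice of one token per spatial position are assumptions made for the example.

```python
import numpy as np


def flatten_features(features: np.ndarray) -> np.ndarray:
    """Collapse an (H, W, C) feature map into one vector of length H*W*C."""
    return features.reshape(-1)


def tokenize_features(features: np.ndarray) -> np.ndarray:
    """Keep one token per spatial position: (H, W, C) -> (H*W, C).

    Assumption: each token is the channel vector at one location, so the
    spatial layout survives as the token index instead of being mixed
    into a single long vector.
    """
    h, w, c = features.shape
    return features.reshape(h * w, c)


# Hypothetical encoder output; the shape is chosen only for illustration.
features = np.arange(11 * 11 * 64, dtype=np.float32).reshape(11, 11, 64)

flat = flatten_features(features)
tokens = tokenize_features(features)

print(flat.shape)    # (7744,)
print(tokens.shape)  # (121, 64)
```

In the flattened case, a downstream dense layer sees a single 7744-dimensional input. In the tokenized case, the same layer (or expert) is applied to each of the 121 tokens, which is the structural difference the abstract credits for the gains.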
