Mixture of Experts in a Mixture of RL settings
Timon Willi, Johan Obando-Ceron, Jakob Foerster, Karolina Dziugaite, Pablo Samuel Castro
TL;DR
Problem: DRL under non-stationarity often suffers from plasticity loss and inefficient use of parameters.
Approach: The paper evaluates Mixtures of Experts (MoEs) in multi-task RL (MTRL) and continual RL (CRL), comparing several MoE architectures and routing strategies with PPO on multiple MinAtar (miniaturized Atari) tasks.
Key findings: SoftMoE with a "Big" architecture reduces dormant neurons, improves learning under CRL, and partially benefits MTRL; learned routing shows mixed results, with hardcoded routing sometimes outperforming it; and environment ordering significantly affects CRL performance.
Significance: The results provide practical guidelines for integrating MoEs into actor-critic DRL networks and suggest curricula and multi-agent MoE extensions as fruitful future directions.
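For illustration, here is a minimal sketch of the two routing regimes contrasted above, written in PyTorch. It is not the paper's implementation: the class name Top1MoE, the task-id-modulo expert assignment, and the hyperparameters num_experts and hidden are illustrative assumptions, and the learned branch uses generic Switch-style top-1 gating rather than the paper's exact routers.

```python
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    """MoE block with switchable routing: a learned linear gate (generic
    Switch-style top-1 gating) or a hardcoded task-id -> expert assignment."""

    def __init__(self, dim, num_experts=4, hidden=256):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # learned router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x, task_ids=None):         # x: (batch, dim)
        if task_ids is not None:
            # Hardcoded routing: each task is pinned to a fixed expert.
            idx = task_ids % len(self.experts)
            scale = torch.ones(x.shape[0], device=x.device)
        else:
            # Learned routing: pick the highest-probability expert; scaling
            # the output by the gate probability lets gradients reach the gate.
            probs = self.gate(x).softmax(dim=-1)
            scale, idx = probs.max(dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                out[mask] = scale[mask, None] * expert(x[mask])
        return out
```

Scaling each expert's output by its gate probability is what gives the learned router a gradient signal; the hardcoded branch has no router parameters at all, which is one reason it can sidestep router-learning instabilities.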
Abstract
Mixtures of Experts (MoEs) have gained prominence in (self-)supervised learning due to their enhanced inference efficiency, adaptability to distributed training, and modularity. Previous research has demonstrated that MoEs can significantly boost Deep Reinforcement Learning (DRL) performance by expanding the network's parameter count while reducing dormant neurons, thereby enhancing the model's learning capacity and ability to deal with non-stationarity. In this work, we shed more light on this ability by investigating MoEs in DRL settings with "amplified" non-stationarity via multi-task training, providing further evidence that MoEs improve learning capacity. In contrast to previous work, our multi-task results allow us to better understand the underlying causes of MoEs' beneficial effect on DRL training and the impact of the individual MoE components, and to offer insights into how best to incorporate them into actor-critic DRL networks. Finally, we also confirm findings from previous work in these settings.
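For concreteness, below is a minimal sketch of a SoftMoE layer in the style of Puigcerver et al. (2023), the architecture the TL;DR highlights. This is an illustrative reconstruction, not the authors' code; the expert MLP shape and all hyperparameters are assumed.

```python
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    """Minimal SoftMoE layer (after Puigcerver et al., 2023): every token is
    softly dispatched to every expert slot, so no tokens are dropped and the
    layer stays fully differentiable (no discrete, learned hard router)."""

    def __init__(self, dim, num_experts=4, slots_per_expert=1, hidden=256):
        super().__init__()
        self.num_experts = num_experts
        self.slots = num_experts * slots_per_expert
        # One learnable query per slot, used for dispatch/combine logits.
        self.phi = nn.Parameter(torch.randn(dim, self.slots) * dim ** -0.5)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):                        # x: (batch, tokens, dim)
        logits = x @ self.phi                    # (batch, tokens, slots)
        dispatch = logits.softmax(dim=1)         # normalize over tokens per slot
        combine = logits.softmax(dim=2)          # normalize over slots per token
        slot_in = dispatch.transpose(1, 2) @ x   # (batch, slots, dim)
        chunks = slot_in.chunk(self.num_experts, dim=1)
        slot_out = torch.cat(
            [exp(c) for exp, c in zip(self.experts, chunks)], dim=1
        )
        return combine @ slot_out                # (batch, tokens, dim)
```

Because every token contributes softly to every slot, SoftMoE avoids the non-differentiable expert assignment of hard routers, one intuition for its more stable training in DRL. In prior DRL work, such a layer typically replaces the penultimate dense layer, treating the spatial positions of the convolutional feature map as tokens.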
