The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE
Andrei Chernov, Oleg Novitskij
TL;DR
The paper studies how symmetries in neural network parameter space affect ensemble performance, focusing on networks whose symmetries are reduced by weight freezing (WMLP). It evaluates Deep Ensembles and Mixture of Experts (MoE) on five datasets and introduces the Mixture of Interpolated Experts (MoIE) to explore deeper linear mode connectivity. Empirically, deep ensembles built on WMLP show meaningful gains as ensemble size increases, even when individual models are not superior to their symmetric counterparts, while the evidence for MoE and MoIE remains inconclusive. The work suggests that symmetry reduction can improve practical ensemble performance and identifies MoIE as a promising, but not yet conclusively superior, approach for MoE-based ensembles. These findings have practical implications for designing robust ensemble systems, especially in tabular data scenarios, and point to future work on stabilizing MoE/MoIE gains through regularization and setup refinements.
Abstract
Recent studies have shown that reducing symmetries in neural networks enhances linear mode connectivity between networks without requiring parameter space alignment, leading to improved performance in linearly interpolated neural networks. However, in practical applications, neural network interpolation is rarely used; instead, ensembles of networks are more common. In this paper, we empirically investigate the impact of reducing symmetries on the performance of deep ensembles and Mixture of Experts (MoE) across five datasets. Additionally, to explore deeper linear mode connectivity, we introduce the Mixture of Interpolated Experts (MoIE). Our results show that deep ensembles built on asymmetric neural networks significantly outperform their symmetric counterparts as ensemble size increases. In contrast, our experiments do not provide conclusive evidence on whether reducing symmetries affects either MoE or MoIE architectures.
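The distinction drawn above between interpolating networks in parameter space and ensembling their outputs can be illustrated with a minimal sketch. It is not the authors' code: the one-hidden-layer ReLU MLP, the sizes, and all function names (`mlp_forward`, `interpolate_params`, `ensemble_predict`, `permute_hidden`) are assumptions chosen for illustration. It uses the standard permutation symmetry of hidden units: a permuted copy computes the same function, so averaging outputs changes nothing, whereas averaging weights generally yields a different network.

```python
import numpy as np

def mlp_forward(params, x):
    """Toy one-hidden-layer ReLU MLP (illustrative assumption, not the paper's model)."""
    W1, b1, W2, b2 = params
    hidden = np.maximum(0.0, x @ W1.T + b1)
    return hidden @ W2.T + b2

def interpolate_params(params_a, params_b, alpha):
    """Linear interpolation in parameter space: (1 - alpha) * a + alpha * b."""
    return tuple((1.0 - alpha) * pa + alpha * pb
                 for pa, pb in zip(params_a, params_b))

def ensemble_predict(param_list, x):
    """Ensemble prediction: average the outputs of the member networks."""
    return np.mean([mlp_forward(p, x) for p in param_list], axis=0)

def permute_hidden(params, perm):
    """Reorder hidden units; the function computed by the network is unchanged."""
    W1, b1, W2, b2 = params
    return W1[perm], b1[perm], W2[:, perm], b2

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 8, 3
params = (
    rng.normal(size=(d_hidden, d_in)),
    rng.normal(size=d_hidden),
    rng.normal(size=(d_out, d_hidden)),
    rng.normal(size=d_out),
)
# A cyclic shift of the hidden units: a non-identity permutation.
perm = np.roll(np.arange(d_hidden), 1)
params_perm = permute_hidden(params, perm)
x = rng.normal(size=(5, d_in))

# The permuted copy is the same function with different parameters.
same_function = np.allclose(mlp_forward(params, x), mlp_forward(params_perm, x))

# Averaging the weights of the two copies changes the function ...
midpoint = interpolate_params(params, params_perm, 0.5)
interp_gap = np.abs(mlp_forward(midpoint, x) - mlp_forward(params, x)).max()

# ... while averaging their outputs does not.
ensemble_gap = np.abs(
    ensemble_predict([params, params_perm], x) - mlp_forward(params, x)
).max()
```

In this toy setting the ensemble is insensitive to the permutation while the interpolated network is not. This is consistent with the abstract's framing that symmetries matter directly for interpolation, leaving their effect on ensembles as the empirical question the paper investigates.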
