The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE

Andrei Chernov, Oleg Novitskij

TL;DR

The paper examines how parameter-space symmetries in neural networks affect ensemble performance, focusing on reducing symmetries via weight freezing (WMLP). It evaluates Deep Ensembles and MoE on five datasets and introduces the Mixture of Interpolated Experts (MoIE) to explore deeper linear mode connectivity. Empirically, deep ensembles based on WMLP show meaningful gains as ensemble size increases, even when individual models are not superior to their symmetric counterparts, while evidence for MoE and MoIE benefits remains inconclusive. The work suggests symmetry reduction can improve practical ensemble performance and identifies MoIE as a promising, but not yet conclusively superior, approach for MoE-based ensembles. These findings have practical implications for designing robust ensemble systems, especially in tabular data scenarios, and point to future work to stabilize MoE/MoIE gains through regularization and setup refinements.

Abstract

Recent studies have shown that reducing symmetries in neural networks enhances linear mode connectivity between networks without requiring parameter space alignment, leading to improved performance in linearly interpolated neural networks. However, in practical applications, neural network interpolation is rarely used; instead, ensembles of networks are more common. In this paper, we empirically investigate the impact of reducing symmetries on the performance of deep ensembles and Mixture of Experts (MoE) across five datasets. Additionally, to explore deeper linear mode connectivity, we introduce the Mixture of Interpolated Experts (MoIE). Our results show that deep ensembles built on asymmetric neural networks achieve significantly better performance as ensemble size increases compared to their symmetric counterparts. In contrast, our experiments do not provide conclusive evidence on whether reducing symmetries affects either MoE or MoIE architectures.
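
The abstract contrasts two ways of combining networks: interpolating their weights into a single network, and averaging the outputs of separate networks in an ensemble. The following is a minimal sketch of that distinction only, not the paper's experimental setup. The toy one-hidden-layer MLP, the helper names (`init_mlp`, `forward`, `interpolate`, `ensemble_predict`), and the random untrained weights are all assumptions made for illustration.

```python
import math
import random


def init_mlp(n_in, n_hidden, rng):
    """Random weights for a toy one-hidden-layer MLP with scalar output (hypothetical model)."""
    return {
        "W1": [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.gauss(0, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(params, x):
    """Compute the scalar output of the toy MLP for one input vector x."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    return sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]


def interpolate(pa, pb, alpha):
    """Weight-space interpolation: (1 - alpha) * pa + alpha * pb, giving ONE network."""
    def mix(a, b):
        if isinstance(a, list):
            return [mix(ai, bi) for ai, bi in zip(a, b)]
        return (1 - alpha) * a + alpha * b
    return {key: mix(pa[key], pb[key]) for key in pa}


def ensemble_predict(members, x):
    """Output-space ensembling: average the predictions of SEPARATE networks."""
    return sum(forward(p, x) for p in members) / len(members)


rng = random.Random(0)
net_a = init_mlp(3, 4, rng)  # untrained, random weights (assumption for the sketch)
net_b = init_mlp(3, 4, rng)
x = [0.5, -1.0, 2.0]

y_interp = forward(interpolate(net_a, net_b, 0.5), x)  # one forward pass
y_ens = ensemble_predict([net_a, net_b], x)            # two forward passes
print("interpolated network:", y_interp)
print("two-member ensemble: ", y_ens)
```

Because the hidden layer is nonlinear, the two predictions generally differ. Interpolation costs a single forward pass at inference time, whereas an ensemble needs one pass per member.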

Paper Structure

This paper contains 18 sections, 1 equation, 4 figures, 1 table, 1 algorithm.

Figures (4)

  • Figure 1: Deep ensembles’ relative improvement in performance. The plots depict the relative improvement of MLP and WMLP ensembles over a single MLP and a single WMLP network, respectively.
  • Figure 2: MoE and MoIE relative improvement. In these plots, MLP represents MoE with vanilla MLP experts, WMLP denotes MoE with WMLP experts, IMLP corresponds to MoIE with vanilla MLP experts, and IWMLP refers to MoIE with WMLP experts. The relative improvement of each model is measured against the same architecture with two experts.
  • Figure 3: Deep ensemble absolute metrics.
  • Figure 4: MoE/MoIE absolute metrics.