SAE: Single Architecture Ensemble Neural Networks

Martin Ferianc, Hongxiang Fan, Miguel Rodrigues

TL;DR

The paper addresses hardware-efficient neural network ensembles by proposing SAE, a framework that automatically searches over early-exit (EE) and multi-input multi-output (MIMO) configurations within a single architecture. SAE combines a scalable search space that generalizes EE, MIMO, MIMMO (MIMO with all exits active) and the in-between configurations with an optimization objective based on variational inference that learns both the network weights and a per-input distribution over exit depths. Across experiments on TinyImageNet, BloodMNIST, PneumoniaMNIST, and RetinaMNIST with diverse backbones, SAE achieves competitive accuracy and calibration while reducing FLOPs or parameter counts by up to $1.5{\sim}3.7\times$ relative to baselines. The results show that no single configuration is best for every task and that automatic search finds diverse, task-dependent configurations, offering practical hardware-efficiency benefits and a flexible framework for architecture design.
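The TL;DR mentions learning a per-input distribution over exit depths, and the Figure 1 caption says that only the top $K$ exits identified during training stay active at evaluation. A minimal sketch of that selection step, assuming each input's depth distribution is a softmax over learnable logits, is shown here. It does not implement the paper's variational objective, and the function names and logit values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_exits(depth_probs, k):
    """Keep the k most probable exits (0-based, returned shallowest first)
    and renormalise their probabilities so they sum to one."""
    ranked = sorted(range(len(depth_probs)), key=lambda j: depth_probs[j], reverse=True)
    keep = sorted(ranked[:k])
    mass = sum(depth_probs[j] for j in keep)
    return keep, [depth_probs[j] / mass for j in keep]


# Hypothetical learned logits for N = 2 inputs over D = 4 exits.
depth_logits = [
    [0.1, 1.5, 2.0, 0.3],  # input 1 favours exits 2 and 3
    [2.2, 0.4, 0.2, 1.9],  # input 2 favours exits 1 and 4
]
K = 2
for i, logits in enumerate(depth_logits, start=1):
    q = softmax(logits)  # assumed depth distribution for input i
    exits, weights = top_k_exits(q, K)
    print(f"input {i}: active exits {[j + 1 for j in exits]}, "
          f"weights {[round(w, 3) for w in weights]}")
```

Renormalising over the kept exits is a choice made in this sketch so that the weights of the active exits still sum to one; each input ends up with its own set of active exits.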

Abstract

Ensembles of separate neural networks (NNs) have shown superior accuracy and confidence calibration over a single NN across tasks. To improve the hardware efficiency of ensembles of separate NNs, recent methods create ensembles within a single network by adding early exits or by using multi-input multi-output approaches. However, it is unclear which of these methods is the most effective for a given task, and finding out requires a separate manual search through each method. Our novel Single Architecture Ensemble (SAE) framework enables an automatic and joint search through the early-exit and multi-input multi-output configurations and their previously unobserved in-between combinations. SAE consists of two parts: a scalable search space that generalises the previous methods and their in-between configurations, and an optimisation objective that allows learning the optimal configuration for a given task. Our image classification and regression experiments show that with SAE we can automatically find diverse configurations that fit the task, achieving accuracy or confidence calibration competitive with baselines while reducing the compute operations or parameter count by up to $1.5{\sim}3.7\times$.
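The special cases that the search space generalises are listed in the Figure 2 legend as conditions on the number of inputs $N$, the number of active exits $K$, and the depth $D$. The sketch below encodes those conditions as a lookup rule; the function name is hypothetical, and the Standard NN, NN Ensemble, MCD and BE baselines are left out because they are not SAE configurations.

```python
def classify_configuration(n_inputs: int, k_exits: int, depth: int) -> str:
    """Map an SAE configuration (N, K, D) to the method name used in the Figure 2 legend."""
    if n_inputs < 1 or depth < 1 or not 1 <= k_exits <= depth:
        raise ValueError("require N >= 1, D >= 1 and 1 <= K <= D")
    if n_inputs == 1:
        # Single input: one exit is SE NN, two or more exits is EE.
        return "SE NN" if k_exits == 1 else "EE"
    if k_exits == 1:
        return "MIMO"   # N >= 2, K = 1
    if k_exits == depth:
        return "MIMMO"  # N >= 2, K = D
    return "I/B"        # N >= 2, 2 <= K < D: in-between configuration


# Enumerate the (N, K) grid for a hypothetical backbone with D = 4 exits.
D = 4
for n in (1, 2, 3):
    print(n, [classify_configuration(n, k, D) for k in range(1, D + 1)])
```

For example, with $D = 4$ the row for $N = 1$ reads SE NN, EE, EE, EE and the row for $N = 2$ reads MIMO, I/B, I/B, MIMMO, so most of the grid consists of in-between configurations.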

Paper Structure (44 sections, 10 equations, 17 figures, 17 tables, 1 algorithm)

Figures (17)

  • Figure 1: The Single Architecture Ensemble (SAE). The filled rectangles stand for learnable layers $\{f^j(\cdot)\}^D_{j=1}$ and prediction heads $\{h^j(\cdot)\}^D_{j=1}$, while the empty rectangle represents a non-parametric operation. $D$ is the network depth, $N$ is the number of separate inputs in the ensemble, and $K$ is the maximum number of active exits per input during evaluation. $\{x_i\}_{i=1}^N$ are the $N$ inputs and $\{\hat{y}_i^j\}^{N,D}_{i,j=1}$ are the predictions from the $D$ exits for the $N$ inputs. Arrows represent the flow of information. Dashed and greyed boxes and arrows mark exits that were active during training but are inactive during evaluation, because only the top $K$ exits identified during training are kept.
  • Figure 2: Comparison on in-distribution (ID) test sets against Standard NN $\newmoon$, NN Ensemble $\blacksquare$, SAE configurations (I/B, in-between: $N\geq 2, 2 \leq K < D$; EE: $N=1, K \geq 2$; MIMMO: $N\geq 2, K=D$; MIMO: $N\geq 2, K=1$; SE NN: $N=1, K=1$), MCD $\blacktriangle$ and BE $\blacktriangleleft$. The black outlines denote the configurations compared in the text.
  • Figure 3: Effect of varying $N$ and $K$ across ID test sets. The upper number is the average performance for each $N$, $K$ combination, and the number in brackets is the number of configurations sampled by HPO. A white box means that no configuration was sampled for that $N$, $K$. The pattern marks the best average performance. The coloured outlines mark the special cases corresponding to the generalised methods.
  • Figure 4: Depth preference during training, averaged over all $N$ and $K$, for all datasets. The lines denote the mean trend and the shaded regions the standard deviation across configurations.
  • Figure 5: Comparison on ID and out-of-distribution (OOD) test sets for TinyImageNet against Standard NN $\newmoon$, NN Ensemble $\blacksquare$, SAE configurations (I/B, in-between: $N\geq 2, 2 \leq K < D$; EE: $N=1, K \geq 2$; MIMMO: $N\geq 2, K=D$; MIMO: $N\geq 2, K=1$; SE NN: $N=1, K=1$), MCD $\blacktriangle$ and BE $\blacktriangleleft$. The black outlines denote the configurations compared in the text.
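The data flow in Figure 1 can be traced with a toy, dependency-free sketch: the $N$ inputs are merged by a non-parametric operation, passed through layers $f^1, \dots, f^D$, and every head $h^j$ emits one prediction per input. The concatenation used as the merge, the toy layers and heads, the helper names, and the uniform averaging over each input's active exits are all assumptions for illustration, not the paper's implementation.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def sae_forward(
    inputs: Sequence[Vector],
    layers: Sequence[Callable[[Vector], Vector]],
    heads: Sequence[Callable[[Vector], List[Vector]]],
) -> List[List[Vector]]:
    """Return preds[j][i], the prediction of exit j for input i (y_i^j in Figure 1)."""
    # Non-parametric merge of the N inputs; concatenation is an assumption here.
    h: Vector = [v for x in inputs for v in x]
    preds: List[List[Vector]] = []
    for layer, head in zip(layers, heads):
        h = layer(h)           # learnable layer f^j (a fixed toy function here)
        preds.append(head(h))  # prediction head h^j: one output per input
    return preds


def ensemble_prediction(preds: List[List[Vector]], active_exits: Sequence[int], i: int) -> Vector:
    """Uniformly average input i's predictions over its active (top-K) exits."""
    chosen = [preds[j][i] for j in active_exits]
    return [sum(vals) / len(chosen) for vals in zip(*chosen)]


def make_split_head(n_inputs: int) -> Callable[[Vector], List[Vector]]:
    """Toy head: split the feature vector into n_inputs equal chunks."""
    def head(h: Vector) -> List[Vector]:
        size = len(h) // n_inputs
        return [h[i * size:(i + 1) * size] for i in range(n_inputs)]
    return head


# Toy network with D = 3 exits and N = 2 inputs of dimension 2.
layers = [
    lambda h: [2.0 * v for v in h],
    lambda h: [v + 1.0 for v in h],
    lambda h: [max(0.0, v) for v in h],
]
heads = [make_split_head(2)] * 3
preds = sae_forward([[1.0, -1.0], [0.5, 2.0]], layers, heads)
print(ensemble_prediction(preds, active_exits=[1, 2], i=0))
print(ensemble_prediction(preds, active_exits=[0, 2], i=1))
```

Each input keeps its own set of active exits, which is how a configuration with $2 \le K < D$ differs from MIMO ($K = 1$) or MIMMO ($K = D$).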