Network Fission Ensembles for Low-Cost Self-Ensembles
Hojung Lee, Jong-Seok Lee
TL;DR
This work tackles the high cost of ensemble methods with Network Fission Ensembles (NFE), which converts a single network into a multi-exit architecture through weight pruning and grouping, enabling ensemble-like predictions without additional models. During training, NFE uses ensemble knowledge distillation, treating the combined outputs of all exits as a joint teacher. With $z_i$ the logits of exit $i$, $N$ the number of exits, and $T$ the distillation temperature, the ensemble logits are $z_E = \frac{1}{N}\sum_{i=1}^{N} z_i$ and the teacher probabilities are $q_E = \mathrm{softmax}(z_E / T)$. On CIFAR-100 and Tiny ImageNet with ResNet and Wide-ResNet backbones, NFE achieves higher accuracy than Deep Ensembles and other low-cost ensembles while keeping FLOPs close to those of a single model. Its results also remain robust at moderate sparsity obtained with pruning-at-initialization (PaI) methods combined with balanced weight grouping. Overall, NFE offers a practical, scalable route to high-accuracy, ensemble-like behavior at near-zero additional computational cost; possible extensions include other computer vision tasks and better scalability in the number of exits.
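The ensemble teacher described above can be sketched in plain Python for a single sample. This is a minimal illustration, not the NFE implementation. The function names, the default temperature of 4.0, and the choice of a KL-divergence distillation term scaled by $T^2$ and averaged over exits (a common knowledge-distillation convention) are all assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_logits(exit_logits: Sequence[Sequence[float]]) -> List[float]:
    """z_E = (1/N) * sum_i z_i, averaged class by class over the N exits."""
    n = len(exit_logits)
    return [sum(col) / n for col in zip(*exit_logits)]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two probability vectors; p plays the teacher role."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ensemble_kd_loss(exit_logits: Sequence[Sequence[float]],
                     temperature: float = 4.0) -> float:
    """Hypothetical distillation term: every exit matches the teacher q_E.

    Plain Python has no gradients, so this only computes the loss value;
    in an autodiff framework the teacher is often detached (assumption).
    """
    z_e = ensemble_logits(exit_logits)
    q_e = softmax(z_e, temperature)  # teacher probabilities q_E
    losses = [kl_divergence(q_e, softmax(z_i, temperature)) for z_i in exit_logits]
    # T^2 scaling keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * sum(losses) / len(losses)


if __name__ == "__main__":
    exits = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5], [2.5, 0.0, -1.5]]
    print(ensemble_logits(exits))  # [2.0, 0.5, -1.0]
    print(ensemble_kd_loss(exits))
```

For the three exits in the usage example, the averaged logits are [2.0, 0.5, -1.0]. Each exit's loss is zero only when its softened distribution already matches the ensemble teacher.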
Abstract
Recent ensemble learning methods for image classification have been shown to improve classification accuracy with low extra cost. However, they still require multiple trained models for ensemble inference, which becomes a significant burden as the model size increases. In this paper, we propose a low-cost ensemble learning and inference method, called Network Fission Ensembles (NFE), which converts a conventional network itself into a multi-exit structure. Starting from a given initial network, we first prune some of the weights to reduce the training burden. We then group the remaining weights into several sets and use each set to create an auxiliary path, which yields multiple exits. We call this process Network Fission. Through it, multiple outputs can be obtained from a single network, which enables ensemble learning. Since this process only restructures the existing network into multiple exits without using additional networks, ensemble learning and inference incur no extra computational burden. Moreover, because the network is trained on the losses of all exits, the multi-exit structure acts as a regularizer that improves performance, and high performance can be achieved even with increased network sparsity. With this simple yet effective method, we achieve significant improvements over existing ensemble methods. The code is available at https://github.com/hjdw2/NFE.
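The prune-then-group steps of Network Fission can be illustrated at the level of one layer's output channels. This is a hedged sketch under stated assumptions, not the authors' procedure. Each entry of `scores` is treated as a per-channel importance score, such as a weight norm. Plain magnitude pruning stands in for the PaI criteria mentioned in the TL;DR, and round-robin assignment is just one simple way to obtain balanced groups. All function names here are hypothetical.

```python
from typing import List, Sequence


def magnitude_prune(scores: Sequence[float], sparsity: float) -> List[int]:
    """Keep the indices with the largest |score|, dropping a `sparsity` fraction.

    Stand-in for a pruning-at-initialization criterion (assumption).
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(sparsity * len(scores))
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return sorted(order[: len(scores) - n_prune])


def balanced_groups(kept: Sequence[int], num_exits: int) -> List[List[int]]:
    """Assign surviving indices round-robin so group sizes differ by at most 1.

    Each group would feed one auxiliary path, i.e. one exit (assumption).
    """
    if num_exits < 1:
        raise ValueError("num_exits must be at least 1")
    groups: List[List[int]] = [[] for _ in range(num_exits)]
    for rank, idx in enumerate(kept):
        groups[rank % num_exits].append(idx)
    return groups


def fission(scores: Sequence[float], sparsity: float, num_exits: int) -> List[List[int]]:
    """Hypothetical Network Fission step for one layer: prune, then group."""
    return balanced_groups(magnitude_prune(scores, sparsity), num_exits)


if __name__ == "__main__":
    w = [0.9, -0.1, 0.4, -0.8, 0.05, 0.7, -0.3, 0.2]
    print(fission(w, sparsity=0.5, num_exits=2))  # [[0, 3], [2, 5]]
```

Because the groups are disjoint subsets of the surviving channels, the exits split the pruned network's capacity rather than adding new parameters, which matches the abstract's point that fission adds no extra networks.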
