Efficient Adaptive Ensembling for Image Classification
Antonio Bruno, Davide Moroni, Massimo Martinelli
TL;DR
This work tackles the problem of improving image classification accuracy without large increases in model complexity. It introduces Efficient Adaptive Ensembling, which trains two EfficientNet-b0 weak learners on disjoint subsets of the data (bagging) and fuses their representations with a trainable feature-level combiner. Compared with state-of-the-art models, the method achieves an average accuracy gain of $0.5\%$ while using $5$–$60$ times fewer parameters and $10$–$100$ times fewer FLOPs, demonstrating a practical path to greener, faster high-performance image classification. The approach also provides a framework for extending ensembling to other computer vision tasks such as detection and segmentation, and invites exploration of alternative bagging strategies and fusion mechanisms.
Abstract
In recent times, with the exception of sporadic cases, the trend in Computer Vision has been to achieve minor improvements at the cost of considerable increases in complexity. To reverse this trend, we propose a novel method to boost image classification performance without increasing complexity. To this end, we revisited ensembling, a powerful approach that is often underused because of its added complexity and training time, and made it feasible through a specific design choice. First, we trained two EfficientNet-b0 end-to-end models (known to be the architecture with the best overall accuracy/complexity trade-off for image classification) on disjoint subsets of data (i.e. bagging). Then, we built an efficient adaptive ensemble by fine-tuning a trainable combination layer. In this way, we were able to outperform the state-of-the-art by an average of 0.5$\%$ in accuracy on several major benchmark datasets, while restraining complexity both in the number of parameters (5-60 times fewer) and in the number of floating-point operations (FLOPs, 10-100 times fewer).
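The two steps described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names `disjoint_split` and `combine` are hypothetical, the weak learners are reduced to precomputed feature vectors, and the combination layer is assumed here to be a single linear map over the concatenated features.

```python
import random


def disjoint_split(n_samples, n_learners=2, seed=0):
    """Shuffle sample indices and deal them into non-overlapping subsets,
    one per weak learner (the disjoint-subset bagging step)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by n_learners guarantees no index appears in two subsets.
    return [indices[i::n_learners] for i in range(n_learners)]


def combine(features_a, features_b, weights, bias):
    """Assumed form of the trainable combination layer: a linear map
    applied to the concatenation of the two learners' feature vectors.

    weights has one row per output class and one column per fused feature;
    bias has one entry per output class.
    """
    fused = list(features_a) + list(features_b)  # feature-level concatenation
    return [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
```

In such a setup, each weak learner would be trained only on its own index subset, and afterwards only `weights` and `bias` would be updated during fine-tuning. Training a single small layer is what keeps the extra cost of the ensemble low.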
