PENEX: AdaBoost-Inspired Neural Network Regularization
Klaus-Rudolf Kladny, Bernhard Schölkopf, Michael Muehlebach
TL;DR
PENEX introduces a penalized exponential loss for neural networks. It reformulates the multi-class exponential loss with a SumExp penalty so that the objective can be optimized with first-order methods. The authors prove Fisher consistency and margin-maximization guarantees, and show that gradient increments on PENEX implicitly parameterize weak learners, so gradient descent behaves like an AdaBoost-style procedure. Empirically, PENEX frequently yields stronger regularization and better generalization than established regularizers of similar computational cost across computer vision and language tasks, especially with little training data or noisy labels. This comes with some trade-offs in convergence speed and limited gains on very large datasets such as ImageNet. The work positions PENEX as a practical, theoretically grounded AdaBoost-inspired regularizer for training and fine-tuning deep neural networks.
Abstract
AdaBoost sequentially fits so-called weak learners to minimize an exponential loss, which penalizes mislabeled data points more severely than other loss functions such as cross-entropy. Paradoxically, AdaBoost generalizes well in practice as the number of weak learners grows. In the present work, we introduce Penalized Exponential Loss (PENEX), a new formulation of the multi-class exponential loss that is theoretically grounded and, in contrast to the existing formulation, amenable to optimization via first-order methods. We demonstrate both empirically and theoretically that PENEX implicitly maximizes the margins of data points. We also show that gradient increments on PENEX implicitly parameterize weak learners in the boosting framework. Across computer vision and language tasks, we show that PENEX often has a stronger regularizing effect than established methods of similar computational cost. Our results highlight PENEX's potential as an AdaBoost-inspired alternative for effective training and fine-tuning of deep neural networks.
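To make the first claim of the abstract concrete, here is a minimal binary sketch comparing the standard exponential loss with binary cross-entropy (the logistic loss) as functions of the margin m = y·f(x), with y in {-1, +1}. It illustrates only these two textbook losses, not PENEX itself; the function names are hypothetical. For negative margins (points on the wrong side of the decision boundary), the exponential loss and its gradient grow exponentially, while the logistic loss grows roughly linearly and its gradient stays bounded by 1. The last helper reflects the standard fact that discrete AdaBoost reweights samples proportionally to exp(-y_i F(x_i)).

```python
import math


def exp_loss(margin: float) -> float:
    """Exponential loss exp(-m) for margin m = y * f(x), y in {-1, +1}.

    Note: math.exp raises OverflowError for margins below roughly -709.
    """
    return math.exp(-margin)


def logistic_loss(margin: float) -> float:
    """Binary cross-entropy / logistic loss log(1 + exp(-m)), computed stably."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    # For m < 0 use log(1 + e^{-m}) = -m + log(1 + e^{m}) to avoid overflow.
    return -margin + math.log1p(math.exp(margin))


def exp_grad_magnitude(margin: float) -> float:
    """|d/dm exp(-m)| = exp(-m): unbounded as the margin becomes negative."""
    return math.exp(-margin)


def logistic_grad_magnitude(margin: float) -> float:
    """|d/dm log(1 + exp(-m))| = 1 / (1 + exp(m)): always below 1."""
    return 1.0 / (1.0 + math.exp(margin))


def adaboost_style_weights(margins: list[float]) -> list[float]:
    """Normalized sample weights proportional to exp(-margin), as in AdaBoost."""
    raw = [math.exp(-m) for m in margins]
    total = sum(raw)
    return [w / total for w in raw]


if __name__ == "__main__":
    for m in (2.0, 0.0, -2.0, -5.0):
        print(
            f"margin={m:+.1f}  exp={exp_loss(m):8.3f}  "
            f"logistic={logistic_loss(m):6.3f}  "
            f"exp_grad={exp_grad_magnitude(m):8.3f}  "
            f"logistic_grad={logistic_grad_magnitude(m):5.3f}"
        )
    # One badly misclassified point dominates the exponential weighting.
    print(adaboost_style_weights([2.0, 1.0, -3.0]))
```

The final line shows why the exponential loss is sensitive to mislabeled points: a single sample with margin -3 receives most of the total weight, which is the behavior the abstract contrasts with cross-entropy.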
