On Joint Regularization and Calibration in Deep Ensembles
Laurits Fredsgaard, Mikkel N. Schmidt
TL;DR
This work studies how deep ensembles can be tuned more effectively by treating the ensemble, rather than its individual members, as the object of optimization. It introduces an ensemble-optimality framework and an overlapping holdout validation strategy that together enable joint tuning of weight decay, temperature scaling, and early stopping. Across image, graph, tabular, and text tasks, joint tuning often improves calibration and accuracy, though the effects vary by task and metric; the overlapping holdout offers a practical compromise between data efficiency and joint evaluation. The results give practitioners actionable guidance on when and how to perform ensemble-aware optimization, and they identify initialization and validation choices as critical factors for robust, scalable deep ensembles.
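The difference between tuning members individually and tuning the ensemble jointly can be illustrated with temperature scaling alone. The following is a minimal NumPy sketch under stated assumptions: the ensemble prediction is the average of the members' softmax outputs, temperatures are chosen by grid search on validation negative log-likelihood (NLL), and the logits are synthetic and deliberately overconfident. The function names, the grid, and the data are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits / temperature, shifted for numerical stability."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def nll(probs, labels):
    """Mean negative log-likelihood of the true labels."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def ensemble_probs(member_logits, temperature):
    """Assumed ensembling rule: average the members' tempered predictive distributions."""
    return np.mean([softmax(l, temperature) for l in member_logits], axis=0)

def fit_individual_temperatures(member_logits, labels, grid):
    """Individual tuning: each member minimizes its own validation NLL."""
    return [min(grid, key=lambda t: nll(softmax(l, t), labels)) for l in member_logits]

def fit_joint_temperature(member_logits, labels, grid):
    """Joint tuning: one shared temperature minimizes the ensemble's validation NLL."""
    return min(grid, key=lambda t: nll(ensemble_probs(member_logits, t), labels))

# Hypothetical synthetic validation logits (illustration only).
rng = np.random.default_rng(0)
n, n_classes, n_members = 500, 5, 4
labels = rng.integers(0, n_classes, size=n)
member_logits = []
for _ in range(n_members):
    logits = rng.normal(size=(n, n_classes))
    logits[np.arange(n), labels] += 1.0  # weak signal toward the true class
    member_logits.append(3.0 * logits)   # scaling up makes each member overconfident

grid = 0.25 * np.arange(1, 41)  # candidate temperatures 0.25, 0.5, ..., 10.0
t_individual = fit_individual_temperatures(member_logits, labels, grid)
t_joint = fit_joint_temperature(member_logits, labels, grid)

# Ensemble built from individually calibrated members vs. jointly calibrated ensemble.
p_individual = np.mean([softmax(l, t) for l, t in zip(member_logits, t_individual)], axis=0)
print("individual temperatures:", [float(t) for t in t_individual])
print("joint temperature:", float(t_joint))
print("ensemble NLL (individual):", nll(p_individual, labels))
print("ensemble NLL (joint):", nll(ensemble_probs(member_logits, t_joint), labels))
```

Because averaging already softens the members' predictions, individually calibrated members can produce an underconfident ensemble, so the jointly selected temperature may differ from the per-member values. In practice the temperatures would be fit on held-out data and evaluated on a separate test set; here one synthetic set plays both roles for brevity.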
Abstract
Deep ensembles are a powerful tool in machine learning, improving both model performance and uncertainty calibration. While ensembles are typically formed by training and tuning models individually, evidence suggests that jointly tuning the ensemble can lead to better performance. This paper investigates the impact of jointly tuning weight decay, temperature scaling, and early stopping on both predictive performance and uncertainty quantification. Additionally, we propose a partially overlapping holdout strategy as a practical compromise between enabling joint evaluation and maximizing the use of data for training. Our results demonstrate that jointly tuning the ensemble generally matches or improves performance, with significant variation in effect size across different tasks and metrics. We highlight the trade-offs between individual and joint optimization in deep ensemble training, with the overlapping holdout strategy offering an attractive practical solution. We believe our findings provide valuable insights and guidance for practitioners looking to optimize deep ensemble models. Code is available at: https://github.com/lauritsf/ensemble-optimality-gap
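The trade-off behind a partially overlapping holdout can be made concrete with a small index-splitting sketch. The construction below is a hypothetical illustration, not necessarily the scheme used in the paper: every member holds out the same number of points, a fraction `overlap` of each holdout is a core shared by all members (on which the full ensemble can be evaluated jointly), and the remainder is private to each member, so other members can still train on it. The function `overlapping_holdouts` and its parameters are assumptions introduced for this example.

```python
import numpy as np

def overlapping_holdouts(n, n_members, holdout_frac, overlap, seed=0):
    """Hypothetical split: each member holds out round(holdout_frac * n) points.
    A fraction `overlap` of that holdout is a core shared by all members;
    the rest is private to the member and disjoint across members."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    h = int(round(holdout_frac * n))
    n_shared = int(round(overlap * h))
    n_private = h - n_shared
    if n_shared + n_members * n_private > n:
        raise ValueError("not enough data for the requested holdouts")
    shared = perm[:n_shared]  # held out from every member: usable for joint evaluation
    splits = []
    for i in range(n_members):
        start = n_shared + i * n_private
        private = perm[start:start + n_private]  # held out from member i only
        holdout = np.concatenate([shared, private])
        train = np.setdiff1d(np.arange(n), holdout)
        splits.append((train, holdout))
    return shared, splits

# Sweep the overlap: larger overlap gives more joint-evaluation points
# but leaves more data unseen by every member.
for overlap in (0.0, 0.5, 1.0):
    shared, splits = overlapping_holdouts(1000, 5, 0.1, overlap)
    covered = np.unique(np.concatenate([train for train, _ in splits]))
    print(f"overlap={overlap}: joint-eval points={len(shared)}, "
          f"points seen by some member={len(covered)}")
```

With 1000 points, five members, and a 10% holdout per member, every member trains on 900 points regardless of the overlap. Moving from disjoint holdouts (`overlap=0.0`) to a fully shared holdout (`overlap=1.0`) grows the joint-evaluation set from 0 to 100 points while shrinking the data seen by at least one member from 1000 to 900; intermediate values sit between these extremes.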
