RADIN: Souping on a Budget

Thibaut Menes, Olivier Risser-Maroix

TL;DR

RADIN addresses the computational bottleneck of model soups by approximating the performance of a weight-averaged soup with that of the corresponding ensemble of averaged logits. This approximation is justified by a first-order equivalence between ensemble and soup losses around initialization. RADIN introduces a two-stage, budget-aware procedure. Candidate soups are first ranked by a fast logits-based evaluation, and only the top $B$ candidates are then fully evaluated on validation data to select the best one. This lets users choose the exploration budget $B$ freely and improves performance over the greedy soup at low budgets (up to $4\%$ on ImageNet). The theoretical analysis proves that the first-order expansions of the two losses coincide, and Monte-Carlo candidate generation keeps the procedure efficient in practice. Experiments on CIFAR-10, ImageNet, and DomainNet show competitive performance and robustness to distribution shifts. Overall, RADIN provides a scalable, principled approach to soup crafting that adapts to resource constraints while offering gains over greedy baselines, especially in low-resource scenarios.
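The two-stage procedure can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. Several details are assumptions: models are flat parameter vectors, and `forward`, `radin_select`, `num_draws` and the uniform subset-sampling scheme are invented for this sketch. The paper's own Monte-Carlo generator and its prior $\lambda$ are not reproduced. Soups use uniform weight averaging, and accuracy is the selection metric. The cost saving comes from running each model on the validation set only once. Ranking a candidate then only averages cached logits, and only the top $B$ candidates pay for a forward pass of the actual soup.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for this sketch: a model is a flat parameter vector,
# and forward(weights, x) returns a list of class logits.
Weights = List[float]
Forward = Callable[[Sequence[float], Sequence[float]], List[float]]


def average(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equal-length vectors (uniform mixing ratios)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose arg-max logit matches the label."""
    hits = sum(
        1 for z, y in zip(logits, labels)
        if max(range(len(z)), key=z.__getitem__) == y
    )
    return hits / len(labels)


def radin_select(
    models: Sequence[Weights],
    forward: Forward,
    val_x: Sequence[Sequence[float]],
    val_y: Sequence[int],
    num_draws: int,
    budget: int,  # B >= 1: number of candidates that get a full evaluation
    seed: int = 0,
) -> Tuple[Tuple[int, ...], float]:
    rng = random.Random(seed)
    n = len(models)

    # N full inferences over the validation set, done once and cached.
    cached = [[forward(w, x) for x in val_x] for w in models]

    # Monte-Carlo candidate generation (assumed scheme): draw a subset size
    # uniformly, then a uniform subset of that size; duplicates are merged.
    candidates = set()
    for _ in range(num_draws):
        k = rng.randint(1, n)
        candidates.add(tuple(sorted(rng.sample(range(n), k))))

    # Stage 1: fast score = accuracy of the averaged cached logits.
    # No new forward passes are needed here.
    def fast_score(subset: Tuple[int, ...]) -> float:
        ensembled = [average([cached[i][j] for i in subset]) for j in range(len(val_x))]
        return accuracy(ensembled, val_y)

    ranked = sorted(candidates, key=fast_score, reverse=True)

    # Stage 2: only the top-`budget` candidates are souped and fully evaluated.
    best, best_acc = ranked[0], -1.0
    for subset in ranked[:budget]:
        soup = average([models[i] for i in subset])
        acc = accuracy([forward(soup, x) for x in val_x], val_y)
        if acc > best_acc:
            best, best_acc = subset, acc
    return best, best_acc


if __name__ == "__main__":
    def linear_forward(w, x):
        # Hypothetical 2-class linear model: w = [w00, w01, w10, w11].
        return [w[0] * x[0] + w[1] * x[1], w[2] * x[0] + w[3] * x[1]]

    models = [[1.0, 0.0, 0.0, 1.0], [0.8, 0.3, 0.1, 0.9], [0.0, 1.0, 1.0, 0.0]]
    val_x = [[1.0, 0.2], [0.1, 0.9], [0.7, 0.4], [0.3, 0.8]]
    val_y = [0, 1, 0, 1]
    print(radin_select(models, linear_forward, val_x, val_y, num_draws=10, budget=2))
```

With the linear toy model in the usage example, a soup and its logit ensemble coincide exactly, so the example only exercises the control flow. For nonlinear networks the fast ranking is an approximation, which is why the final choice among the top $B$ candidates uses the real soups.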

Abstract

Model Soups, extending Stochastic Weight Averaging (SWA), combine models fine-tuned with different hyperparameters. Yet their adoption is hindered by computational cost, because selecting which subset of models to average is expensive. In this paper, we propose to speed up model soups by approximating the performance of a soup with that of the corresponding ensemble of averaged logits. Theoretical insights validate the congruence between logit ensembles and weight-averaged soups for any mixing ratio. Our Resource ADjusted soups craftINg (RADIN) procedure stands out by allowing flexible evaluation budgets. Users can adapt the exploration budget to their resources, and RADIN improves performance at lower budgets compared with the previous greedy approach (up to 4% on ImageNet).
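A short derivation shows why logit ensembles and weight-averaged soups agree to first order for any mixing ratios. The notation is ours: $\theta_0$ is the shared initialization, $\delta_i$ is the fine-tuning displacement of model $i$, $\alpha_i$ are the mixing ratios, and $J(x)$ is the Jacobian of the logits $f$ with respect to the weights at $\theta_0$. This is a sketch that assumes $f$ is twice differentiable. It is not necessarily the paper's own proof.

```latex
\begin{align*}
\theta_i &= \theta_0 + \delta_i, \qquad
\theta_\alpha = \sum_{i=1}^{N} \alpha_i \theta_i = \theta_0 + \sum_{i=1}^{N} \alpha_i \delta_i, \qquad
\alpha_i \ge 0, \quad \sum_{i=1}^{N} \alpha_i = 1, \\
f(\theta_\alpha; x) &= f(\theta_0; x) + J(x) \sum_{i=1}^{N} \alpha_i \delta_i
  + O\big(\max_i \lVert \delta_i \rVert^2\big), \\
\sum_{i=1}^{N} \alpha_i f(\theta_i; x)
  &= \sum_{i=1}^{N} \alpha_i \big[ f(\theta_0; x) + J(x)\,\delta_i \big]
  + O\big(\max_i \lVert \delta_i \rVert^2\big) \\
  &= f(\theta_0; x) + J(x) \sum_{i=1}^{N} \alpha_i \delta_i
  + O\big(\max_i \lVert \delta_i \rVert^2\big), \\
\Rightarrow\quad
f(\theta_\alpha; x) - \sum_{i=1}^{N} \alpha_i f(\theta_i; x)
  &= O\big(\max_i \lVert \delta_i \rVert^2\big).
\end{align*}
```

The second line uses $\sum_i \alpha_i = 1$ to write the soup as $\theta_0$ plus a convex combination of displacements. The third line uses the same identity to collapse $\sum_i \alpha_i f(\theta_0; x)$ into $f(\theta_0; x)$. A loss that is smooth in the logits maps two logit vectors differing by $O(\max_i \lVert \delta_i \rVert^2)$ to losses differing by the same order. As a result, the ensemble loss and the soup loss also coincide to first order around $\theta_0$.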

Paper Structure

This paper contains 12 sections, 15 equations, 6 figures, 1 table, and 2 algorithms.

Figures (6)

  • Figure 1: Difference between model ensembling at the logit level (a) and model souping (b). Logit ensembling requires $O(N)$ full-model inferences for a single image before the predicted logits are averaged. The soup alternative instead approximates the ensemble output in $O(1)$ by first averaging the weights of the $N$ ensemble members into a single model.
  • Figure 2: Correlation between the fast estimated performance on validation and the actual performance on test for 200 randomly sampled soups on CIFAR-10. Soups with more models generally perform better than soups with fewer models. The number of models in each soup candidate is indicated by the angle of the marker.
  • Figure 3: Soup candidates are ranked by their fast estimated performance, and a lower rank indicates a more promising candidate. The fast approximate ranking makes it possible to filter out poorly performing candidates. In the $\mathsf{RADIN}$ procedure, only the first $B$ candidates undergo full evaluation, and the highest-performing soup among them is selected based on actual performance metrics.
  • Figure 4: Performance comparison of model soups on ImageNet at various budget levels $B$. While performance is comparable at higher budgets, $\mathsf{RADIN}$ outperforms greedy soups at reduced budgets. Notably, introducing the prior $\lambda$ marginally improves the quality of the soups identified at these lower budgets.
  • Figure 5: Performance distribution of model soups with a high ($> 6$) and a low ($\leq 6$) number of models on ImageNet.
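The $O(N)$ versus $O(1)$ trade-off in Figure 1 only pays off if the soup's outputs stay close to the ensemble's outputs. The toy check below measures the gap between the soup output and the ensemble output. It uses a hypothetical one-hidden-layer tanh network with randomly perturbed copies of a shared initialization, and it is not one of the paper's experiments. The names `forward` and `soup_vs_ensemble_gap` and all sizes are illustrative assumptions. By the first-order argument, the gap is expected to shrink roughly quadratically as the perturbation scale decreases.

```python
import math
import random

# Toy one-hidden-layer network (hypothetical, for illustration only):
# f(theta; x) = sum_h v_h * tanh(w_h . x), with theta = [W flattened, v].
H, D = 4, 3


def forward(theta, x):
    w, v = theta[: H * D], theta[H * D:]
    hidden = [math.tanh(sum(w[h * D + d] * x[d] for d in range(D))) for h in range(H)]
    return sum(v[h] * hidden[h] for h in range(H))


def soup_vs_ensemble_gap(scale, n_models=5, seed=0):
    """|f(mean of weights) - mean of f(weights)| for models near a shared init."""
    rng = random.Random(seed)
    theta0 = [rng.gauss(0.0, 1.0) for _ in range(H * D + H)]
    # "Fine-tuned" models = shared init + Gaussian perturbation of size `scale`.
    models = [[t + scale * rng.gauss(0.0, 1.0) for t in theta0] for _ in range(n_models)]
    x = [rng.gauss(0.0, 1.0) for _ in range(D)]
    soup = [sum(col) / n_models for col in zip(*models)]          # average the weights once
    soup_out = forward(soup, x)                                   # 1 forward pass
    ensemble_out = sum(forward(m, x) for m in models) / n_models  # N forward passes
    return abs(soup_out - ensemble_out)


if __name__ == "__main__":
    # The same seed reuses the same perturbation directions, so only the scale changes.
    for s in (1e-1, 1e-2, 1e-3):
        print(f"scale={s:g}  gap={soup_vs_ensemble_gap(s):.3e}")
```

Because the seed is fixed, each run reuses the same random directions, so the printed gaps isolate the effect of the perturbation scale. Far from the initialization, where the first-order expansion breaks down, the soup can drift away from the ensemble. This is why candidate soups still need the full evaluation step.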