All models are wrong, some are useful: Model Selection with Limited Labels

Patrik Okanovic, Andreas Kirsch, Jannes Kasper, Torsten Hoefler, Andreas Krause, Nezihe Merve Gürel

TL;DR

Extensive experiments demonstrate that MODEL SELECTOR drastically reduces the need for labeled data while consistently picking the best or near-best performing model.

Abstract

We introduce MODEL SELECTOR, a framework for label-efficient selection of pretrained classifiers. Given a pool of unlabeled target data, MODEL SELECTOR samples a small subset of highly informative examples for labeling, in order to efficiently identify the best pretrained model for deployment on this target dataset. Through extensive experiments, we demonstrate that MODEL SELECTOR drastically reduces the need for labeled data while consistently picking the best or near-best performing model. Across 18 model collections on 16 different datasets, comprising over 1,500 pretrained models, MODEL SELECTOR reduces the labeling cost of identifying the best model by up to 94.15% relative to the strongest baseline. Our results further highlight the robustness of MODEL SELECTOR in model selection: it reduces the labeling cost by up to 72.41% when selecting a near-best model, i.e., one whose accuracy is within 1% of that of the best model.
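
To make the problem setup concrete, the following is a minimal sketch of label-efficient model selection on a toy pool. It is not the MODEL SELECTOR acquisition rule, which the abstract does not specify; as a stand-in it uses a simple disagreement heuristic (label the examples on which the models' predictions disagree most). All function names and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def disagreement(votes):
    """Fraction of models that deviate from the majority prediction."""
    majority_count = Counter(votes).most_common(1)[0][1]
    return 1.0 - majority_count / len(votes)


def select_model(predictions, oracle, budget):
    """Pick a model after labeling at most `budget` examples.

    predictions[j][i] is the label model j predicts for example i.
    oracle(i) returns the true label of example i (the costly step).
    NOTE: disagreement ranking is an illustrative stand-in heuristic,
    not the selection rule from the paper.
    """
    m, n = len(predictions), len(predictions[0])
    scores = [disagreement([predictions[j][i] for j in range(m)])
              for i in range(n)]
    # Label only the `budget` examples with the highest disagreement.
    chosen = sorted(range(n), key=lambda i: scores[i], reverse=True)[:budget]
    labels = {i: oracle(i) for i in chosen}
    # Count each model's correct predictions on the labeled subset.
    correct = [sum(predictions[j][i] == labels[i] for i in chosen)
               for j in range(m)]
    return max(range(m), key=lambda j: correct[j]), chosen


# Toy pool: n = 8 binary examples, m = 3 models with known error sets.
true_labels = [0, 1, 0, 1, 0, 1, 0, 1]
error_sets = [{7}, {0, 1, 7}, {2, 3, 4, 7}]  # model 0 is the most accurate
predictions = [[1 - t if i in errs else t for i, t in enumerate(true_labels)]
               for errs in error_sets]

best, chosen = select_model(predictions, lambda i: true_labels[i], budget=5)
print(f"selected model {best} after labeling {len(chosen)} of "
      f"{len(true_labels)} examples")
```

In this toy pool the heuristic spends its budget only on examples where at least one model deviates from the others, since examples on which all models agree cannot separate them. A heuristic this simple can give biased accuracy estimates on other pools, so it should be read as an illustration of the setup rather than a substitute for the method evaluated in the paper.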

Paper Structure

This paper contains 32 sections, 7 equations, 9 figures, 7 tables, 1 algorithm.

Figures (9)

  • Figure 1: An overview of our label-efficient model selection pipeline with Model Selector. Given a pool of $n$ unlabeled data examples and a set of $m$ pretrained classifiers, Model Selector aims to select $b$ (with $b\ll n$) unlabeled examples that, once labeled, can identify the best pretrained model.
  • Figure 2: Best model identification probability of Model Selector and the baselines on $18$ model collections. Model Selector is capable of reducing the labeling cost by up to $94.15\%$ for identifying the best model.
  • Figure 3: Accuracies of the models used in the experiments section, evaluated on the entire dataset. Our experiments cover different scenarios across a wide range of model accuracies.
  • Figure 4: Best model identification probability of Model Selector for $\epsilon \in \{0.35, 0.40, 0.45, 0.49, 0.50\}$ on $18$ model collections using oracle labels.
  • Figure 5: Best model identification probability of Model Selector for $\epsilon \in \{0.35, 0.40, 0.45, 0.49, 0.50\}$ on $18$ model collections using noisy labels.
  • ...and 4 more figures