ASPEST: Bridging the Gap Between Active Learning and Selective Prediction

Jiefeng Chen; Jinsung Yoon; Sayna Ebrahimi; Sercan Arik; Somesh Jha; Tomas Pfister

TL;DR

This work introduces active selective prediction, a new learning paradigm that aims to query more informative samples from the shifted target domain while increasing accuracy and coverage. It also proposes ASPEST, a simple yet effective approach that builds ensembles of model snapshots and self-trains on their aggregated outputs as pseudo labels.

Abstract

Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain. The abstained inputs can then be deferred to humans for further evaluation. A persistent challenge for machine learning is that, in many real-world scenarios, the distribution of test data differs from that of the training data. This shift leads to more inaccurate predictions and often to increased dependence on humans, which can be difficult and expensive. Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples. Selective prediction and active learning have been approached from different angles, and the connection between them has been missing. In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain while increasing accuracy and coverage. For this new paradigm, we propose a simple yet effective approach, ASPEST, that trains ensembles of model snapshots using self-training, with their aggregated outputs serving as pseudo labels. Extensive experiments on numerous image, text, and structured datasets that suffer from domain shifts demonstrate that ASPEST can significantly outperform prior work on selective prediction and active learning (e.g., on the MNIST$\to$SVHN benchmark with a labeling budget of 100, ASPEST improves the AUACC metric from 79.36% to 88.84%) and makes better use of humans in the loop.
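To make the abstain-or-predict idea concrete, here is a minimal sketch of a selective predictor that accepts a prediction only when its confidence reaches a threshold, and then reports coverage (fraction of accepted inputs) and accuracy on the accepted inputs. Using the maximum class probability as the confidence score, and the names `selective_predict`, `coverage_and_accuracy`, and `threshold`, are assumptions made for illustration; they are not taken from the paper.

```python
def selective_predict(probs, threshold):
    """Predict the arg-max class and accept it only if confidence >= threshold.

    probs: list of per-sample class-probability lists (hypothetical model outputs).
    Returns (preds, accepted), where accepted[i] is False when the model abstains.
    """
    preds, accepted = [], []
    for row in probs:
        confidence = max(row)              # assumed confidence: max class probability
        preds.append(row.index(confidence))
        accepted.append(confidence >= threshold)
    return preds, accepted


def coverage_and_accuracy(preds, accepted, labels):
    """Coverage = share of accepted samples; accuracy is measured on accepted samples only."""
    kept = [(p, y) for p, a, y in zip(preds, accepted, labels) if a]
    coverage = len(kept) / len(preds)
    # Accuracy is undefined when everything is rejected.
    accuracy = sum(p == y for p, y in kept) / len(kept) if kept else float("nan")
    return coverage, accuracy


# Toy example: two confident predictions are accepted, two uncertain ones are deferred.
probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.55, 0.45]]
labels = [0, 1, 1, 0]
preds, accepted = selective_predict(probs, threshold=0.7)
coverage, accuracy = coverage_and_accuracy(preds, accepted, labels)
```

Raising the threshold typically trades coverage for accuracy; the rejected inputs are the ones that would be deferred to a human.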

Paper Structure (36 sections, 28 equations, 6 figures, 33 tables, 3 algorithms)

Introduction
Related Work
Active Selective Prediction
Problem Setup
Evaluation Metrics
Challenges
Proposed Method: ASPEST
Experiments
Setup
Results
Analyses and Discussions
Conclusion
More Related Work
Evaluation Metrics
Baselines
...and 21 more sections

Figures (6)

Figure 1: Illustration of the distribution shift problem and the proposed active selective prediction solution. Under distribution shift, the model trained on the source training dataset will suffer a large performance degradation on the unlabeled test dataset. We propose to use active selective prediction to solve this problem, where active learning is used to improve selective prediction under distribution shift. The selective predictor shown in Fig. (b) is built upon the source-trained model depicted in Fig. (a). In this setting, active learning selects a small subset of data for labeling which are used to improve selective prediction on the remaining unlabeled test data. This yields more reliable predictions and more optimized use of humans in the loop.
Figure 2: Illustration of the challenges in active selective prediction using a linear model to maximize the margin (distance to the decision boundary) for binary classification. We consider a single-round active learning setup, without the inclusion of already labeled data. The current decision boundary is derived from labeled source training data. Since we focus on assessing performance with respect to the unlabeled test data, where our evaluation metrics are applied, we omit source training data from this illustration. The model confidence is considered to be proportional to the margin (when the margin is larger, the confidence is higher, and vice versa). Fig. (a) shows that if the samples close to the current decision boundary are selected for labeling (uncertainty-based sample selection), then the adapted model suffers from the overconfidence issue (mis-classification with high confidence), which results in acceptance of some mis-classified points. Fig. (b) shows that if diverse samples are selected for labeling (e.g., using the k-Center-Greedy algorithm of Sener and Savarese, 2017), then the adapted model suffers from low accuracy. This leads to rejection of many points, necessitating significant human intervention.
Figure 3: Illustration of the checkpoint ensemble and pseudo-labeled set construction in the proposed ASPEST.
Figure 4: Evaluating the median confidence of the model on the correctly classified and mis-classified test data respectively when fine-tuning the model on the selected labeled test data.
Figure 5: Evaluating the checkpoints during fine-tuning and the checkpoint ensemble constructed after fine-tuning on the target test dataset.
...and 1 more figures
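The checkpoint ensemble and pseudo-labeled set mentioned in the Figure 3 caption can be pictured with a rough sketch: average the class probabilities produced by several model checkpoints, then keep only the unlabeled samples whose averaged confidence is high enough to be used as pseudo labels for self-training. The confidence threshold `eta`, the use of hard arg-max pseudo labels, and both function names are assumptions for illustration; the paper's exact construction may differ.

```python
def average_checkpoints(checkpoint_probs):
    """Average class probabilities across checkpoints.

    checkpoint_probs[c][i][k] is the probability that checkpoint c assigns
    to class k for sample i (hypothetical inputs).
    """
    n_ckpt = len(checkpoint_probs)
    n_samples = len(checkpoint_probs[0])
    n_classes = len(checkpoint_probs[0][0])
    return [
        [sum(cp[i][k] for cp in checkpoint_probs) / n_ckpt for k in range(n_classes)]
        for i in range(n_samples)
    ]


def build_pseudo_labeled_set(ensemble_probs, eta):
    """Keep (sample_index, pseudo_label) pairs whose ensemble confidence is >= eta."""
    pseudo = []
    for i, row in enumerate(ensemble_probs):
        confidence = max(row)
        if confidence >= eta:              # assumed selection rule
            pseudo.append((i, row.index(confidence)))
    return pseudo


# Two checkpoints, two samples, two classes.
ckpts = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.7, 0.3], [0.6, 0.4]],
]
ensemble = average_checkpoints(ckpts)
pseudo_set = build_pseudo_labeled_set(ensemble, eta=0.75)
```

In this toy case the checkpoints agree on the first sample, so it receives a pseudo label, while they disagree on the second, so it is left out of the pseudo-labeled set.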
