SUPClust: Active Learning at the Boundaries

Yuta Ono, Till Aczel, Benjamin Estermann, Roger Wattenhofer

TL;DR

This work proposes SUPClust, a novel active learning method that targets points at the decision boundary between classes, and demonstrates experimentally that labeling these points leads to strong model performance.

Abstract

Active learning is a machine learning paradigm designed to optimize model performance in a setting where labeled data is expensive to acquire. In this work, we propose a novel active learning method called SUPClust that seeks to identify points at the decision boundary between classes. By targeting these points, SUPClust aims to gather the information that is most useful for refining the model's prediction of complex decision regions. We demonstrate experimentally that labeling these points leads to strong model performance. This improvement is observed even in scenarios characterized by strong class imbalance.
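
As a rough illustration of what "points at the decision boundary" can mean in practice, the sketch below uses margin sampling, a generic uncertainty heuristic, as a stand-in. It is not SUPClust's scoring rule, which is not defined on this page. The function name `smallest_margin_indices`, the toy probability matrix, and the budget of 2 are all assumptions made for illustration.

```python
import numpy as np


def smallest_margin_indices(probs: np.ndarray, budget: int) -> np.ndarray:
    """Return indices of the `budget` rows with the smallest top-2 probability margin.

    Hypothetical helper: a small margin between the two most likely classes
    is used here as a proxy for "close to a decision boundary".
    """
    # Sort each row ascending; the last two columns hold the two largest probabilities.
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    # Stable sort so that ties are broken by original index.
    return np.argsort(margin, kind="stable")[:budget]


# Toy predicted class probabilities for four unlabeled points (assumed values).
probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction, far from any boundary
    [0.50, 0.45, 0.05],  # two classes compete
    [0.34, 0.33, 0.33],  # nearly uniform, most ambiguous
    [0.70, 0.20, 0.10],
])

query = smallest_margin_indices(probs, budget=2)
print(query)
```

With these toy values, the two points whose top classes are nearly tied are the ones chosen for labeling. A heuristic like this needs a model that has already been trained on some labels, so it only conveys the intuition behind boundary-focused querying.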

Paper Structure (8 sections, 3 equations, 8 figures)

Figures (8)

  • Figure 1: Decision boundary of an SVM classifier.
  • Figure 2: Distribution of classes within each cluster on SimCLR embeddings for CIFAR-10. Cluster boundaries align with category boundaries.
  • Figure 3: t-SNE plots of 100 instances queried by TypiClust and SUPClust (ours) in the CIFAR-10 embedding space. Colors represent the categories. For clusters on the "edge" of the data distribution, SUPClust tends to select samples that are closer to other clusters in the embedding space.
  • Figure 4: Relationship between typicality and SUP on CIFAR-10 for 4 randomly selected clusters, with temperature $1$. Typicality and SUP are not strongly correlated, so using both metrics to select instances can improve the querying strategy.
  • Figure 5: Ablation study on ISIC-2019 with budget=8 and with self-supervised embeddings.
  • ...and 3 more figures