Zero-shot Active Learning Using Self Supervised Learning

Abhishek Sinha; Shreya Singh

TL;DR

The paper tackles the data annotation bottleneck in deep learning by proposing a non-iterative, model-agnostic active learning method that uses self-supervised features to order unlabeled data under a fixed budget $B$. It trains a self-supervised encoder (e.g., MoCo-v2 or SimCLR) on unlabeled data, extracts fixed features for all samples, and applies a $k$-centre greedy algorithm to select a diverse labeling subset, avoiding iterative rounds. Evaluation on CIFAR-10 with Wide ResNet and CutOut shows that this approach outperforms random labeling and is competitive with Core-Set, particularly for budgets $B \ge 20{,}000$. The work demonstrates that self-supervised representations can enable effective non-iterative active learning that is agnostic to the downstream model, with potential applicability to larger datasets such as CIFAR-100 and ImageNet.
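The selection step described above can be sketched as a greedy k-centre pass over pre-computed feature vectors. This is a minimal illustration, not the authors' implementation: the function name `k_centre_greedy`, the use of Euclidean distance, the randomly chosen first centre, and the toy 2-D "features" are all assumptions made for the example. In practice the features would come from the self-supervised encoder, and the loops would be vectorised.

```python
import math
import random


def k_centre_greedy(features, budget, seed=0):
    """Return `budget` indices into `features`, chosen by greedy k-centre.

    Hypothetical helper for illustration. Each step picks the point that is
    farthest (Euclidean distance, an assumption) from its nearest already
    selected centre, which spreads the selection over the feature space.
    """
    n = len(features)
    if not 0 < budget <= n:
        raise ValueError("budget must be between 1 and len(features)")

    # Assumption: the first centre is picked uniformly at random.
    first = random.Random(seed).randrange(n)
    selected = [first]

    # min_dist[i] = distance from point i to its nearest selected centre.
    min_dist = [math.dist(f, features[first]) for f in features]

    while len(selected) < budget:
        # The point that is currently worst covered becomes the next centre.
        nxt = max(range(n), key=min_dist.__getitem__)
        selected.append(nxt)
        # Update coverage distances with the new centre.
        for i, f in enumerate(features):
            d = math.dist(f, features[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy stand-in for encoder features: three tight clusters of two points each.
toy_features = [
    (0.0, 0.0), (0.1, 0.0),
    (10.0, 0.0), (10.0, 0.1),
    (0.0, 10.0), (0.1, 10.0),
]
picked = k_centre_greedy(toy_features, budget=3)
```

With a budget of 3 on this toy input, the selection contains one point from each cluster, which is the diversity property the method relies on. Because the selection depends only on the fixed features, it is computed once, without training the downstream model between rounds.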

Abstract

Deep learning algorithms are often said to be data hungry. The performance of such algorithms generally improves as more annotated data is fed into the model. While collecting unlabelled data is relatively easy (it can be scraped from the internet), annotating it is a tedious and expensive task. Given a fixed budget for data annotation, Active Learning helps select the best subset of data to annotate, such that a deep learning model trained on that subset achieves maximum generalization performance under this budget. In this work, we propose a new Active Learning approach that is model agnostic and does not require an iterative process. We leverage features learnt through self-supervision for the task of Active Learning. The benefit of self-supervised learning is that one can obtain useful feature representations of the input data without any annotation.

Paper Structure (15 sections, 4 figures)


Introduction
Related Work
Active Learning
Self-Supervised Learning
Momentum Contrastive Learning
Datasets
Method
Experiment Details
Baselines
Active Learning Using Self-Supervised Features (Our Approach)
Evaluation
Results
Discussion
Conclusion and Future Work
Contributions

Figures (4)

Figure 1: Schematic Overview of Momentum Contrastive Learning Method
Figure 2: Performance of different approaches
Figure 3: Frequency of classes in the 20k data points selected for labelling for the Baselines
Figure 4: Frequency of classes in the 20k data points selected for labelling for the Baselines
