Zero-shot Active Learning Using Self Supervised Learning
Abhishek Sinha, Shreya Singh
TL;DR
The paper tackles the data annotation bottleneck in deep learning by proposing a non-iterative, model-agnostic active learning method that uses self-supervised features to order unlabeled data under a fixed budget $B$. It trains a self-supervised encoder (e.g., MoCo-v2 or SimCLR) on unlabeled data, extracts fixed features for all samples, and applies a $k$-centre greedy algorithm to select a diverse labeling subset, avoiding iterative rounds. Evaluation on CIFAR-10 with Wide ResNet and CutOut shows that this approach outperforms random labeling and is competitive with Core-Set, particularly for budgets $B \ge 20{,}000$. The work demonstrates that self-supervised representations can enable effective non-iterative active learning that is agnostic to the downstream model, with potential applicability to larger datasets such as CIFAR-100 and ImageNet.
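The selection step described above can be illustrated with a minimal sketch of greedy $k$-centre selection over precomputed features. This is not the authors' implementation: the function name `k_centre_greedy`, the random choice of the first centre, and the use of Euclidean distance are assumptions made for illustration, and the features are assumed to have been extracted already by a frozen self-supervised encoder.

```python
import math
import random
from typing import List, Sequence


def k_centre_greedy(
    features: Sequence[Sequence[float]], budget: int, seed: int = 0
) -> List[int]:
    """Return indices of `budget` samples picked by greedy k-centre selection.

    `features` holds one fixed feature vector per unlabeled sample
    (assumed to come from a frozen self-supervised encoder).
    """
    n = len(features)
    if not 0 < budget <= n:
        raise ValueError("budget must be between 1 and len(features)")

    # Assumption: the first centre is chosen at random.
    first = random.Random(seed).randrange(n)
    selected = [first]

    # min_dist[i] is the distance from sample i to its nearest chosen centre.
    min_dist = [math.dist(f, features[first]) for f in features]

    while len(selected) < budget:
        # Pick the sample that is farthest from every centre chosen so far.
        nxt = max(range(n), key=min_dist.__getitem__)
        selected.append(nxt)
        # Update nearest-centre distances with the newly added centre.
        for i, f in enumerate(features):
            d = math.dist(f, features[nxt])
            if d < min_dist[i]:
                min_dist[i] = d

    return selected


# Toy usage: three well-separated pairs of points, budget of three labels.
toy_features = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0), (0.0, 10.0), (0.0, 10.1)]
chosen = k_centre_greedy(toy_features, budget=3)
```

Because the features are fixed, the selection runs once and does not depend on the downstream classifier. In this sketch the first $B'$ indices returned are also what the same greedy run would return for any smaller budget $B' \le B$, given the same starting centre.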
Abstract
Deep learning algorithms are often said to be data hungry. The performance of such algorithms generally improves as more annotated data is fed into the model. While collecting unlabelled data is relatively easy (it can be scraped from the internet), annotating it is a tedious and expensive task. Given a fixed budget for data annotation, Active Learning helps select the best subset of data to annotate, such that a deep learning model trained on that subset achieves maximum generalization performance under this budget. In this work, we propose a new Active Learning approach that is model agnostic and does not require an iterative process. We leverage features learnt through self-supervision for the task of Active Learning. The benefit of self-supervised learning is that one can obtain a useful feature representation of the input data without any annotation.
