SemPPL: Predicting pseudo-labels for better contrastive representations

Matko Bošnjak; Pierre H. Richemond; Nenad Tomasev; Florian Strub; Jacob C. Walker; Felix Hill; Lars Holger Buesing; Razvan Pascanu; Charles Blundell; Jovana Mitrovic

TL;DR

SemPPL introduces semantic positives via bootstrapped pseudo-labels to enrich positive sets in contrastive learning for semi-supervised image representation learning. By predicting pseudo-labels with a k-NN mechanism on labelled embeddings and sampling semantic positives from a label-assisted queue, SemPPL creates a reinforcing cycle that yields more semantically aligned representations. It achieves state-of-the-art results on ImageNet with 1% and 10% labels, improves robustness and OOD generalisation, and shows strong transfer performance across diverse datasets. The approach is compatible with multiple self-supervised losses and remains effective when scaling to larger architectures and with Selective Kernels. The work also provides extensive ablations, demonstrating the importance of pseudo-labels and semantic positives and the robustness to pseudo-label noise.
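The pseudo-labelling and queue-sampling steps summarised above can be illustrated with a minimal sketch. It is not the released implementation: the function names, the cosine similarity measure, the value `k=3`, the queue size, and the toy 2-D embeddings are all assumptions made for illustration.

```python
import math
import random
from collections import Counter, deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def knn_pseudo_label(query, labelled, k=3):
    """Majority vote over the k labelled embeddings most similar to `query`.

    `labelled` is a list of (embedding, label) pairs. The similarity
    measure and k are illustrative assumptions.
    """
    nearest = sorted(labelled, key=lambda pair: cosine(query, pair[0]),
                     reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def sample_semantic_positive(label, queue, rng=random):
    """Pick a random queue embedding sharing `label`, or None if there is none."""
    candidates = [emb for emb, lab in queue if lab == label]
    return rng.choice(candidates) if candidates else None


# Toy 2-D embeddings standing in for projected labelled images (hypothetical).
labelled = [([1.0, 0.0], "cat"), ([0.9, 0.1], "cat"),
            ([0.0, 1.0], "dog"), ([0.1, 0.9], "dog")]

# Bounded queue of (embedding, label or pseudo-label) pairs.
queue = deque(maxlen=4)
for emb, lab in labelled:
    queue.append((emb, lab))

unlabelled = [0.8, 0.2]
pseudo = knn_pseudo_label(unlabelled, labelled, k=3)
positive = sample_semantic_positive(pseudo, queue, rng=random.Random(0))
```

Here the unlabelled point sits closest to the two "cat" embeddings, so it receives the pseudo-label "cat", and its semantic positive is drawn from the queue entries carrying that same label.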

Abstract

Learning from large amounts of unsupervised data and a small amount of supervision is an important open problem in computer vision. We propose a new semi-supervised learning method, Semantic Positives via Pseudo-Labels (SemPPL), that combines labelled and unlabelled data to learn informative representations. Our method extends self-supervised contrastive learning, where representations are shaped by distinguishing whether two samples represent the same underlying datum (positives) or not (negatives), with a novel approach to selecting positives. To enrich the set of positives, we leverage the few existing ground-truth labels to predict the missing ones through a $k$-nearest neighbours classifier applied to the learned embeddings of the labelled data. We thus extend the set of positives with datapoints having the same pseudo-label and call these semantic positives. We jointly learn the representation and predict bootstrapped pseudo-labels. This creates a reinforcing cycle: strong initial representations enable better pseudo-label predictions, which then improve the selection of semantic positives and lead to even better representations. SemPPL outperforms competing semi-supervised methods, setting new state-of-the-art performance of $68.5\%$ and $76\%$ top-$1$ accuracy when using a ResNet-$50$ and training on $1\%$ and $10\%$ of labels on ImageNet, respectively. Furthermore, when using selective kernels, SemPPL significantly outperforms the previous state of the art, achieving $72.3\%$ and $78.3\%$ top-$1$ accuracy on ImageNet with $1\%$ and $10\%$ labels, respectively, an absolute improvement of $+7.8\%$ and $+6.2\%$ over previous work. SemPPL also exhibits state-of-the-art performance on larger ResNet models as well as strong robustness, out-of-distribution and transfer performance. We release the checkpoints and the evaluation code at https://github.com/deepmind/semppl.
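To make the role of positives and negatives concrete, the sketch below computes an InfoNCE-style contrastive loss for one anchor and adds a second term that uses a semantic positive. The exact loss used by SemPPL is not given in this excerpt, so the InfoNCE form, the temperature of 0.2, the `sem_weight` parameter, and all function names are assumptions.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalise(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(anchor, positive, negatives, temperature=0.2):
    """InfoNCE loss for one anchor: negative log-softmax score of the positive."""
    anchor = normalise(anchor)
    pos_logit = dot(anchor, normalise(positive)) / temperature
    logits = [pos_logit] + [dot(anchor, normalise(n)) / temperature
                            for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - pos_logit


def combined_loss(anchor, aug_positive, sem_positive, negatives, sem_weight=1.0):
    """Augmentation-positive term plus a weighted semantic-positive term.

    `sem_weight` is a hypothetical knob, not a parameter named in the source.
    """
    loss_aug = info_nce(anchor, aug_positive, negatives)
    loss_sem = info_nce(anchor, sem_positive, negatives)
    return loss_aug + sem_weight * loss_sem


anchor = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
aligned = info_nce(anchor, [0.9, 0.1], negatives)     # positive near the anchor
misaligned = info_nce(anchor, [0.0, 1.0], negatives)  # positive far from the anchor
```

A positive that lies close to the anchor gives a small loss, while a poorly chosen positive gives a larger one, which is why the quality of the pseudo-labels used to pick semantic positives matters.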

Paper Structure (40 sections, 12 equations, 3 figures, 12 tables, 1 algorithm)

Introduction
Semantic Positives via Pseudo-Labels
Augmentation positives
Pseudo-label prediction and semantic positives
Implementation details
Experimental Results
Semi-supervised learning
Robustness and OOD generalisation
Transfer learning
Full labelled dataset
Analysis
Semantic positives across self-supervised learning objectives
The contribution of pseudo-labels and semantic positives
Precision and recall of pseudo-labels
Noise in pseudo-label prediction
...and 25 more sections

Figures (3)

Figure 1: Sketch of SemPPL. (Left) Standard contrastive pipelines. (Middle) Unlabelled data are tagged with pseudo-labels by using a $k$-NN over projected labelled data. (Right) Semantic positives are queried from the queue and processed to compute an additional contrastive loss.
Figure 2: Top-$1$ accuracy for ResNet50 with 100% of the labels across augmentations, initializations and networks.
Figure 3: Precision and recall for pseudo-labels computed based on $k$-nearest neighbours when trained on ImageNet with 10% labels over 100 epochs.
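As a rough illustration of the quantities plotted in Figure 3, the sketch below computes per-class precision and recall of pseudo-labels against ground-truth labels and then macro-averages them. Whether the paper averages per class or in some other way is not stated in this excerpt, so the macro-averaging, the function names, and the toy labels are assumptions.

```python
from collections import defaultdict


def per_class_precision_recall(true_labels, pseudo_labels):
    """Return {class: (precision, recall)} for predicted pseudo-labels."""
    tp = defaultdict(int)  # pseudo-label matches the true label
    fp = defaultdict(int)  # class predicted, but the true label differs
    fn = defaultdict(int)  # true class missed by the prediction
    for truth, pred in zip(true_labels, pseudo_labels):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    scores = {}
    for cls in sorted(set(true_labels) | set(pseudo_labels)):
        predicted = tp[cls] + fp[cls]
        actual = tp[cls] + fn[cls]
        precision = tp[cls] / predicted if predicted else 0.0
        recall = tp[cls] / actual if actual else 0.0
        scores[cls] = (precision, recall)
    return scores


def macro_average(scores):
    """Unweighted mean of per-class precision and recall."""
    n = len(scores)
    mean_precision = sum(p for p, _ in scores.values()) / n
    mean_recall = sum(r for _, r in scores.values()) / n
    return mean_precision, mean_recall


# Hypothetical ground truth and k-NN pseudo-labels for four images.
truth = ["cat", "cat", "dog", "dog"]
pseudo = ["cat", "dog", "dog", "dog"]
scores = per_class_precision_recall(truth, pseudo)
macro_precision, macro_recall = macro_average(scores)
```

In this toy case one "cat" image is mislabelled as "dog", which lowers recall for "cat" and precision for "dog" while leaving the other two values at 1.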
