
SemPPL: Predicting pseudo-labels for better contrastive representations

Matko Bošnjak, Pierre H. Richemond, Nenad Tomasev, Florian Strub, Jacob C. Walker, Felix Hill, Lars Holger Buesing, Razvan Pascanu, Charles Blundell, Jovana Mitrovic

TL;DR

SemPPL introduces semantic positives via bootstrapped pseudo-labels to enrich positive sets in contrastive learning for semi-supervised image representation learning. By predicting pseudo-labels with a k-NN mechanism on labelled embeddings and sampling semantic positives from a label-assisted queue, SemPPL creates a reinforcing cycle that yields more semantically aligned representations. It achieves state-of-the-art results on ImageNet with 1% and 10% labels, improves robustness and OOD generalisation, and shows strong transfer performance across diverse datasets. The approach is compatible with multiple self-supervised losses and remains effective when scaling to larger architectures and with Selective Kernels. The work also provides extensive ablations, demonstrating the importance of pseudo-labels and semantic positives and the robustness to pseudo-label noise.

Abstract

Learning from large amounts of unsupervised data and a small amount of supervision is an important open problem in computer vision. We propose a new semi-supervised learning method, Semantic Positives via Pseudo-Labels (SemPPL), that combines labelled and unlabelled data to learn informative representations. Our method extends self-supervised contrastive learning -- where representations are shaped by distinguishing whether two samples represent the same underlying datum (positives) or not (negatives) -- with a novel approach to selecting positives. To enrich the set of positives, we leverage the few existing ground-truth labels to predict the missing ones with a $k$-nearest neighbours classifier over the learned embeddings of the labelled data. We thus extend the set of positives with datapoints having the same pseudo-label and call these semantic positives. We jointly learn the representation and predict bootstrapped pseudo-labels. This creates a reinforcing cycle: strong initial representations enable better pseudo-label predictions, which then improve the selection of semantic positives and lead to even better representations. SemPPL outperforms competing semi-supervised methods, setting new state-of-the-art performance of $68.5\%$ and $76\%$ top-$1$ accuracy when using a ResNet-$50$ and training on $1\%$ and $10\%$ of labels on ImageNet, respectively. Furthermore, when using selective kernels, SemPPL significantly outperforms the previous state of the art, achieving $72.3\%$ and $78.3\%$ top-$1$ accuracy on ImageNet with $1\%$ and $10\%$ labels, respectively, an absolute improvement of $+7.8\%$ and $+6.2\%$ over previous work. SemPPL also exhibits state-of-the-art performance with larger ResNet models, as well as strong robustness, out-of-distribution and transfer performance. We release the checkpoints and the evaluation code at https://github.com/deepmind/semppl .
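The two steps the abstract describes, predicting pseudo-labels with a $k$-nearest neighbours vote over labelled embeddings and then drawing semantic positives that share a (pseudo-)label, can be sketched roughly as follows. This is a minimal NumPy illustration, not the released implementation: the function names, the cosine-similarity metric, the majority vote, the default `k=1`, and the in-memory `queue_emb`/`queue_labels` arrays are all assumptions made for the example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def knn_pseudo_labels(unlabelled_emb, labelled_emb, labels, k=1, num_classes=None):
    """Assign each unlabelled embedding the majority label of its k nearest
    labelled embeddings (cosine similarity is an assumption of this sketch)."""
    labels = np.asarray(labels)
    if num_classes is None:
        num_classes = int(labels.max()) + 1
    sims = l2_normalize(unlabelled_emb) @ l2_normalize(labelled_emb).T  # (U, L)
    nn_idx = np.argsort(-sims, axis=1)[:, :k]                           # (U, k)
    nn_labels = labels[nn_idx]                                          # (U, k)
    votes = np.zeros((unlabelled_emb.shape[0], num_classes), dtype=np.int64)
    rows = np.arange(unlabelled_emb.shape[0])[:, None]
    np.add.at(votes, (rows, nn_labels), 1)   # count one vote per neighbour
    return votes.argmax(axis=1)              # ties resolve to the lowest class id


def sample_semantic_positives(anchor_labels, queue_emb, queue_labels, rng):
    """For each anchor, pick one random queue entry with the same label.
    Returns the sampled embeddings and a mask marking anchors that had a match."""
    queue_labels = np.asarray(queue_labels)
    out = np.zeros((len(anchor_labels), queue_emb.shape[1]), dtype=queue_emb.dtype)
    found = np.zeros(len(anchor_labels), dtype=bool)
    for i, y in enumerate(anchor_labels):
        candidates = np.flatnonzero(queue_labels == y)
        if candidates.size:
            out[i] = queue_emb[rng.choice(candidates)]
            found[i] = True
    return out, found
```

In a training loop along these lines, the anchors' labels would be the ground-truth labels where available and the k-NN predictions otherwise, and the `found` mask would exclude anchors without a matching queue entry from the additional contrastive term.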

Paper Structure (40 sections, 12 equations, 3 figures, 12 tables, 1 algorithm)

Figures (3)

  • Figure 1: Sketch of SemPPL. (Left) Standard contrastive pipelines. (Middle) Unlabelled data are tagged with pseudo-labels by using a $k$-NN over projected labelled data. (Right) Semantic positives are queried from the queue and processed to compute an additional contrastive loss.
  • Figure 2: Top-$1$ accuracy for ResNet-50 with 100% of the labels across augmentations, initializations and networks.
  • Figure 3: Precision and recall for pseudo-labels computed based on $k$-nearest neighbours when trained on ImageNet with 10% labels over 100 epochs.
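
For reference, precision and recall of pseudo-labels (the quantities named in the Figure 3 caption) could be computed against held-back ground-truth labels along the following lines. This is only a sketch: the caption does not say how the per-class values are aggregated, so the macro-averaging used here, like the function name, is an assumption.

```python
import numpy as np


def macro_precision_recall(true_labels, pseudo_labels, num_classes):
    """Macro-averaged precision and recall of pseudo-labels against true labels.
    Classes never predicted are skipped for precision; classes with no true
    members are skipped for recall (an assumption of this sketch)."""
    true_labels = np.asarray(true_labels)
    pseudo_labels = np.asarray(pseudo_labels)
    precisions, recalls = [], []
    for c in range(num_classes):
        tp = np.sum((pseudo_labels == c) & (true_labels == c))
        predicted = np.sum(pseudo_labels == c)
        actual = np.sum(true_labels == c)
        if predicted:
            precisions.append(tp / predicted)
        if actual:
            recalls.append(tp / actual)
    precision = float(np.mean(precisions)) if precisions else 0.0
    recall = float(np.mean(recalls)) if recalls else 0.0
    return precision, recall
```

Tracking both numbers over training is one way to check the reinforcing cycle described in the abstract: as representations improve, pseudo-label quality should improve with them.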