The Bad Batches: Enhancing Self-Supervised Learning in Image Classification Through Representative Batch Curation

Ozgu Goksu, Nicolas Pugeault

TL;DR

This paper attempts to alleviate the influence of false positive and false negative pairs by employing pairwise similarity calculations through the Fréchet ResNet Distance (FRD), thereby obtaining robust representations from unlabelled data.

Abstract

Learning robust representations without human supervision is a longstanding challenge. Recent advances in self-supervised contrastive learning have demonstrated high performance across a variety of representation learning tasks. However, current methods depend on random transformations of training examples, which in some cases produce unrepresentative positive pairs that can have a large impact on learning. This limitation not only impedes the convergence of the learning process and reduces the robustness of the learnt representation, but also requires larger batch sizes to improve robustness to such bad batches. This paper attempts to alleviate the influence of false positive and false negative pairs by employing pairwise similarity calculations through the Fréchet ResNet Distance (FRD), thereby obtaining robust representations from unlabelled data. The effectiveness of the proposed method is substantiated by empirical results: a linear classifier trained on self-supervised contrastive representations achieved 87.74% top-1 accuracy on STL10 and 99.31% on the Flower102 dataset. These results highlight the potential of the proposed approach to advance the state of the art in self-supervised contrastive learning, particularly for image classification tasks.

Paper Structure (10 sections, 6 equations, 4 figures, 4 tables)

This paper contains 10 sections, 6 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Existing self-supervised contrastive methods rely mainly on data augmentations to increase diversity; however, this can produce weakly transformed views of the original images. Our method aims to eliminate weak augmented views, such as darker images treated as a similar pair or views with insufficient colour changes.
  • Figure 2: The proposed framework for batch curation in self-supervised contrastive learning. Task 1 illustrates image classification as a downstream task. The batch curation component decides which batches are used for gradient updates.
  • Figure 3: Impact of representation learning by our methods and SimCLR after only 30 epochs of fine-tuning on several datasets. C denotes CIFAR10, M denotes MNIST, and S denotes STL10. Ours-H denotes models trained with Huber loss only, and Ours-F denotes FRD batch curation without Huber loss. Ours combines Huber loss and FRD.
  • Figure 4: The FRD score indicates the quality of each batch during training. Higher FRD scores indicate bad batches, which contain many false positives. In this figure, the transformed views with an FRD score of 1.29 illustrate poor augmentation of sea images.
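
The captions for Figures 2 and 4 describe scoring each batch and using only well-scored batches for gradient updates. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes the score is a Fréchet distance between Gaussian fits of the features of two augmented views, and it simplifies each Gaussian to a diagonal covariance so that no matrix square root is needed. The function names, the `threshold` parameter, and the keep-if-below rule are hypothetical, and the feature extractor (a ResNet in the paper) is left out.

```python
import math
from statistics import fmean, pvariance


def diagonal_frechet_distance(feats_a, feats_b):
    """Squared Frechet distance between Gaussian fits of two feature sets.

    Simplifying assumption (not from the paper): both Gaussians have diagonal
    covariance, so the distance reduces to per-dimension mean and std gaps.
    Each argument is a list of equal-length feature vectors.
    """
    dims = len(feats_a[0])
    total = 0.0
    for d in range(dims):
        col_a = [row[d] for row in feats_a]
        col_b = [row[d] for row in feats_b]
        mean_gap = fmean(col_a) - fmean(col_b)
        std_gap = math.sqrt(pvariance(col_a)) - math.sqrt(pvariance(col_b))
        total += mean_gap ** 2 + std_gap ** 2
    return total


def curate_batches(batches, threshold):
    """Keep batches whose score is at or below a hypothetical threshold.

    `batches` is a list of (view_a_features, view_b_features) pairs.
    Returns a list of (score, view_a_features, view_b_features) tuples.
    """
    kept = []
    for view_a, view_b in batches:
        score = diagonal_frechet_distance(view_a, view_b)
        if score <= threshold:  # a higher score is treated as a "bad batch"
            kept.append((score, view_a, view_b))
    return kept


if __name__ == "__main__":
    # Toy 2-D features standing in for ResNet embeddings of two views.
    good = ([[0.0, 0.0], [2.0, 2.0]], [[0.1, 0.0], [2.1, 2.0]])
    bad = ([[0.0, 0.0], [2.0, 2.0]], [[5.0, 5.0], [9.0, 9.0]])
    for score, _, _ in curate_batches([good, bad], threshold=1.0):
        print(f"kept batch with score {score:.4f}")
```

In a training loop, only the kept batches would contribute to the contrastive loss; how the threshold is chosen and whether it is fixed or adaptive is not stated in the text above.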