Semantic Positive Pairs for Enhancing Visual Representation Learning of Instance Discrimination Methods

Mohammad Alkhalefi, Georgios Leontidis, Mingjun Zhong

TL;DR

This work tackles the limitations of standard instance-discrimination SSL by leveraging semantic-aware positive pairs. It introduces SePP-ID, which uses a Semantic Sampler (a pre-trained SSL model combined with a similarity metric) to identify semantic positive pairs (SPPS) in the original dataset, providing a richer supervisory signal. The SPPS are combined with the traditional augmentation-based positive pairs in a MoCo-v2–style loss, yielding representations that transfer better to downstream tasks; experiments on ImageNet, STL-10, and CIFAR-10 show consistent gains over state-of-the-art methods. The results demonstrate faster convergence and stronger downstream performance, validating the importance of accurate semantic pairing in contrastive learning without requiring labels.

Abstract

Self-supervised learning (SSL) algorithms based on instance discrimination have shown promising results, performing competitively with, or even outperforming, their supervised learning counterparts in some downstream tasks. Such approaches employ data augmentation to create two views of the same instance (i.e., positive pairs) and encourage the model to learn good representations by attracting these views closer in the embedding space without collapsing to the trivial solution. However, data augmentation is limited in representing positive pairs, and the repulsion process between the instances during contrastive learning may discard important features for instances that have similar categories. To address this issue, we propose an approach to identify those images with similar semantic content and treat them as positive instances, thereby reducing the chance of discarding important features during representation learning and increasing the richness of the latent representation. Our approach is generic and could work with any self-supervised instance discrimination framework, such as MoCo and SimSiam. To evaluate our method, we run experiments on three benchmark datasets (ImageNet, STL-10, and CIFAR-10) with different instance discrimination SSL approaches. The experimental results show that our approach consistently outperforms the baseline methods across all three datasets; for instance, we improve upon the vanilla MoCo-v2 by 4.1% on ImageNet under a linear evaluation protocol over 800 epochs. We also report results on semi-supervised learning, transfer learning on downstream tasks, and object detection.
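
The idea of identifying images with similar semantic content and treating them as positives can be illustrated with a minimal sketch. This is not the authors' implementation: the use of cosine similarity, the nearest-neighbour selection rule, the `threshold` value, and the names `cosine_similarity` and `find_semantic_pairs` are all assumptions made for illustration, and the toy vectors stand in for features that a pre-trained encoder would produce.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def find_semantic_pairs(embeddings, threshold=0.9):
    """Pair each anchor with its most similar other embedding.

    A pair (anchor_idx, partner_idx) is kept only if the similarity
    reaches `threshold`. Both the selection rule and the threshold
    are illustrative assumptions, not values from the paper.
    """
    pairs = []
    for i, anchor in enumerate(embeddings):
        best_j, best_sim = None, -1.0
        for j, other in enumerate(embeddings):
            if i == j:
                continue  # an image is never paired with itself
            sim = cosine_similarity(anchor, other)
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j is not None and best_sim >= threshold:
            pairs.append((i, best_j))
    return pairs


# Toy embeddings: items 0 and 1 point in nearly the same direction,
# item 2 is almost orthogonal to both.
toy_embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
]
pairs = find_semantic_pairs(toy_embeddings, threshold=0.9)
```

In this toy setup, items 0 and 1 select each other as semantic positives, while item 2 has no sufficiently similar partner and would rely on its augmented views alone.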

Paper Structure (12 sections, 2 equations, 6 figures, 12 tables, 1 algorithm)

Figures (6)

  • Figure 1: Example of an instance discrimination task where positive pairs are attracted together and negative pairs are pushed apart, even if they have similar semantic content.
  • Figure 2: Similarity scores between an anchor (car) and instances from other classes, computed using non-pre-trained models and randomly augmented images.
  • Figure 3: The proposed methodology. First, $k$ images are chosen from the dataset and encoded by the pre-trained model. Second, a similarity metric is used to find the semantic positive pairs for each anchor, and data transformations are then applied to both the original dataset and the set of semantic positive pairs. Finally, all the images are combined into one dataset, which is used to train the instance discrimination model.
  • Figure 4: Illustration of how the Semantic Sampler identifies the semantic positive pairs.
  • Figure 5: Example of semantic positive pairs found by our approach for different anchors in the STL10-unlabeled dataset.
  • ...and 1 more figures
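
The behaviour described in Figure 1, where a semantically similar image is still pushed away as a negative, can be made concrete with a small sketch of the InfoNCE loss commonly used by MoCo-style contrastive methods. This is a generic single-query version, not the paper's exact objective; the function name `info_nce`, the temperature value, and the toy vectors are illustrative assumptions.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(query, positive, negatives, temperature=0.2):
    """InfoNCE loss for a single query.

    loss = -log( exp(q.k+ / t) / sum over all keys of exp(q.k / t) )
    The temperature is an illustrative choice.
    """
    pos_logit = dot(query, positive) / temperature
    logits = [pos_logit] + [dot(query, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - pos_logit


query = [1.0, 0.0]      # anchor view
augmented = [1.0, 0.0]  # second view of the same image (positive)
similar = [0.8, 0.6]    # different image with similar content
unrelated = [0.0, 1.0]  # different image with unrelated content

# Standard instance discrimination: the similar image is a negative.
loss_similar_as_negative = info_nce(query, augmented, [similar, unrelated])

# Same query with the similar image removed from the negative set.
loss_similar_excluded = info_nce(query, augmented, [unrelated])
```

With these toy vectors, keeping the similar image among the negatives yields a larger loss than excluding it, so the gradient actively repels two images that share content. Treating such images as positives instead is what removes this penalty.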