Rethinking Positive Pairs in Contrastive Learning

Jiantao Wu, Sara Atito, Zhenhua Feng, Shentong Mo, Josef Kittler, Muhammad Awais

TL;DR

Rethinking Positive Pairs in Contrastive Learning introduces SimLAP, a universal contrastive learning framework that learns visual representations from arbitrary class pairs by discovering pair-specific subspaces. It employs a feature filter to generate gates, creating subspace activations so that the contrastive loss operates only on shared features, with an additional Gate Penalty to encourage binary-like gates. The method integrates InfoNCE within the subspaces and demonstrates strong transfer performance across six tasks, scalable results with ViT architectures, and resilience against dimensional collapse, corroborated by embedding visualizations and Grad-CAM analyses. This approach broadens the design space of positives in contrastive learning, enabling robust, transferable representations for large, diverse label sets, though it faces challenges in interpretability and scaling to very large label spaces due to quadratic pair growth.

Abstract

The training methods in AI do involve semantically distinct pairs of samples. However, their role is typically to enhance between-class separability. The actual notion of similarity is normally learned from semantically identical pairs. This paper presents SimLAP: a simple framework for learning visual representation from arbitrary pairs. SimLAP explores the possibility of learning similarity from semantically distinct sample pairs. The approach is motivated by the observation that for any pair of classes there exists a subspace in which semantically distinct samples exhibit similarity. This phenomenon can be exploited for a novel method of learning, which optimises the similarity of an arbitrary pair of samples, while simultaneously learning the enabling subspace. The feasibility of the approach will be demonstrated experimentally and its merits discussed.
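
The gating idea summarised above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the single linear filter `filter_w`, the sigmoid gate, and the `g * (1 - g)` form of the gate penalty are all assumptions made for illustration. The sketch averages the two label embeddings of a class pair, maps the result to a gate vector, and evaluates InfoNCE on the gated (subspace) features only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pair_gate(label_emb, class_a, class_b, filter_w):
    """Gate for a class pair: average the two label embeddings, then squash to (0, 1)."""
    pair_emb = 0.5 * (label_emb[class_a] + label_emb[class_b])
    return sigmoid(pair_emb @ filter_w)

def l2_normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def gated_info_nce(anchor, positive, negatives, gate, temperature=0.1):
    """InfoNCE computed after every feature vector is multiplied by the pair's gate."""
    a = l2_normalize(anchor * gate)
    p = l2_normalize(positive * gate)
    n = l2_normalize(negatives * gate)
    # Index 0 holds the positive similarity, the rest are negatives.
    logits = np.concatenate([[a @ p], n @ a]) / temperature
    logits -= logits.max()  # numerical stability
    return float(-(logits[0] - np.log(np.exp(logits).sum())))

def gate_penalty(gate):
    """Assumed penalty form: zero only when every gate value is exactly 0 or 1."""
    return float(np.mean(gate * (1.0 - gate)))

# Toy usage with random data (all sizes are arbitrary).
rng = np.random.default_rng(0)
num_classes, emb_dim, feat_dim = 10, 16, 32
label_emb = rng.normal(size=(num_classes, emb_dim))
filter_w = rng.normal(size=(emb_dim, feat_dim))

gate = pair_gate(label_emb, 3, 7, filter_w)
anchor = rng.normal(size=feat_dim)          # sample from class 3
positive = rng.normal(size=feat_dim)        # sample from class 7
negatives = rng.normal(size=(8, feat_dim))  # samples from other classes
loss = gated_info_nce(anchor, positive, negatives, gate) + 0.1 * gate_penalty(gate)
```

Because the gate multiplies features before normalisation, dimensions with gate values near zero contribute almost nothing to the similarity, so the loss only rewards agreement on the features the pair is assumed to share.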

Paper Structure

This paper contains 47 sections, 6 equations, 15 figures, 5 tables, 1 algorithm.

Figures (15)

  • Figure 1: Similarity distribution of class pairs under the subspaces of features extracted by SimCLR and SimLAP for snake-lamp. We can hardly find any common visual features in the example of a garter snake and a table lamp. However, we find that the 500 dimensions with the lowest variance in SimCLR's representation can separate snake-lamp from other classes. SimLAP learns such a subspace during representation learning.
  • Figure 2: Universal contrastive learning for arbitrary class pairs. The feature filter generates a gate vector to activate the common features for the given class pair by averaging their label embeddings. SimLAP learns visual representations by maximizing the agreement of common features between disparate samples (Hydra-Lamp) in the corresponding subspace.
  • Figure 3: SimLAP prevents dimensional collapse and benefits from longer training. Trained on IN1P and evaluated on CIFAR10. Lower singular values suggest that the learned representations are concentrating information in fewer dimensions, that is, dimensional collapse.
  • Figure 4: Similarity distribution for SimLAP, SupCon, and SimCLR. The number in the middle denotes the overlap of the two distributions. A smaller value means better class separation.
  • Figure 5: $t$-SNE visualization of learned embeddings for two contrasting groups: 5 dog breeds and 5 snake species. Large markers indicate class centers, with stars representing dog classes and points representing snake classes. While classes remain well separated in the global space (a-c), SimLAP can selectively bring disparate classes closer in their designated subspace (d) through learned feature filtering.
  • ...and 10 more figures
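
Two diagnostics named in the captions above, the singular value spectrum (Figure 3) and the overlap of two similarity distributions (Figure 4), can be sketched in NumPy as follows. The exact procedures used in the paper are not given here, so both definitions are assumptions: the spectrum is taken as the log singular values of the embedding covariance matrix, and the overlap as the histogram intersection of two sets of cosine similarities.

```python
import numpy as np

def singular_value_spectrum(embeddings):
    """Log singular values of the embedding covariance, in descending order."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(embeddings) - 1)
    s = np.linalg.svd(cov, compute_uv=False)  # returned in descending order
    return np.log(s + 1e-12)

def histogram_overlap(sims_a, sims_b, bins=50, value_range=(-1.0, 1.0)):
    """Histogram intersection of two similarity samples, a value in [0, 1].

    Assumes the inputs are cosine similarities, so they fall inside value_range.
    """
    h_a, _ = np.histogram(sims_a, bins=bins, range=value_range)
    h_b, _ = np.histogram(sims_b, bins=bins, range=value_range)
    p_a = h_a / h_a.sum()
    p_b = h_b / h_b.sum()
    return float(np.minimum(p_a, p_b).sum())

# Toy usage: a full-rank embedding versus one with half its dimensions zeroed.
rng = np.random.default_rng(0)
full_rank = rng.normal(size=(512, 16))
collapsed = full_rank.copy()
collapsed[:, 8:] = 0.0  # these dimensions carry no information

spec_full = singular_value_spectrum(full_rank)
spec_collapsed = singular_value_spectrum(collapsed)

# Two similarity samples with disjoint supports, and one compared with itself.
sims_low = rng.uniform(-1.0, -0.5, size=1000)
sims_high = rng.uniform(0.5, 1.0, size=1000)
overlap_disjoint = histogram_overlap(sims_low, sims_high)
overlap_same = histogram_overlap(sims_low, sims_low)
```

In this toy setup the collapsed embedding shows a sharp drop in the tail of its spectrum, which is the signature the Figure 3 caption associates with dimensional collapse, while disjoint similarity distributions give an overlap of zero.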