GPS-SSL: Guided Positive Sampling to Inject Prior Into Self-Supervised Learning

Aarash Feizi, Randall Balestriero, Adriana Romero-Soriano, Reihaneh Rabbany

TL;DR

SSL performance hinges on carefully designed data augmentations to form positive pairs. GPS-SSL replaces augmentation-centric sampling with nearest-neighbor positives in a designed embedding space, defined by $B(\boldsymbol{x}) = \{ \boldsymbol{x}' : \| g_{\gamma}(\boldsymbol{x}) - g_{\gamma}(\boldsymbol{x}') \|_2^2 < \tau \}$ and $\boldsymbol{x}' = \arg\max_{\boldsymbol{u} \in B(\boldsymbol{x})} \| g_{\gamma}(\boldsymbol{u}) - g_{\gamma}(\boldsymbol{x}) \|_2^2$, pairing with $DA(\boldsymbol{x})$ to form PositivePairs_GPS. This framework subsumes NNCLR as a special case and enables injecting priors via pretrained or handcrafted embeddings $g_{\gamma}$, reducing dependence on strong data augmentations. The approach is compatible with multiple SSL methods (SimCLR, BYOL, NNCLR, VICReg) and supports various priors (CLIP, VAE, MAE); experiments on under-studied datasets (e.g., FGVCAircraft, PathMNIST, TissueMNIST) and the real-world R-HID hotel dataset show substantial gains, including notable improvements on CIFAR-10 with weak augmentations. Overall, GPS-SSL broadens SSL applicability by providing a principled, prior-informed alternative to augmentation design, with practical impact on real-world data domains.

Abstract

We propose Guided Positive Sampling Self-Supervised Learning (GPS-SSL), a general method to inject a priori knowledge into the selection of positive samples in Self-Supervised Learning (SSL). Current SSL methods rely on Data Augmentations (DA) both to generate positive samples and to incorporate prior knowledge; an incorrect or overly weak DA drastically reduces the quality of the learned representation. GPS-SSL instead proposes to design a metric space in which Euclidean distances become a meaningful proxy for semantic relationships. In that space, positive samples can be generated by nearest-neighbor sampling, and any prior knowledge can be embedded into the metric space independently of the employed DA. Owing to its simplicity, GPS-SSL is applicable to any SSL method, e.g., SimCLR or BYOL. A key benefit of GPS-SSL is that it reduces the pressure to tailor strong DAs. For example, GPS-SSL reaches 85.58% on CIFAR-10 with weak DA, while the baseline only reaches 37.51%. We therefore move a step closer to the goal of making SSL less reliant on DA. We also show that, even when using strong DAs, GPS-SSL outperforms the baselines on under-studied domains. We evaluate GPS-SSL along with multiple baseline SSL methods on numerous downstream datasets from different domains, with models using either strong or minimal data augmentations. We hope that GPS-SSL will open new avenues in studying how to inject a priori knowledge into SSL in a principled manner.
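
The nearest-neighbor sampling described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `gps_neighbor_sets` and `sample_positive_indices`, the toy embeddings, and the fallback to the sample itself when no neighbor lies within the threshold are all assumptions made for illustration. The embeddings stand in for the output of a prior embedding $g_{\gamma}$ (for example, a pretrained encoder), and `tau` is the squared-distance threshold.

```python
import numpy as np

def gps_neighbor_sets(embeddings, tau):
    """For each sample i, return the indices j != i whose embedding satisfies
    ||g(x_i) - g(x_j)||_2^2 < tau. Dense O(n^2 d) computation, toy sizes only."""
    emb = np.asarray(embeddings, dtype=float)
    diff = emb[:, None, :] - emb[None, :, :]   # pairwise differences, shape (n, n, d)
    sq_dists = np.sum(diff ** 2, axis=-1)      # squared Euclidean distances, shape (n, n)
    np.fill_diagonal(sq_dists, np.inf)         # exclude each sample from its own set
    return [np.flatnonzero(row < tau) for row in sq_dists]

def sample_positive_indices(embeddings, tau, rng):
    """Return one (anchor, positive) index pair per sample.
    Assumption: if no other sample lies within tau, the sample is paired with
    itself, which corresponds to ordinary augmentation-only positives."""
    pairs = []
    for i, neighbors in enumerate(gps_neighbor_sets(embeddings, tau)):
        j = int(rng.choice(neighbors)) if neighbors.size > 0 else i
        pairs.append((i, j))
    return pairs

# Toy embeddings (hypothetical): two tight clusters and one isolated point.
toy_embeddings = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1], [20.0, 20.0]])
toy_pairs = sample_positive_indices(toy_embeddings, tau=1.0, rng=np.random.default_rng(0))
```

Each index pair would then be passed through whatever DA is in use before entering the SSL loss. Returning indices rather than images keeps the neighbor search independent of the augmentation, so the prior lives entirely in the embedding and the threshold.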

Paper Structure

This paper contains 15 sections, 2 theorems, 5 equations, 4 figures, and 9 tables.

Key Result

Proposition 1

For any employed DA, GPS-SSL, which replaces eq:pospair by eq:GPS in any SSL loss (eq:SSL), recovers (i) input-space nearest-neighbor positive sampling when $g_{\gamma}$ is the identity and $\tau \gg 0$, (ii) standard SSL when $g_{\gamma}$ is a bijection and $\tau \rightarrow 0$, and (iii) NNCLR when

Figures (4)

  • Figure 1: Our strategy, GPS-SSL, for positive sampling based on prior knowledge DA-based methods.
  • Figure 2: An example of (a) StrongAug and (b) RHFlipAug applied to an image from the FGVCAircraft dataset. Furthermore, (c) and (d) depict examples of the 4 nearest neighbors calculated by CLIP and VAE embeddings, respectively.
  • Figure 3: Architectures of SimCLR, NNCLR, and GPS-SimCLR. This figure shows where the data augmentation (DA) happens in each method and how the nearest-neighbor (NN) search differs between NNCLR and GPS-SimCLR. Note that the 'queue' in NNCLR has a limited size, usually set to 65536. This limit could prevent under-represented classes from being learned efficiently.
  • Figure 4: Comparing the runtime of BYOL vs. GPS-BYOL and SimCLR vs. GPS-SimCLR for two datasets, i.e., FGVCAircraft and CIFAR-10. In general, the runtime of GPS-SSL remains the same as that of the original baseline SSL method, while its performance improves.

Theorems & Definitions (2)

  • Proposition 1
  • Theorem 1