Self-Supervised Learning with Gaussian Processes

Yunshan Duan, Sinead Williamson

TL;DR

GPSSL places a Gaussian process prior on representations to enforce smoothness without relying on positive/negative pairs, enabling uncertainty-aware self-supervised learning. It formulates a generalized Bayesian posterior via generalized variational inference, adopts VICReg-style variance and covariance losses (without an explicit invariance term), and is shown to be closely related to kernel PCA and VICReg. Empirical results show that GPSSL achieves competitive or superior performance on tabular and real-world data while providing meaningful uncertainty quantification in downstream tasks and in out-of-sample regions. The framework is particularly suited to structured data such as tabular data, graphs, and spatial transcriptomics, where uncertainty in representations can be propagated to predictions and decision making.
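To make "VICReg-style variance and covariance losses" concrete, the following is a minimal NumPy sketch of the two terms as they are commonly written for VICReg. It is an illustration only: the function names, the threshold `gamma`, the stabilizer `eps`, and the toy matrices are assumptions, and the exact loss used by GPSSL (eqn:loss in the paper) is not reproduced here.

```python
import numpy as np

def variance_loss(Z, gamma=1.0, eps=1e-4):
    """Hinge penalty that is positive when a dimension's std falls below gamma.

    Z is an (N, J) matrix of representations. gamma and eps are assumed values.
    """
    std = np.sqrt(Z.var(axis=0, ddof=1) + eps)
    return np.mean(np.maximum(0.0, gamma - std))

def covariance_loss(Z):
    """Sum of squared off-diagonal entries of the sample covariance, scaled by 1/J."""
    N, J = Z.shape
    Zc = Z - Z.mean(axis=0)
    cov = Zc.T @ Zc / (N - 1)
    off_diag = cov - np.diag(np.diag(cov))
    return np.sum(off_diag ** 2) / J

rng = np.random.default_rng(0)
Z_good = rng.normal(size=(500, 3))                        # spread out, roughly decorrelated
Z_small = 0.1 * Z_good                                    # nearly collapsed: tiny variance
Z_redundant = np.tile(rng.normal(size=(500, 1)), (1, 3))  # three copies of one dimension

var_good, var_small = variance_loss(Z_good), variance_loss(Z_small)
cov_good, cov_redundant = covariance_loss(Z_good), covariance_loss(Z_redundant)
print(f"variance loss:   good={var_good:.3f}  small={var_small:.3f}")
print(f"covariance loss: good={cov_good:.3f}  redundant={cov_redundant:.3f}")
```

The variance term penalizes the nearly collapsed representation, and the covariance term penalizes the redundant one; neither term needs positive or negative pairs.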

Abstract

Self-supervised learning (SSL) is a machine learning paradigm in which models learn the underlying structure of data without explicit supervision from labeled samples. Representations acquired through SSL have proven useful for many downstream tasks, including clustering and linear classification. To ensure smoothness of the representation space, most SSL methods rely on the ability to generate pairs of observations that are similar to a given instance. However, generating these pairs may be challenging for many types of data. Moreover, these methods do not account for uncertainty quantification and can perform poorly in out-of-sample prediction settings. To address these limitations, we propose Gaussian process self-supervised learning (GPSSL), a novel approach that applies Gaussian process (GP) models to representation learning. GP priors are imposed on the representations, and we obtain a generalized Bayesian posterior by minimizing a loss function that encourages informative representations. The covariance function inherent in GPs naturally pulls representations of similar units together, serving as an alternative to explicitly defined positive samples. We show that GPSSL is closely related to both kernel PCA and VICReg, a popular neural network-based SSL method, but, unlike both, it allows for posterior uncertainties that can be propagated to downstream tasks. Experiments on various datasets, covering classification and regression tasks, demonstrate that GPSSL outperforms traditional methods in terms of accuracy, uncertainty quantification, and error control.
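The claim that a GP covariance function pulls representations of similar units together can be illustrated with a small sketch. The code below assumes a squared-exponential (RBF) kernel with unit lengthscale and three hand-picked 2D inputs; the kernel choice, the inputs, and all names are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel k(x, x') = exp(-||x - x'||^2 / (2 * lengthscale^2))."""
    sq = (np.sum(X1 ** 2, axis=1)[:, None]
          + np.sum(X2 ** 2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale ** 2)

# Two nearby inputs and one distant input (hypothetical toy data).
X = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]])
K = rbf_kernel(X, X)

# Draw one-dimensional representations z ~ N(0, K) from the GP prior.
rng = np.random.default_rng(0)
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(X)))  # small jitter for numerical stability
samples = L @ rng.normal(size=(len(X), 2000))      # each column is one prior draw

gap_near = np.mean(np.abs(samples[0] - samples[1]))  # inputs 0 and 1 are close
gap_far = np.mean(np.abs(samples[0] - samples[2]))   # inputs 0 and 2 are far apart
print(f"prior correlation near={K[0, 1]:.3f}, far={K[0, 2]:.5f}")
print(f"mean |z_i - z_j|  near={gap_near:.3f}, far={gap_far:.3f}")
```

Under the prior, the two nearby inputs receive almost identical representations in every draw, while the distant input is nearly independent of them. This is the smoothness effect that replaces explicit positive pairs.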

Paper Structure

This paper contains 27 sections, 1 theorem, 17 equations, 26 figures, 2 tables.

Key Result

Proposition 1

Let the number of dimensions $J=1$, and replace the variance loss term in the loss (eqn:loss) with $V(Z) = -\mathrm{Var}(Z) = -\frac{1}{N} (Z - \overline{z})^T (Z - \overline{z})$. Then there exists a value of $c_V$ for which the generalized posterior is maximized at the first kernel PCA component.
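For readers unfamiliar with the object named in Proposition 1, the sketch below computes the first kernel PCA component of a training set with plain NumPy: center the kernel matrix, take its leading eigenpair, and scale the eigenvector to obtain the projections of the training points. The RBF kernel, the random toy data, and the function names are assumptions for illustration; the code does not verify the proposition itself.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0):
    """Squared-exponential kernel matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale ** 2)

def first_kpca_component(K):
    """Return the training-point projections onto the first kernel PCA component.

    K is an (N, N) kernel matrix. Also returns the leading eigenvalue of the
    centered kernel matrix.
    """
    N = K.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N    # centering matrix
    Kc = H @ K @ H                         # kernel matrix centered in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    lam, v = eigvals[-1], eigvecs[:, -1]
    return np.sqrt(lam) * v, lam           # projections are sqrt(lambda) * v

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))               # hypothetical toy inputs
z, lam = first_kpca_component(rbf_kernel(X))

print(f"mean of component: {z.mean():.2e}")
print(f"variance of component: {z @ z / len(z):.4f} (lambda / N = {lam / len(z):.4f})")
```

The resulting vector `z` has zero mean and variance equal to the leading eigenvalue divided by $N$, which is the largest variance attainable by a unit-norm direction in the kernel feature space. With $J=1$, it plays the role of the single-dimension representation $Z$ in the proposition.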

Figures (26)

  • Figure 1: Quadrant-weighted concentric circles used to train embeddings. Labels are shown primarily for clarity; the training set labels are not used in training representations (but are used to select hyperparameters using a small validation set).
  • Figure 2: Visualizations of the GPSSL-based embedding function, trained on quadrant-weighted concentric circle data (Figure 1). Top two rows: per-dimension mean and standard deviation. Bottom left: L2 distance between the mean embedding at the plot location and the mean embedding at (0, 0). Bottom right: average standard deviation (i.e., the average of the per-dimension standard deviation plots above).
  • Figure 3: Visualizations of kPCA-based and VICReg-based representation functions, trained on quadrant-weighted concentric circle data (Figure 1). Top row shows the five dimensions of the kPCA-based representation. Middle row shows the five dimensions of the VICReg-based representation. These are comparable with the GPSSL-based mean representation in Figure 2. Bottom row shows the L2 distance between the representation (left, kPCA; right, VICReg) at the plot location and the representation at (0, 0). These are comparable with the GPSSL-based distance plot in Figure 2 (bottom left).
  • Figure 4: Train and test set for concentric circles downstream classification task.
  • Figure 5: Mean (top) and standard deviation (bottom) of the predictive distributions obtained using Bayesian logistic regression on top of various representations of data (from left to right, GPSSL-mean, GPSSL-full, kPCA, VICReg). Labeled training data is superimposed over each plot (white and blue crosses; color indicates class).
  • ...and 21 more figures

Theorems & Definitions (2)

  • Proposition 1
  • proof