Kernel VICReg for Self-Supervised Learning in Reproducing Kernel Hilbert Space

M. Hadi Sepanj, Benyamin Ghojogh, Saed Moradi, Paul Fieguth

TL;DR

The paper proposes Kernel VICReg, a self-supervised learning framework that lifts the VICReg objective into a Reproducing Kernel Hilbert Space (RKHS). The kernelized loss operates on double-centered kernel matrices and Hilbert–Schmidt norms, enabling nonlinear feature learning without explicit feature mappings.

Abstract

Self-supervised learning (SSL) has emerged as a powerful paradigm for representation learning by optimizing geometric objectives, such as invariance to augmentations, variance preservation, and feature decorrelation, without requiring labels. However, most existing methods operate in Euclidean space, limiting their ability to capture nonlinear dependencies and geometric structures. In this work, we propose Kernel VICReg, a novel self-supervised learning framework that lifts the VICReg objective into a Reproducing Kernel Hilbert Space (RKHS). By kernelizing each term of the loss (variance, invariance, and covariance), we obtain a general formulation that operates on double-centered kernel matrices and Hilbert–Schmidt norms, enabling nonlinear feature learning without explicit mappings. We demonstrate that Kernel VICReg mitigates the risk of representational collapse under challenging conditions and improves performance on datasets exhibiting nonlinear structure or in limited-sample regimes. Empirical evaluations across MNIST, CIFAR-10, STL-10, TinyImageNet, and ImageNet100 show consistent gains over Euclidean VICReg, with particularly strong improvements on datasets where nonlinear structures are prominent. UMAP visualizations are provided only as a qualitative illustration of embedding geometry and are not used for calibration or statistical validation. Our results suggest that kernelizing SSL objectives is a promising direction for bridging classical kernel methods with modern representation learning.
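
The double-centering step mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard definition $\widehat{\bm{K}} = \bm{H}\bm{K}\bm{H}$ with $\bm{H} = \bm{I} - \frac{1}{b}\bm{1}\bm{1}^\top$, written elementwise in plain Python. The function names, the toy batch, and the choice of a linear kernel are assumptions made for illustration; this is not the authors' implementation.

```python
def linear_kernel(x, y):
    """Plain inner product; stands in for any positive semi-definite kernel."""
    return sum(a * b for a, b in zip(x, y))


def gram_matrix(batch, kernel):
    """b x b matrix of pairwise kernel evaluations over the batch."""
    return [[kernel(x, y) for y in batch] for x in batch]


def double_center(K):
    """Elementwise form of H K H: subtract row and column means, add the grand mean."""
    b = len(K)
    row_means = [sum(row) / b for row in K]
    col_means = [sum(K[i][j] for i in range(b)) / b for j in range(b)]
    grand_mean = sum(row_means) / b
    return [
        [K[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(b)]
        for i in range(b)
    ]


# Hypothetical toy batch of three 2-D embeddings.
batch = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
K_hat = double_center(gram_matrix(batch, linear_kernel))

# Centering in feature space makes every row (and column) sum to zero.
row_sums = [sum(row) for row in K_hat]
```

Centering the Gram matrix this way corresponds to subtracting the batch mean in feature space, which is why no explicit feature map is ever needed.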

Paper Structure

This paper contains 34 sections, 4 theorems, 57 equations, 5 figures, 4 tables.

Key Result

Proposition 1

Let $\widehat{\bm{K}}(\bm{x})$ denote the double-centered kernel matrix of a batch of size $b$. If the kernelized variance regularization enforces […], where $\{\lambda_i\}_{i=1}^b$ are the eigenvalues of $\widehat{\bm{K}}(\bm{x})$, then the covariance operator $\bm{C}_{\phi}(\bm{x})$ in the RKHS is strictly positive definite on the span of the batch, and representational collapse (i.e., rank-one embedding) is prevented.
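
As a rough illustration of why the spectrum of the centered kernel matrix signals collapse, the sketch below compares a fully collapsed batch with a spread-out one. It relies only on two standard facts: the trace of $\widehat{\bm{K}}(\bm{x})$ equals the sum of its eigenvalues $\lambda_i$, and a centered positive semi-definite kernel matrix has no negative eigenvalues, so a zero trace means every $\lambda_i$ is zero. The RBF form, the value of `gamma`, and the toy batches are assumptions. The trace is only a coarse proxy: it does not check the per-eigenvalue condition of the proposition, for which an eigenvalue solver such as `numpy.linalg.eigvalsh` would be needed.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Assumed RBF form: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def centered_gram(batch, kernel):
    """Double-centered kernel matrix of a batch, computed elementwise."""
    b = len(batch)
    K = [[kernel(x, y) for y in batch] for x in batch]
    row = [sum(r) / b for r in K]
    col = [sum(K[i][j] for i in range(b)) / b for j in range(b)]
    total = sum(row) / b
    return [[K[i][j] - row[i] - col[j] + total for j in range(b)] for i in range(b)]


def spectrum_sum(batch, kernel):
    """Trace of the centered matrix, i.e. the sum of its eigenvalues."""
    K_hat = centered_gram(batch, kernel)
    return sum(K_hat[i][i] for i in range(len(K_hat)))


# Collapsed batch: every embedding is the same point (hypothetical data).
collapsed = [[0.5, 0.5]] * 4
# Spread batch: four distinct points on the unit square (hypothetical data).
spread = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

collapsed_trace = spectrum_sum(collapsed, rbf_kernel)  # all eigenvalues vanish
spread_trace = spectrum_sum(spread, rbf_kernel)        # strictly positive
```

A variance term that keeps the eigenvalues away from zero therefore rules out the collapsed case by construction.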

Figures (5)

  • Figure 1: UMAP projections (axes: UMAP-1 and UMAP-2) of MNIST embeddings from VICReg (top left), Kernel VICReg with linear kernel (top right), and Kernel VICReg with Laplacian kernel (bottom). Colors denote digit classes (label indices 0–9). The Laplacian kernel yields rounder, more isotropic clusters, indicating improved class separability.
  • Figure 2: Robustness of polynomial-kernel Kernel-VICReg across hyperparameter settings and distribution shifts on MNIST. Each grouped bar corresponds to one $(d,c_0)$ configuration, where $d$ is polynomial degree and $c_0$ is the additive constant in $k(x,y)=(\gamma x^\top y + c_0)^d$. Bars report linear-probe accuracy on clean data and under rotation and contrast shift. The spread across groups indicates strong hyperparameter sensitivity, with certain settings preserving clean/shifted performance.
  • Figure 3: Robustness sensitivity of RBF-kernel Kernel-VICReg to the bandwidth parameter $\gamma$ under clean and shifted MNIST evaluation. Curves report linear-probe accuracy on clean data and under rotation and contrast shift. Performance is non-monotonic with respect to $\gamma$: moderate values yield better overall robustness, while overly large $\gamma$ leads to pronounced degradation, demonstrating the need for careful kernel hyperparameter tuning.
  • Figure 4: Kernel latency scaling across embedding dimensions. Each curve corresponds to a kernel type (Linear, RBF, Polynomial) at a fixed batch size, and line style differentiates batch size while color differentiates kernel family. Latency increases with both embedding dimension and batch size, with RBF consistently incurring the highest compute cost and linear kernel remaining the fastest in most regimes. This figure quantifies the computational overhead of kernel choice under high-dimensional settings.
  • Figure 5: Kernel memory scaling across embedding dimensions. Color encodes kernel type (Linear, RBF, Polynomial), and line style encodes batch size. Memory usage increases with embedding dimension and batch size for all kernels, while RBF shows the highest memory footprint at larger dimensions, indicating its greater resource demand in high-dimensional regimes.
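
The kernel families named in the captions above can be written down in a few lines. The polynomial form follows the caption of Figure 2, $k(x,y)=(\gamma x^\top y + c_0)^d$. The RBF and Laplacian forms below are assumptions based on a common convention (squared Euclidean distance and L1 distance, respectively, each scaled by $\gamma$); the captions do not state them, and the sample vectors and parameter values are hypothetical.

```python
import math


def polynomial_kernel(x, y, gamma=1.0, c0=1.0, d=2):
    """k(x, y) = (gamma * <x, y> + c0) ** d, as given in the Figure 2 caption."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + c0) ** d


def rbf_kernel(x, y, gamma=1.0):
    """Assumed form: exp(-gamma * ||x - y||_2^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def laplacian_kernel(x, y, gamma=1.0):
    """Assumed form: exp(-gamma * ||x - y||_1)."""
    l1_dist = sum(abs(a - b) for a, b in zip(x, y))
    return math.exp(-gamma * l1_dist)


# Hypothetical pair of 2-D embeddings.
x, y = [1.0, 2.0], [3.0, 0.5]

# <x, y> = 4, so (0.5 * 4 + 1) ** 3 = 27.
poly_val = polynomial_kernel(x, y, gamma=0.5, c0=1.0, d=3)

# With a larger gamma the RBF similarity between distinct points shrinks
# toward zero, so the kernel matrix moves toward the identity.
rbf_small_gamma = rbf_kernel(x, y, gamma=0.1)
rbf_large_gamma = rbf_kernel(x, y, gamma=10.0)
```

Under these assumed forms, $\gamma$ controls how quickly similarity decays with distance, which is one way to read the sensitivity to $\gamma$ reported in Figure 3.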

Theorems & Definitions (9)

  • Proposition 1: Non-Collapse in RKHS
  • proof
  • Remark 1
  • Theorem 1: Nonlinear Variance Capture in RKHS
  • proof
  • Remark 2
  • Theorem 2: Spectral Stability of Centered Kernel Matrices
  • proof
  • Corollary 1