Interpretable Kernel Representation Learning at Scale: A Unified Framework Utilizing Nyström Approximation

Maedeh Zarvandi, Michael Timothy, Theresa Wasserer, Debarghya Ghoshdastidar

TL;DR

This work addresses the scalability gap in kernel-based representation learning for self-supervised tasks by introducing KREPES, a Nyström-based framework that enables gradient-based optimization over a low-rank kernel space. It unifies a broad class of unsupervised and SSL losses under a kernel formulation and leverages principled initialization, second-order optimization with generalized Gauss–Newton preconditioning, and efficient landmark selection to scale to large datasets. A key contribution is interpretability: the framework provides representer-landmark–driven explanations, notably sample-specific influence scores and concept activation-vector–based profiles that connect learned representations to meaningful concepts. Empirically, KREPES with empirical NTKs achieves competitive downstream accuracy with significantly fewer parameters than corresponding neural nets, demonstrating both scalability and interpretability advantages for kernel-based SSL at scale.

Abstract

Kernel methods provide a theoretically grounded framework for non-linear and non-parametric learning, with strong analytic foundations and statistical guarantees. Yet, their scalability has long been limited by prohibitive time and memory costs. While progress has been made in scaling kernel regression, no framework exists for scalable kernel-based representation learning, restricting their use in the era of foundation models where representations are learned from massive unlabeled data. We introduce KREPES -- a unified, scalable framework for kernel-based representation learning via Nyström approximation. KREPES accommodates a wide range of unsupervised and self-supervised losses, and experiments on large image and tabular datasets demonstrate its efficiency. Crucially, KREPES enables principled interpretability of the learned representations, an immediate benefit over deep models, which we substantiate through dedicated analysis.
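
To make the central building block concrete, the following is a minimal sketch of a standard Nyström feature map, not the authors' KREPES implementation. It assumes a Gaussian (RBF) kernel, uniformly sampled landmarks, and hypothetical names (`rbf_kernel`, `nystrom_features`, `gamma`, `eps`) that do not come from the paper. The idea is that n points are mapped to m-dimensional features built from m landmarks, so the full n x n kernel matrix never has to be formed during training; it is computed here only to measure the approximation error.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.1):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def nystrom_features(X, landmarks, gamma=0.1, eps=1e-10):
    """Return Phi of shape (n, m) such that Phi @ Phi.T approximates K(X, X)."""
    K_mm = rbf_kernel(landmarks, landmarks, gamma)  # (m, m) landmark kernel
    K_nm = rbf_kernel(X, landmarks, gamma)          # (n, m) cross kernel
    # Symmetric eigendecomposition of the landmark kernel matrix.
    eigvals, eigvecs = np.linalg.eigh(K_mm)
    eigvals = np.maximum(eigvals, eps)              # guard against near-zero eigenvalues
    # K_mm^{-1/2} = V diag(1 / sqrt(lambda)) V^T
    K_mm_inv_sqrt = (eigvecs / np.sqrt(eigvals)) @ eigvecs.T
    return K_nm @ K_mm_inv_sqrt

# Toy data: 200 points in 5 dimensions, 40 landmarks chosen uniformly at random
# (an assumption for illustration; the paper studies landmark selection separately).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
landmark_idx = rng.choice(len(X), size=40, replace=False)

Phi = nystrom_features(X, X[landmark_idx])
K_exact = rbf_kernel(X, X)  # formed here only to check the approximation
rel_err = np.linalg.norm(Phi @ Phi.T - K_exact) / np.linalg.norm(K_exact)
print(f"features: {Phi.shape}, relative Frobenius error: {rel_err:.3f}")
```

In this sketch a model defined on `Phi` has a number of parameters that scales with the number of landmarks m rather than with the dataset size n, which is the property that makes gradient-based training over the low-rank kernel space feasible.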

Paper Structure

This paper contains 26 sections, 7 equations, 4 figures, 5 tables, 1 algorithm.

Figures (4)

  • Figure 1: Eigenvalue spectrum comparison of $\tilde{A}^\top \tilde{A}$ under different preconditioners with BT.
  • Figure 2: Effect of preconditioning on downstream test accuracy for BT and SimCLR across datasets, by preconditioner type (No Preconditioner, General Loss, GGN Hessian Approximation).
  • Figure 2: Sample-specific influential landmarks. First row: $x_{test}$, followed by the top-3 landmarks ranked by influence score.
  • Figure 3: Influential landmarks: concept Sea. Left: $x_{test}$ with predicted label. Middle: top-5 influential landmarks with alignment scores (bottom). Right: aggregated score. Positive scores indicate that the concept Sea supports the prediction; negative scores indicate that it opposes it. Where the concept is less evident (5th row), the score is negative. Automobile landmarks contribute negatively (rows 3, 4, 5), whereas airplane landmarks contribute positively when sea (row 1) or blue sky (row 3) is present.