Uniform Kernel Prober

Soumya Mukherjee, Bharath K. Sriperumbudur

TL;DR

Uniform Kernel Prober (UKP) introduces a representation-distance measure d_{λ,K}^{UKP} that uniformly bounds the difference in predictions across kernel ridge regression tasks for any pair of representations φ and ψ. By leveraging pullback RKHSs, covariance operators, and a trace-based reformulation, UKP provides a mathematically sound pseudometric that encodes inductive biases through the choice of kernel and can be estimated from unlabeled data at a parametric O(1/√n) rate. Empirical results on MNIST and ImageNet demonstrate UKP's ability to predict generalization performance and to cluster models by architecture more effectively than some existing measures such as GULP and CKA. The approach offers a practical, interpretable toolkit for comparing representations across diverse models, with potential applications to model selection and hyperparameter tuning, and with room for scalability improvements in large-scale settings.

Abstract

The ability to identify useful features or representations of the input data, based on training data, that achieve low prediction error on test data across multiple prediction tasks is considered the key to multitask learning success. In practice, however, when comparing the relative performance of different features, one faces the issue of choosing the prediction tasks and of the availability of test data from the chosen tasks. In this work, we develop a class of pseudometrics called Uniform Kernel Prober (UKP) for comparing features or representations learned by different statistical models, such as neural networks, when the downstream prediction tasks involve kernel ridge regression. The proposed pseudometric, UKP, between any two representations provides a uniform measure of prediction error on test data corresponding to a general class of kernel ridge regression tasks for a given choice of kernel, without access to test data. Additionally, desired invariances in representations can be captured by UKP solely through the choice of the kernel function, and the pseudometric can be efficiently estimated from $n$ input data samples with $O(\frac{1}{\sqrt{n}})$ estimation error. We also experimentally demonstrate the ability of UKP to discriminate between different types of features or representations based on their generalization performance on downstream kernel ridge regression tasks.
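To make the quantity being compared concrete, the following is a minimal sketch of the prediction gap between two representations on a single kernel ridge regression task. It is not the UKP distance itself, which the abstract describes as a uniform measure over a class of tasks. The function names (`rbf_kernel`, `krr_predict`, `prediction_gap`), the toy data, the toy representations, and the `K + n*lam*I` regularization convention are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def krr_predict(F_train, y_train, F_test, lam=0.1, sigma=1.0):
    """Kernel ridge regression on features F.

    Assumed convention: alpha = (K + n*lam*I)^{-1} y.
    """
    n = F_train.shape[0]
    K = rbf_kernel(F_train, F_train, sigma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y_train)
    return rbf_kernel(F_test, F_train, sigma) @ alpha

def prediction_gap(phi, psi, X_train, y_train, X_test, lam=0.1, sigma=1.0):
    """Mean squared difference between KRR predictions built on phi and on psi."""
    p_phi = krr_predict(phi(X_train), y_train, phi(X_test), lam, sigma)
    p_psi = krr_predict(psi(X_train), y_train, psi(X_test), lam, sigma)
    return float(np.mean((p_phi - p_psi) ** 2))

# Hypothetical toy task and representations.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
X_test = rng.normal(size=(100, 5))
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=200)

W = rng.normal(size=(5, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix

def phi(X):
    return np.tanh(X @ W)

def psi_rot(X):
    return np.tanh(X @ W) @ Q  # phi followed by an orthogonal transformation

def psi_other(X):
    return X[:, :2]  # an unrelated representation

gap_rot = prediction_gap(phi, psi_rot, X_train, y_train, X_test)
gap_other = prediction_gap(phi, psi_other, X_train, y_train, X_test)
print(f"gap(phi, rotated phi) = {gap_rot:.3e}")
print(f"gap(phi, other)       = {gap_other:.3e}")
```

Because the Gaussian RBF kernel depends only on Euclidean distances between feature vectors, the orthogonally transformed copy of `phi` yields the same kernel matrix up to floating-point error, so its gap is numerically zero. This is one simple instance of an invariance that follows from the choice of kernel alone.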

Paper Structure

This paper contains 29 sections, 9 theorems, 83 equations, 10 figures.

Key Result

Lemma 1

For any $\lambda>0$, the squared UKP distance $d_{\lambda,K}^{\mathrm{UKP}}(\phi,\psi)$ between representations $\phi(X)$ and $\psi(X)$ can be expressed as … where $X$ and $X^{\prime}$ are i.i.d. observations drawn from $P_{X}$.

Figures (10)

  • Figure 1: Generalization of kernel ridge regression-based predictors is strongly positively correlated with UKP distance values. We report the average correlation across 10 random synthetic kernel ridge regression tasks. Error bars are negligibly small and hence not visible.
  • Figure 2: Clustering based on UKP distance is sensitive to differences in architectures of neural network models.
  • Figure 3: Heatmaps representing UKP distance between pairs of fully-connected ReLU networks of different depths and widths. We choose the kernel for the UKP distance to be the Gaussian RBF kernel with bandwidth $\sigma \in \left\{1,10^{-1},10^{-2}\right\}$ along with the regularization parameter $\lambda \in \left\{1,10\right\}$. Along the rows and columns of each heatmap, the ReLU networks are arranged first in order of increasing depth, and then in order of increasing width within each depth level. Darker colors indicate smaller values of UKP distance according to the scale attached to each heatmap.
  • Figure 4: Dendrograms corresponding to agglomerative hierarchical clustering of representations of 50 ReLU networks based on UKP distance.
  • Figure 5: Spearman's $\rho$ rank correlation coefficient between the generalization of kernel ridge regression-based predictors and various distance measures between representations. We report the average correlation across 10 random synthetic kernel ridge regression tasks. Results are similar for 30 trials. Error bars are negligibly small and hence not visible.
  • ...and 5 more figures
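
The two analyses named in the captions above, agglomerative hierarchical clustering from a pairwise distance matrix and Spearman's rank correlation, can be sketched with SciPy as follows. The distance matrix `D`, the vectors `distances` and `gaps`, and the choice of average linkage are hypothetical placeholders for illustration; they are not values or settings reported in the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr

# Hypothetical symmetric matrix of pairwise distances between four models.
# Models 0 and 1 are close to each other, as are models 2 and 3.
D = np.array([
    [0.00, 0.10, 0.90, 0.80],
    [0.10, 0.00, 0.85, 0.90],
    [0.90, 0.85, 0.00, 0.15],
    [0.80, 0.90, 0.15, 0.00],
])

# linkage expects the condensed (upper-triangular) form of the matrix.
# Average linkage is an assumption; the source does not name the method.
Z = linkage(squareform(D), method="average")

# Cut the tree into at most two flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print("cluster labels:", labels)

# Hypothetical per-pair values: a representation distance and the
# corresponding difference in downstream prediction performance.
distances = np.array([0.1, 0.4, 0.2, 0.9])
gaps = np.array([0.01, 0.05, 0.02, 0.30])

# Spearman's rho compares rankings only, so any monotone relationship
# between the two vectors gives a coefficient of 1.
rho, pvalue = spearmanr(distances, gaps)
print("Spearman rho:", rho)
```

The linkage matrix `Z` can also be passed to `scipy.cluster.hierarchy.dendrogram` to draw a tree of the kind the Figure 4 caption describes.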

Theorems & Definitions (16)

  • Definition 1
  • Lemma 1
  • Proposition 1
  • Theorem 1
  • Lemma 2
  • Corollary 1
  • Corollary 2
  • Proposition 2
  • Proposition 3
  • Theorem 2
  • ...and 6 more