Bayesian Comparisons Between Representations
Heiko H. Schütt
TL;DR
The paper introduces a Bayesian framework for comparing neural representations through the predictive distributions of linear readouts from intermediate representations. Under a Gaussian prior on the readout weights and Gaussian noise, the prior and posterior predictive distributions are analytically tractable and depend only on the linear kernel XX^T. Distances between predictive distributions, specifically the Jensen-Shannon distance and the total variation distance, then serve as pseudo-metrics on representations and connect to kernel-based measures. The author demonstrates the approach on deep neural networks trained on ImageNet-1k and on a small subset of the Natural Scenes Dataset, showing that the Bayesian distances capture meaningful representational structure, come with informative uncertainty, and correlate with, yet remain distinct from, existing measures such as RSA and CKA. The evaluations also vary slightly less across random image samples than existing metrics do, which matters for model comparison and for stimulus design in neuroscience-inspired machine learning. Overall, the work offers a principled, uncertainty-aware toolkit for representational comparison that complements and extends current kernel- and probe-based approaches.
Abstract
Which neural networks are similar is a fundamental question for both machine learning and neuroscience. Here, it is proposed to base comparisons on the predictive distributions of linear readouts from intermediate representations. In Bayesian statistics, the prior predictive distribution is a full description of the inductive bias and generalization of a model, making it a great basis for comparisons. This distribution directly gives the evidence a dataset would provide in favor of the model. If we want to compare multiple models to each other, we can use a metric for probability distributions like the Jensen-Shannon distance or the total variation distance. Because these are metrics on distributions, they induce pseudo-metrics on representations, which measure how well two representations could be distinguished based on a linear readout. For a linear readout with a Gaussian prior on the readout weights and Gaussian noise, we can analytically compute the (prior and posterior) predictive distributions without approximations. These distributions depend only on the linear kernel matrix of the representations in the model. Thus, the Bayesian metrics connect both to comparisons based on linear readouts and to kernel-based metrics like centered kernel alignment and representational similarity analysis. The new methods are demonstrated with deep neural networks trained on ImageNet-1k, comparing them to each other and to a small subset of the Natural Scenes Dataset. The Bayesian comparisons are correlated with, but distinct from, existing metrics. Evaluations vary slightly less across random image samples and yield informative results with full uncertainty information. Thus, the proposed Bayesian metrics nicely extend our toolkit for comparing representations.
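
To make the construction concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a zero-mean isotropic Gaussian prior on the readout weights with variance prior_var and isotropic Gaussian noise with variance noise_var, so that the prior predictive for y = Xw + noise is N(0, prior_var * XX^T + noise_var * I). The function names, the default variances, and the Monte Carlo estimator of the Jensen-Shannon distance are all assumptions made for illustration; the abstract does not say how the distance is evaluated.

```python
import numpy as np


def prior_predictive_cov(X, prior_var=1.0, noise_var=1.0):
    """Covariance of y = X w + eps, with w ~ N(0, prior_var I), eps ~ N(0, noise_var I).

    X has shape (n_stimuli, n_features); the result depends on X only
    through the linear kernel X X^T.
    """
    n = X.shape[0]
    return prior_var * (X @ X.T) + noise_var * np.eye(n)


def gaussian_logpdf(Y, chol):
    """Log density of N(0, chol chol^T) at each row of Y (shape (m, n))."""
    n = chol.shape[0]
    Z = np.linalg.solve(chol, Y.T)              # whitened samples, shape (n, m)
    quad = np.sum(Z ** 2, axis=0)               # Mahalanobis terms
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (quad + logdet + n * np.log(2.0 * np.pi))


def js_distance_mc(cov_a, cov_b, n_samples=5000, seed=0):
    """Monte Carlo estimate of the Jensen-Shannon distance (in nats)
    between N(0, cov_a) and N(0, cov_b). Illustrative estimator only."""
    rng = np.random.default_rng(seed)
    n = cov_a.shape[0]
    chols = [np.linalg.cholesky(cov_a), np.linalg.cholesky(cov_b)]

    jsd = 0.0
    for chol_p in chols:
        # Draw samples from one of the two predictive distributions.
        Y = rng.standard_normal((n_samples, n)) @ chol_p.T
        log_p = gaussian_logpdf(Y, chol_p)
        log_a = gaussian_logpdf(Y, chols[0])
        log_b = gaussian_logpdf(Y, chols[1])
        log_mix = np.logaddexp(log_a, log_b) - np.log(2.0)
        jsd += 0.5 * np.mean(log_p - log_mix)   # KL(p || mixture) estimate

    return float(np.sqrt(max(jsd, 0.0)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_a = rng.standard_normal((8, 5))           # representation A: 8 stimuli, 5 features
    X_b = rng.standard_normal((8, 5))           # representation B on the same stimuli
    d = js_distance_mc(prior_predictive_cov(X_a), prior_predictive_cov(X_b))
    print("estimated JS distance:", d)
```

Because the predictive covariance depends on a representation only through XX^T, applying an orthogonal transformation to the features leaves the distance at zero up to numerical error, which is the sense in which these distances are pseudo-metrics rather than metrics on representations.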
