Spectral Analysis of Representational Similarity with Limited Neurons

Hyunmo Kang, Abdulkadir Canatar, SueYeon Chung

TL;DR

This work develops a rigorous Random Matrix Theory framework to quantify how limited neuron sampling biases representational similarity measures such as CKA and CCA. By deriving deterministic equivalents and a backward denoising procedure, it shows that finite sampling systematically underestimates population-level similarity because of eigenvector delocalization, particularly for power-law spectra. It also provides a practical square-root rule: the number of well-localized components that can be resolved scales as the square root of the number of recorded neurons. The forward analysis explains how sampling reshapes population eigencomponents, while the backward method recovers population overlaps from limited data; both are validated on synthetic data and primate brain recordings. The results offer actionable guidance for interpreting brain-model comparisons under finite sampling constraints and enable more reliable model evaluation in neuroscience and AI contexts. The work also outlines future directions, including richer spectral priors and extensions to regression and dynamic similarity metrics.

Abstract

Understanding representational similarity between neural recordings and computational models is essential for neuroscience, yet remains challenging to measure reliably due to the constraints on the number of neurons that can be recorded simultaneously. In this work, we apply tools from Random Matrix Theory to investigate how such limitations affect similarity measures, focusing on Centered Kernel Alignment (CKA) and Canonical Correlation Analysis (CCA). We propose an analytical framework for representational similarity analysis that relates measured similarities to the spectral properties of the underlying representations. We demonstrate that neural similarities are systematically underestimated under finite neuron sampling, mainly due to eigenvector delocalization. Moreover, for power-law population spectra, we show that the number of localized eigenvectors scales as the square root of the number of recorded neurons, providing a simple rule of thumb for practitioners. To overcome sampling bias, we introduce a denoising method to infer population-level similarity, enabling accurate analysis even with small neuron samples. Theoretical predictions are validated on synthetic and real datasets, offering practical strategies for interpreting neural data under finite sampling constraints.
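
The underestimation described above can be illustrated with a minimal sketch. It assumes the standard linear CKA formula and a synthetic population built for illustration; the helper names (`linear_cka`, `power_law_population`, `sampled_cka`) and all sizes are hypothetical choices, not the paper's code or estimator.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between two (stimuli x units) activation matrices."""
    # Center each unit's response across stimuli.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # ||X^T Y||_F^2 equals the trace of the product of the two Gram matrices.
    cross = np.linalg.norm(X.T @ Y, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return cross / (norm_x * norm_y)


def power_law_population(P, N_pop, alpha, rng):
    """Synthetic population (assumption): latent variances decay as i^(-alpha)."""
    Z = rng.standard_normal((P, N_pop))
    scales = np.arange(1, N_pop + 1) ** (-alpha / 2.0)
    # Random rotation so that each simulated neuron mixes all latent components.
    rotation, _ = np.linalg.qr(rng.standard_normal((N_pop, N_pop)))
    return (Z * scales) @ rotation.T


def sampled_cka(X_pop, N, rng):
    """CKA between N randomly chosen neurons and the full population."""
    idx = rng.choice(X_pop.shape[1], size=N, replace=False)
    return linear_cka(X_pop[:, idx], X_pop)


rng = np.random.default_rng(0)
X_pop = power_law_population(P=200, N_pop=400, alpha=1.2, rng=rng)
for N in (10, 40, 160, 400):
    print(N, round(sampled_cka(X_pop, N, rng), 3))
```

Comparing a neuron subsample against the population it was drawn from isolates the sampling effect: the true similarity is 1 by construction, so any shortfall in the printed values comes from sampling alone.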

Paper Structure

This paper contains 44 sections, 97 equations, 12 figures, 3 algorithms.

Figures (12)

  • Figure 1: a) Illustration of eigenvector delocalization in BBP phase transition. b) Self-overlap $Q_{ii}$ between sample and population eigenvectors for ResNet18 activations. c) CKA between population and sample activations when $N$ neurons are sampled. The gray-shaded region represents the standard deviation of empirical CKA across different random samplings.
  • Figure 2: Comparison of sample vs population measures for CKA and CCA: Error bars represent empirical sample similarity and dotted lines the theoretical predictions. The black dotted line marks the true population similarity which is set close to 0.5 for both measures. Solid lines indicate inferred true similarity from samples. Sample similarity is lower due to eigenvector delocalization, while our method consistently provides a closer estimate of the true value.
  • Figure 3: Left: Participation ratio (P.R.) of self-overlap ($1/\sum_j Q_{ij}^2$), indicating the onset of eigenvector delocalization, for a power-law spectrum $\tilde{\lambda}_i \sim i^{-1.2}$. For fixed $N$, increasing $P$ only marginally affects the leading eigenvectors. By contrast, for fixed $P$, increasing $N$ makes more eigenvectors localized. Only sample eigenvectors below the black horizontal line are localized (P.R. $\approx 1$). Heuristically, $\tilde{M}_{ia}$ can be recovered reliably only for indices below this line. Right: Each column shows the 5-trial average of $\mathbf{M}$, the theoretical prediction of $\mathbf{M}$, the inferred population overlap $\tilde{\mathbf{M}}_{est}$, and the actual population overlap $\tilde{\mathbf{M}}$. With fewer neurons $N$, sample eigenvectors become delocalized, causing large discrepancies. Nevertheless, our inference method successfully recovers the dominant overlaps, which are enough for global similarity measures such as CKA and CCA.
  • Figure 4: Left: Sample-based CCA ranking flips despite Model 2 having a larger population CCA than Model 1. The decrease in Model 2’s CCA is more pronounced due to its stronger reliance on higher-indexed eigenvectors, which become more delocalized with limited neuron sampling. Right: Empirical vs. population cross-overlaps for Model 1 vs. Brain and Model 2 vs. Brain. Here, $P=200$ and $N=30$. All three population eigenvalue spectra follow a power-law with exponent $-1.2$. Although Model 2’s true overlap is higher at the population level, it relies on higher-indexed (smaller eigenvalue) components, which delocalize more severely in the sample.
  • Figure 5: Scatter plots of observed sample similarity vs. inferred population similarity for multiple models compared to V2 cortex, using only $N=20$ neurons (out of a larger set). (Left) CKA results; (Right) CCA results. The dotted line $y=x$ indicates equality. Notice that the inferred population similarity is consistently higher than the naive sample-based measure, demonstrating how limited neuron sampling can lead to underestimation of the true model-brain correspondence.
  • ...and 7 more figures
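
The participation ratio in the Figure 3 caption, $1/\sum_j Q_{ij}^2$, can be computed as in the sketch below. It assumes that $Q_{ij}$ is the squared inner product between the $i$-th eigenvector of the sample Gram matrix and the $j$-th eigenvector of the population Gram matrix, so that each row of $Q$ sums to 1. That definition, the helper names, and the synthetic data are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np


def gram_eigvecs(X):
    """Eigenvectors of the centered (stimuli x stimuli) Gram matrix, largest first."""
    Xc = X - X.mean(axis=0, keepdims=True)
    K = Xc @ Xc.T / X.shape[1]
    evals, evecs = np.linalg.eigh(K)
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]


def self_overlap(X_sample, X_pop):
    """Q[i, j]: squared overlap of sample eigenvector i with population eigenvector j."""
    _, V = gram_eigvecs(X_sample)
    _, V_pop = gram_eigvecs(X_pop)
    return (V.T @ V_pop) ** 2


def participation_ratio(Q):
    """1 / sum_j Q_ij^2 per row; close to 1 means the eigenvector is localized."""
    return 1.0 / np.sum(Q ** 2, axis=1)


# Hypothetical synthetic population with power-law latent variances i^(-1.2).
rng = np.random.default_rng(0)
P, N_pop = 100, 400
scales = np.arange(1, N_pop + 1) ** (-0.6)
rotation, _ = np.linalg.qr(rng.standard_normal((N_pop, N_pop)))
X_pop = (rng.standard_normal((P, N_pop)) * scales) @ rotation.T

for N in (25, 100, 400):
    idx = rng.choice(N_pop, size=N, replace=False)
    pr = participation_ratio(self_overlap(X_pop[:, idx], X_pop))
    print(N, np.round(pr[:5], 2))
```

Under this definition the participation ratio ranges from 1 (a sample eigenvector aligned with a single population eigenvector) to $P$ (an eigenvector spread evenly over all of them).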

Theorems & Definitions (1)

  • Remark H.1: Key: $\sqrt{N}$ scaling of eigenvector delocalization
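
As a worked reading of the square-root rule, assume only the proportionality stated in the abstract. Here $k(N)$ denotes the number of localized eigenvectors with $N$ recorded neurons, and the constant $c$ is a hypothetical placeholder rather than a value from the paper:

```latex
k(N) \approx c\,\sqrt{N}
\quad\Longrightarrow\quad
\frac{k(N_2)}{k(N_1)} = \sqrt{\frac{N_2}{N_1}},
\qquad
\text{e.g. } N_1 = 20,\; N_2 = 80:\quad \frac{k(80)}{k(20)} = \sqrt{4} = 2 .
```

On this reading, doubling the number of resolvable components requires roughly four times as many recorded neurons.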