Spectral Analysis of Representational Similarity with Limited Neurons
Hyunmo Kang, Abdulkadir Canatar, SueYeon Chung
TL;DR
This work develops a Random Matrix Theory framework to quantify how limited neuron sampling biases representational similarity measures such as CKA and CCA. Using deterministic equivalents and a backward denoising procedure, it shows that finite sampling underestimates population-level similarity because of eigenvector delocalization, particularly for power-law spectra, and it provides a practical square-root rule for how many well-localized components can be resolved. The forward analysis explains how sampling reshapes population eigencomponents, while the backward method recovers population overlaps from limited data, validated on synthetic data and primate brain recordings. The results offer actionable guidance for interpreting brain-model comparisons under finite sampling constraints and support more reliable model evaluation in neuroscience and AI. The work also outlines future directions, including richer spectral priors and extensions to regression and dynamic similarity metrics.
Abstract
Understanding representational similarity between neural recordings and computational models is essential for neuroscience, yet remains challenging to measure reliably due to the constraints on the number of neurons that can be recorded simultaneously. In this work, we apply tools from Random Matrix Theory to investigate how such limitations affect similarity measures, focusing on Centered Kernel Alignment (CKA) and Canonical Correlation Analysis (CCA). We propose an analytical framework for representational similarity analysis that relates measured similarities to the spectral properties of the underlying representations. We demonstrate that neural similarities are systematically underestimated under finite neuron sampling, mainly due to eigenvector delocalization. Moreover, for power-law population spectra, we show that the number of localized eigenvectors scales as the square root of the number of recorded neurons, providing a simple rule of thumb for practitioners. To overcome sampling bias, we introduce a denoising method to infer population-level similarity, enabling accurate analysis even with small neuron samples. Theoretical predictions are validated on synthetic and real datasets, offering practical strategies for interpreting neural data under finite sampling constraints.
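To make the sampling effect concrete, the following is a minimal illustrative sketch and not the authors' estimator or denoising method. It assumes the standard linear form of CKA, a synthetic Gaussian population whose covariance eigenvalues decay as a power law, and neurons that mix all eigencomponents through a random rotation. The function names (`linear_cka`, `power_law_population`, `subsampled_cka`) and all parameter values are hypothetical choices made for this example.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between X and Y, both shaped (stimuli, features)."""
    # Center every feature across stimuli (equivalent to centering the Gram matrices).
    Xc = X - X.mean(axis=0, keepdims=True)
    Yc = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Yc.T @ Xc, "fro") ** 2
    norm_x = np.linalg.norm(Xc.T @ Xc, "fro")
    norm_y = np.linalg.norm(Yc.T @ Yc, "fro")
    return cross / (norm_x * norm_y)


def power_law_population(n_stimuli, n_neurons, exponent, rng):
    """Synthetic responses whose covariance eigenvalues decay as i**(-exponent).

    Assumption for illustration: a random orthogonal rotation spreads each
    eigencomponent across all neurons, so no single neuron isolates one mode.
    """
    eigvals = np.arange(1, n_neurons + 1, dtype=float) ** (-exponent)
    latent = rng.standard_normal((n_stimuli, n_neurons)) * np.sqrt(eigvals)
    rotation, _ = np.linalg.qr(rng.standard_normal((n_neurons, n_neurons)))
    return latent @ rotation.T


def subsampled_cka(X, Y, n_recorded, rng):
    """CKA between a random subset of n_recorded columns of X and the full Y."""
    idx = rng.choice(X.shape[1], size=n_recorded, replace=False)
    return linear_cka(X[:, idx], Y)


rng = np.random.default_rng(0)
population = power_law_population(n_stimuli=200, n_neurons=500, exponent=1.0, rng=rng)

# Compare the population with random subsets of itself: the population-level
# similarity is 1 by construction, so any shortfall reflects neuron sampling alone.
for n_recorded in (10, 50, 200, 500):
    value = subsampled_cka(population, population, n_recorded, rng)
    print(f"{n_recorded:4d} recorded neurons -> CKA = {value:.3f}")
```

Comparing the population with subsets of itself keeps the ground truth fixed at 1, so the gap seen for small `n_recorded` comes only from sampling. Replacing the second argument with a model representation of the same stimuli would give the brain-model setting discussed in the abstract, where the population-level value is no longer known in advance.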
