Estimating Dimensionality of Neural Representations from Finite Samples

Chanwoo Chun, Abdulkadir Canatar, SueYeon Chung, Daniel Lee

TL;DR

This work shows that the participation ratio of eigenvalues, a popular measure of global dimensionality, is highly biased at small sample sizes, and proposes a bias-corrected estimator that is more accurate with finite samples and in the presence of noise.

Abstract

The global dimensionality of a neural representation manifold provides rich insight into the computational processes underlying both artificial and biological neural networks. However, all existing measures of global dimensionality are sensitive to the number of samples, i.e., the number of rows and columns of the sample matrix. We show that, in particular, the participation ratio of eigenvalues, a popular measure of global dimensionality, is highly biased at small sample sizes, and we propose a bias-corrected estimator that is more accurate with finite samples and in the presence of noise. On synthetic data, we demonstrate that our estimator can recover the known true dimensionality. We apply our estimator to neural recordings, including calcium imaging, electrophysiological recordings, and fMRI data, as well as to neural activations in a large language model, and show that our estimator is invariant to the sample size. Finally, our estimators can also be used to measure the local dimensionality of curved neural manifolds by appropriately weighting the finite samples.
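To make the finite-sample bias concrete, the sketch below compares the naive participation ratio, $(\operatorname{tr} C)^2 / \operatorname{tr}(C^2)$ for the sample covariance $C$, with one simple debiased variant. It is not the authors' estimator: it is a hypothetical illustration (the names `naive_pr`, `pair_corrected_pr`, and `sample_linear_model` are ours) that assumes zero-mean, i.i.d. rows and corrects only the row (sample, $P$) dimension, not the column ($Q$) dimension. It relies on the identities $\mathbb{E}[K_{ii}K_{jj}] = (\operatorname{tr}\Sigma)^2$ and $\mathbb{E}[K_{ij}^2] = \operatorname{tr}(\Sigma^2)$ for $i \neq j$, where $K = XX^\top$, so dropping the $i = j$ terms removes the bias of each part of the ratio.

```python
import numpy as np


def naive_pr(X):
    """Naive participation ratio of the sample covariance eigenvalues.

    X has shape (P, Q): P samples (rows) by Q units (columns).
    Assumes zero-mean data, so no centering is applied.
    """
    K = X @ X.T  # P x P Gram matrix; same nonzero eigenvalues as X^T X
    # (tr C)^2 / tr(C^2); the 1/P normalization of C cancels in the ratio.
    return np.trace(K) ** 2 / np.sum(K ** 2)


def pair_corrected_pr(X):
    """Hypothetical row-debiased PR: ratio of pairwise (U-statistic) estimates.

    For i != j and i.i.d. zero-mean rows with covariance S:
      E[K_ii K_jj] = (tr S)^2   and   E[K_ij^2] = tr(S^2).
    Excluding the i == j terms removes the finite-P bias of each term
    (the ratio itself is only approximately unbiased).
    """
    P = X.shape[0]
    K = X @ X.T
    diag = np.diag(K)
    n_pairs = P * (P - 1)
    num = (diag.sum() ** 2 - np.sum(diag ** 2)) / n_pairs  # mean of K_ii K_jj, i != j
    den = (np.sum(K ** 2) - np.sum(diag ** 2)) / n_pairs   # mean of K_ij^2, i != j
    return num / den


def sample_linear_model(P, d, Q, rng):
    """P samples in a d-dim subspace of R^Q with isotropic latents (true PR = d)."""
    W, _ = np.linalg.qr(rng.standard_normal((Q, d)))  # Q x d, orthonormal columns
    Z = rng.standard_normal((P, d))
    return Z @ W.T  # P x Q


rng = np.random.default_rng(0)
d, Q = 50, 200
results = {}
for P in (20, 50, 200):
    draws = [sample_linear_model(P, d, Q, rng) for _ in range(20)]
    results[P] = (np.mean([naive_pr(X) for X in draws]),
                  np.mean([pair_corrected_pr(X) for X in draws]))
    print(f"P={P:4d}  naive ~ {results[P][0]:5.1f}  "
          f"pair-corrected ~ {results[P][1]:5.1f}  (true {d})")
```

In this toy setting the naive estimate should rise toward $d$ as $P$ grows, while the pair-corrected estimate should stay near $d$ at every $P$. Real recordings would additionally need mean-centering (which introduces its own small bias) and a correction along $Q$, which this sketch omits.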

Paper Structure

This paper contains 30 sections, 132 equations, and 5 figures.

Figures (5)

  • Figure 1: Different dimensionality estimates of the linear model with $d=50$ and noise variance $\sigma^2_\epsilon=0.2$.
  • Figure 2: Dimensionality estimates on four neural recording datasets for varying numbers of stimuli $P$ and neural activation units $Q$, obtained by subsampling from the full dataset. Top left: Mouse V1 stringer2019high; Top right: Macaque IT majaj2015simple; Bottom left: Macaque V4 papale2025extensive; Bottom right: Human IT Hebart2023.
  • Figure 3: Estimating the task dimensionality of LLM features for different languages. a) We compute the dimensionality of the last layer for each language separately and report the average as a function of the input sampling ratio. In this example, all layers have $Q=4096$-dimensional representations, and each language has a total of $P=483$ sentences. The error bars represent the standard deviation over $50$ random draws. b) The dimensionality profile across layers when the sampling ratio is $0.1$ ($P=48$).
  • Figure 4: a. Estimating the local dimensionality of the random Fourier feature model using TwoNN, $\gamma_{\text{naive}}^{\text{local}}\left(r\right)$, and $\gamma_{\text{both}}^{\text{local}}\left(r\right)$, while varying the radius of the local ball for the latter two estimators. Signal-to-noise ratio is approximately $3.33$ ($\sigma_\epsilon=0.3$). b. Estimating the local dimensionality of the macaque V1 LFP measured with electrode arrays papale2025extensive.
  • Figure S1: Bias due to nonlinearity in the definition of dimensionality.
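Figure 4 estimates local dimensionality by restricting attention to a ball of radius $r$. As a rough, hypothetical sketch of that idea (not the paper's $\gamma^{\text{local}}$ estimators), the code below uses hard 0/1 weights, keeping only samples within radius $r$ of a center point, and computes the naive, non-bias-corrected participation ratio of the mean-centered neighborhood. The toy manifold (a noise-free unit circle embedded in $\mathbb{R}^{10}$) and the function names are our assumptions: globally the circle spans two directions, but a small arc is nearly one-dimensional.

```python
import numpy as np


def participation_ratio(X):
    """Naive PR of the eigenvalues of the (mean-centered) covariance of the rows of X."""
    Xc = X - X.mean(axis=0)
    eig = np.linalg.eigvalsh(Xc.T @ Xc / Xc.shape[0])
    eig = np.clip(eig, 0.0, None)  # guard against tiny negative round-off
    return eig.sum() ** 2 / np.sum(eig ** 2)


def local_pr(X, center, r):
    """Hypothetical hard-ball local PR: keep only samples within radius r of center."""
    mask = np.linalg.norm(X - center, axis=1) < r  # 0/1 weights on the samples
    return participation_ratio(X[mask])


# A curved 1-D manifold: a unit circle embedded in R^10 (no noise, for clarity).
rng = np.random.default_rng(1)
t = rng.uniform(0, 2 * np.pi, size=2000)
E, _ = np.linalg.qr(rng.standard_normal((10, 2)))  # random orthonormal 2-D embedding
X = np.column_stack([np.cos(t), np.sin(t)]) @ E.T  # 2000 x 10

global_pr = participation_ratio(X)
small_ball = local_pr(X, X[0], r=0.3)
print(f"global PR ~ {global_pr:.2f}, local PR (r=0.3) ~ {small_ball:.2f}")
```

The global estimate should be close to 2, while the small-ball estimate should be close to 1. Growing $r$ lets curvature add a second direction back in, and with noise or few neighbors the naive local estimate inherits the same finite-sample bias discussed in the abstract.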