Correcting Biased Centered Kernel Alignment Measures in Biological and Artificial Neural Networks

Alex Murphy, Joel Zylberberg, Alona Fyshe

TL;DR

The study addresses the risk of biased Centered Kernel Alignment (CKA) inflating similarity when comparing brain data to artificial neural networks in low-data, high-dimensional settings. It analyzes biased versus debiased CKA on THINGS fMRI/MEG data using ResNet18 and CORnet-S, showing that bias can falsely suggest alignment and that debiased CKA correctly isolates stimulus-driven similarity, at least for fMRI. The work demonstrates that debiased CKA yields zero alignment for random or shuffled neural data and preserves known brain–CNN correspondence only for true stimuli-driven responses, highlighting the importance of debiasing and proper controls. These findings have practical implications for neuroscience alignment studies and the development of brain-like representations in AI, encouraging researchers to adopt debiased CKA and include random/shuffled controls in their analyses.

Abstract

Centered Kernel Alignment (CKA) has recently emerged as a popular metric for comparing activations from biological and artificial neural networks (ANNs), quantifying the alignment between internal representations derived from stimulus sets (e.g. images, text, video) that are presented to both systems. In this paper we highlight issues that the community should take into account when using CKA as an alignment metric with neural data. Neural data lie in the low-data, high-dimensionality domain, which is one of the cases where (biased) CKA yields high similarity scores even for pairs of random matrices. Using fMRI and MEG data from the THINGS project, we show that when biased CKA is applied to representations of different sizes in this domain, the resulting scores are not directly comparable, because biased CKA is sensitive to differing feature-sample ratios rather than to stimuli-driven responses. This situation can arise both when comparing a pre-selected area of interest (e.g. a region of interest, ROI) to multiple ANN layers and when determining which ANN layer is most similar to each of several ROIs or sensor groups of different dimensionality. We show that biased CKA can be artificially driven to its maximum value using independent random data with different sample-feature ratios. We further show that shuffling sample-feature pairs of real neural data does not drastically alter biased CKA similarity relative to unshuffled data, indicating an undesirable lack of sensitivity to stimuli-driven neural responses. Positive alignment of true stimuli-driven responses is achieved only with debiased CKA. Lastly, we report findings suggesting that biased CKA is sensitive to the inherent structure of neural data, differing from shuffled data only when debiased CKA detects stimuli-driven alignment.
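The random-matrix effect described in the abstract can be illustrated with a minimal sketch. It assumes linear kernels, the standard biased HSIC estimator, and the commonly used unbiased HSIC estimator as the "debiased" variant; this section of the source does not spell out the estimator, so the formulas, the function names (`hsic_biased`, `hsic_unbiased`, `linear_cka`, `random_pair_scores`) and the matrix sizes are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def hsic_biased(K, L):
    """Biased HSIC estimate: tr(K H L H) / (n - 1)^2 for symmetric Gram matrices."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    Kc = H @ K @ H
    # tr(Kc @ L) equals the elementwise sum because L is symmetric.
    return np.sum(Kc * L) / (n - 1) ** 2


def hsic_unbiased(K, L):
    """Unbiased HSIC estimate (assumed debiasing step); requires n > 3."""
    n = K.shape[0]
    K = K.copy()
    L = L.copy()
    np.fill_diagonal(K, 0.0)  # the estimator uses Gram matrices with zeroed diagonals
    np.fill_diagonal(L, 0.0)
    term1 = np.sum(K * L)                                    # tr(K L)
    term2 = K.sum() * L.sum() / ((n - 1) * (n - 2))          # (1'K1)(1'L1) / ((n-1)(n-2))
    term3 = 2.0 * (K.sum(axis=0) @ L.sum(axis=0)) / (n - 2)  # 2 * 1'KL1 / (n-2)
    return (term1 + term2 - term3) / (n * (n - 3))


def linear_cka(X, Y, debiased=False):
    """Linear CKA between X (n x p) and Y (n x q); rows are matched samples."""
    K, L = X @ X.T, Y @ Y.T
    hsic = hsic_unbiased if debiased else hsic_biased
    return hsic(K, L) / np.sqrt(hsic(K, K) * hsic(L, L))


def random_pair_scores(n, p_x, p_y, seed=0):
    """Biased and debiased CKA for two independent Gaussian matrices."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p_x))
    Y = rng.standard_normal((n, p_y))
    return linear_cka(X, Y), linear_cka(X, Y, debiased=True)


if __name__ == "__main__":
    # Hypothetical sizes: few samples, increasingly many features.
    for p in (16, 256, 4096):
        biased, debiased = random_pair_scores(n=64, p_x=p, p_y=p)
        print(f"p={p:5d}  biased={biased:.3f}  debiased={debiased:.3f}")
```

With far more features than samples, the biased score for independent noise should approach 1 while the debiased score should stay close to 0, which is the behaviour the abstract warns about.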

Paper Structure (20 sections, 4 equations, 5 figures, 1 table)


Figures (5)

  • Figure 1: For a given reference matrix $A \in \mathbb{R}^{1024 \times 1024}$ sampled from a standard Normal distribution, biased CKA, debiased CKA and RV2 metrics were calculated against matrices of size $1024 \times P$, where $P$ ranges from 10 to 250k in a geometric progression covering three domains: (a) high-data low-dimensionality, (b) equal data / dimensionality and (c) low-data high-dimensionality (most typical of neuroimaging datasets). CKA without the debiasing step is highly sensitive to the ratio between samples and features.
  • Figure 2: (A-B) Comparison of fMRI (V1) and MEG (occipital sensors averaged over a 100-150 ms window) across layers of ResNet18 and CORnet-S in three conditions: random data (red), original data (blue) and shuffled data (green). Results for the random and shuffled conditions are averaged over the three participants and five random seeds; results for the original data order are averaged over the three participants. (C-D) Layer structure of ResNet18 and CORnet-S, respectively. (E-F) A CNN layer is held fixed and its representations are compared across increasing numbers of fMRI voxels (E) or across all MEG sensor values from increasing temporal window sizes (F).
  • Figure 3: Across early (V1, V2, V3), intermediate (hV4, VO1, TO1) and late (left-FFA, left-PPA, left-EBA) ROIs, biased CKA is contrasted with debiased CKA across the original data order and the mean of five shuffled versions across three fMRI participants over ResNet18. Debiased CKA is the only metric that shows sensitivity to stimuli-driven responses. This replicates the pattern reported in earlier work that lower visual areas exhibit greater correspondence to lower-layer CNN representations and similarly for higher visual areas / higher-layer CNNs (yamins_earlier).
  • Figure 4: Across early (V1, V2, V3), intermediate (hV4, VO1, TO1) and late (left-FFA, left-PPA, left-EBA) ROIs, biased CKA is contrasted with debiased CKA across the original data order and the mean of five shuffled versions across three fMRI participants over CORnet-S. Debiased CKA is the only metric that shows sensitivity to stimuli-driven responses and with it we replicate the previously-reported pattern that lower visual areas exhibit greater correspondence to lower-layer CNN representations and similarly for higher visual areas / higher-layer CNNs.
  • Figure 5: When no stimuli-driven alignment is detected (via debiased CKA) between brains and artificial neural networks, biased CKA aligns as well to shuffled (neural) data as it does to the original ordering. In ROIs where this occurs, no change is expected as the amount of shuffling is varied from 0% to 100%. Conversely, alignment scores in areas that do appear to contain stimuli-driven alignment should decrease gradually as the amount of shuffling is increased. From Figure 4 we selected V1 and left PPA (lPPA) as representative of the two cases (V1: sensitive to stimuli-driven alignment, lPPA: insensitive to stimuli-driven alignment). We incrementally varied the percentage of shuffled items and measured the alignment in the first three layers of ResNet18 (conv1, maxpool & layer1.0.conv1). We find that the amount of shuffling has no effect on ANN-ROI pairs that originally showed no stimuli-driven alignment. We hypothesize that this is due to biased CKA's sensitivity to a generalized neural response. For early layers of ResNet18 and V1 in Figure 3, we saw stimuli-driven alignment scores, which are indeed gradually reduced as the percentage of neural data shuffling is increased.
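
The incremental shuffling control described for Figure 5 can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: it uses synthetic data with a shared latent signal in place of real fMRI and ANN activations, linear kernels, and the standard unbiased HSIC estimator as the debiasing step. The helper names (`partial_shuffle`, `shuffle_sweep`, `make_toy_data`) and all sizes are hypothetical.

```python
import numpy as np


def unbiased_hsic(K, L):
    """Unbiased HSIC estimate on Gram matrices with zeroed diagonals (n > 3)."""
    n = K.shape[0]
    K = K.copy()
    L = L.copy()
    np.fill_diagonal(K, 0.0)
    np.fill_diagonal(L, 0.0)
    term1 = np.sum(K * L)
    term2 = K.sum() * L.sum() / ((n - 1) * (n - 2))
    term3 = 2.0 * (K.sum(axis=0) @ L.sum(axis=0)) / (n - 2)
    return (term1 + term2 - term3) / (n * (n - 3))


def debiased_linear_cka(X, Y):
    """Debiased linear CKA between X (n x p) and Y (n x q) with matched rows."""
    K, L = X @ X.T, Y @ Y.T
    return unbiased_hsic(K, L) / np.sqrt(unbiased_hsic(K, K) * unbiased_hsic(L, L))


def partial_shuffle(X, frac, rng):
    """Permute a random fraction `frac` of the rows of X among themselves."""
    n = X.shape[0]
    k = int(round(frac * n))
    idx = rng.choice(n, size=k, replace=False)  # rows whose stimulus labels get scrambled
    out = X.copy()
    out[idx] = X[rng.permutation(idx)]
    return out


def shuffle_sweep(X, Y, fracs, seed=0):
    """Debiased CKA between partially shuffled X and intact Y for each fraction."""
    rng = np.random.default_rng(seed)
    return [debiased_linear_cka(partial_shuffle(X, f, rng), Y) for f in fracs]


def make_toy_data(n=100, d=10, p_x=2000, p_y=3000, seed=0):
    """Synthetic 'neural' (X) and 'ANN' (Y) data sharing a d-dimensional latent signal."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, d))  # shared stimulus-driven component
    X = Z @ rng.standard_normal((d, p_x)) + rng.standard_normal((n, p_x))
    Y = Z @ rng.standard_normal((d, p_y)) + rng.standard_normal((n, p_y))
    return X, Y


if __name__ == "__main__":
    X, Y = make_toy_data()
    fracs = [0.0, 0.25, 0.5, 0.75, 1.0]
    for f, s in zip(fracs, shuffle_sweep(X, Y, fracs)):
        print(f"shuffled {int(100 * f):3d}%  debiased CKA = {s:.3f}")
```

When the two matrices share stimulus-driven structure, the debiased score should fall as more rows are shuffled; a pair with no shared structure would be expected to stay near zero at every shuffling level.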