Differentiable Optimization of Similarity Scores Between Models and Brains
Nathan Cloos, Moufan Li, Markus Siegel, Scott L. Brincat, Earl K. Miller, Guangyu Robert Yang, Christopher J. Cueva
TL;DR
The paper addresses how to interpret the representational similarity measures used to compare models and brains by differentiating through these measures and directly maximizing the score. Its differentiable optimization framework produces synthetic datasets Y that are optimized to match neural data X under several measures (CKA, angular CKA, angular Procrustes, NBS, and regression-based scores), and it tests whether a high score implies that task-relevant information is encoded. The study finds that what counts as a ‘good’ score depends on both the measure and the dataset, and that a high score does not guarantee that task-relevant information is encoded in a manner consistent with the neural data; CKA in particular is biased toward high-variance components. The authors also derive theoretical relationships showing that CKA's dependence on high-variance PCs is quadratic whereas NBS's is linear, and they show that jointly optimizing multiple measures delineates the range of scores that can be attained together. These results underscore the need for careful interpretation, and the accompanying open-source tooling supports standardization. Together, the findings offer a more nuanced framework for using similarity measures in neuroscience and AI, along with tools to benchmark and interpret future measures.
Abstract
How do we know if two systems - biological or artificial - process information in a similar way? Similarity measures such as linear regression, Centered Kernel Alignment (CKA), Normalized Bures Similarity (NBS), and angular Procrustes distance are often used to quantify this similarity. However, it is currently unclear what drives high similarity scores and even what constitutes a "good" score. Here, we introduce a novel tool to investigate these questions by differentiating through similarity measures to directly maximize the score. Surprisingly, we find that high similarity scores do not guarantee encoding task-relevant information in a manner consistent with neural data; this is particularly acute for CKA and even for some variations of cross-validated and regularized linear regression. We find no consistent threshold for a good similarity score - it depends on both the measure and the dataset. In addition, synthetic datasets optimized to maximize similarity scores initially learn the highest-variance principal component of the target dataset, but some methods like angular Procrustes capture lower-variance dimensions much earlier than methods like CKA. To shed light on this, we mathematically derive the sensitivity of CKA, angular Procrustes, and NBS to the variance of principal component dimensions, and explain the emphasis CKA places on high-variance components. Finally, by jointly optimizing multiple similarity measures, we characterize their allowable ranges and reveal that some similarity measures are more constraining than others. While current measures offer a seemingly straightforward way to quantify the similarity between neural systems, our work underscores the need for careful interpretation. We hope the tools we developed will be used by practitioners to better understand current and future similarity measures.
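To make the idea of maximizing a similarity score by following its gradient concrete, here is a minimal sketch for linear CKA. It is an illustration under stated assumptions, not the authors' implementation: the target X is random synthetic data with decaying variance across dimensions (not neural recordings), a hand-derived gradient of linear CKA stands in for automatic differentiation, and the function name `linear_cka_and_grad`, the step size, and the number of steps are all hypothetical choices.

```python
import numpy as np


def linear_cka_and_grad(X, Y):
    """Linear CKA between X and Y (rows = conditions) and its gradient w.r.t. Y."""
    Xc = X - X.mean(axis=0, keepdims=True)  # center each column
    Yc = Y - Y.mean(axis=0, keepdims=True)
    Kx = Xc @ Xc.T                          # linear Gram matrices
    Ky = Yc @ Yc.T
    hsic = np.sum(Kx * Ky)                  # equals trace(Kx @ Ky) for symmetric matrices
    nx = np.linalg.norm(Kx)                 # Frobenius norms
    ny = np.linalg.norm(Ky)
    score = hsic / (nx * ny)
    # Hand-derived gradient of score with respect to Y (assumption: used here
    # instead of autodiff so the sketch only needs NumPy).
    grad = 2.0 * (Kx @ Yc) / (nx * ny) - 2.0 * hsic * (Ky @ Yc) / (nx * ny**3)
    return score, grad


rng = np.random.default_rng(0)
n, p = 50, 5
# Hypothetical target dataset: dimensions with decaying variance.
X = rng.normal(size=(n, p)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])
# Synthetic dataset Y starts as random noise.
Y = rng.normal(size=(n, p))
Y /= np.linalg.norm(Y)

initial_score, _ = linear_cka_and_grad(X, Y)
lr = 0.02  # hypothetical step size
for _ in range(2000):
    _, grad = linear_cka_and_grad(X, Y)
    Y += lr * grad           # gradient ascent on the CKA score
    Y /= np.linalg.norm(Y)   # CKA is scale-invariant, so this leaves the score unchanged
final_score, _ = linear_cka_and_grad(X, Y)

print(f"CKA before optimization: {initial_score:.3f}")
print(f"CKA after optimization:  {final_score:.3f}")
```

With an automatic differentiation library, the same loop could be written for any differentiable measure without deriving the gradient by hand. Inspecting which principal components of X the optimized Y captures at a given score is one way to probe what drives that score.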
