Measuring the Measures: Discriminative Capacity of Representational Similarity Metrics Across Model Families
Jialin Wu, Shreya Saha, Yiqing Bo, Meenakshi Khosla
TL;DR
The paper asks whether common representational similarity metrics can reliably distinguish model families across architectures and training regimes. It introduces a quantitative framework that uses d-prime, silhouette coefficients, and ROC-AUC to assess discriminability across 35 vision models spanning CNNs, Vision Transformers, Swin Transformers, and ConvNeXt, under supervised and self-supervised training. The key finding is that metrics with stronger alignment constraints (e.g., RSA and soft matching) achieve higher separability than metrics with looser mappings, challenging the notion that looser metrics better capture differences between models. The framework offers practical guidance for selecting metrics in large-scale model-to-brain comparisons and a principled benchmark for evaluating similarity measures.
Abstract
Representational similarity metrics are fundamental tools in neuroscience and AI, yet we lack systematic comparisons of their discriminative power across model families. We introduce a quantitative framework to evaluate representational similarity measures based on their ability to separate model families across architectures (CNNs, Vision Transformers, Swin Transformers, ConvNeXt) and training regimes (supervised vs. self-supervised). Using three complementary separability measures (d-prime from signal detection theory, silhouette coefficients, and ROC-AUC), we systematically assess the discriminative capacity of commonly used metrics, including RSA, linear predictivity, Procrustes, and soft matching. We show that separability systematically increases as metrics impose more stringent alignment constraints. Among mapping-based approaches, soft matching achieves the highest separability, followed by Procrustes alignment and linear predictivity. Non-fitting methods such as RSA also yield strong separability across families. These results provide the first systematic comparison of similarity metrics through a separability lens, clarifying their relative sensitivity and guiding metric choice for large-scale model and brain comparisons.
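To make the three separability measures concrete, the following is a minimal sketch of how they could be computed from a model-by-model similarity matrix and a list of family labels. It is not the paper's implementation. The pooled-variance form of d-prime, the use of 1 - similarity as the silhouette distance, the toy similarity matrix, and all function names are assumptions made for illustration.

```python
import numpy as np

def split_pairs(sim, labels):
    """Split upper-triangle similarities into within- and between-family pairs."""
    labels = np.asarray(labels)
    iu, ju = np.triu_indices(len(labels), k=1)
    same = labels[iu] == labels[ju]
    vals = sim[iu, ju]
    return vals[same], vals[~same]

def d_prime(within, between):
    """Difference of means divided by the pooled standard deviation (assumed form)."""
    pooled = np.sqrt(0.5 * (within.var(ddof=1) + between.var(ddof=1)))
    return (within.mean() - between.mean()) / pooled

def roc_auc(within, between):
    """Probability that a within-family pair outscores a between-family pair.
    Ties count as one half."""
    diff = within[:, None] - between[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def silhouette(sim, labels):
    """Mean silhouette coefficient, using 1 - sim as the distance (an assumption)."""
    labels = np.asarray(labels)
    dist = 1.0 - sim
    scores = []
    for i in range(len(labels)):
        own = labels == labels[i]
        own[i] = False  # exclude the model itself from its own family
        if not own.any():
            scores.append(0.0)  # singleton families get a score of 0
            continue
        a = dist[i, own].mean()  # mean distance to the model's own family
        b = min(dist[i, labels == other].mean()  # nearest other family
                for other in np.unique(labels) if other != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

# Toy data (hypothetical): six models in two families, with high similarity
# inside a family and low similarity across families, plus symmetric noise.
labels = ["cnn", "cnn", "cnn", "vit", "vit", "vit"]
lab = np.asarray(labels)
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.05, (len(lab), len(lab)))
sim = np.where(lab[:, None] == lab[None, :], 0.8, 0.3) + (noise + noise.T) / 2
np.fill_diagonal(sim, 1.0)

within, between = split_pairs(sim, labels)
dp = d_prime(within, between)
auc = roc_auc(within, between)
sil = silhouette(sim, labels)
print(f"d-prime={dp:.2f}  ROC-AUC={auc:.2f}  silhouette={sil:.2f}")
```

In this sketch, d-prime and ROC-AUC operate on the pooled within-family and between-family similarity scores, while the silhouette coefficient is computed per model and then averaged. A similarity metric that separates families well should score high on all three.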
