
Measuring the Measures: Discriminative Capacity of Representational Similarity Metrics Across Model Families

Jialin Wu, Shreya Saha, Yiqing Bo, Meenakshi Khosla

TL;DR

The paper asks whether common representational similarity metrics can reliably distinguish model families that differ in architecture and training regime. It introduces a quantitative framework that uses d-prime, silhouette scores, and ROC-AUC to assess discriminability across 35 vision models spanning CNNs, Vision Transformers, Swin Transformers, and ConvNeXt, trained under supervised and self-supervised regimes. The key finding is that metrics imposing stronger alignment constraints (e.g., Soft Matching) separate families better than looser mappings such as linear predictivity, and that the non-fitting RSA also separates families well. This challenges the notion that looser metrics better capture differences between models. The framework offers practical guidance for selecting metrics in large-scale model-to-brain comparisons and a principled benchmark for evaluating similarity measures.
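To make RSA and soft matching concrete, here is a minimal NumPy sketch for two activation matrices X and Y (stimuli × units, recorded on the same stimuli). It is not the authors' implementation: the function names, the 1 - Pearson RDM compared with Spearman correlation, and the entropy-regularised Sinkhorn approximation of the soft-matching transport plan (regulariser `reg`) are all assumptions of this sketch.

```python
import numpy as np


def _rdm(acts):
    """Dissimilarity matrix: 1 - Pearson r between stimulus rows (stimuli x units input)."""
    return 1.0 - np.corrcoef(acts)


def _ranks(v):
    """Rank transform; assumes no exact ties (true for continuous-valued RDMs)."""
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r


def rsa_score(X, Y):
    """RSA: Spearman correlation between the upper triangles of the two RDMs."""
    iu = np.triu_indices(X.shape[0], k=1)
    return float(np.corrcoef(_ranks(_rdm(X)[iu]), _ranks(_rdm(Y)[iu]))[0, 1])


def soft_matching_score(X, Y, reg=0.05, n_iter=500):
    """Average unit-to-unit correlation under a soft (transport-plan) matching.

    Each unit of X carries mass 1/n_x and each unit of Y mass 1/n_y. The plan is
    approximated with entropy-regularised Sinkhorn iterations (an assumption here).
    """
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    Yz = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    C = Xz.T @ Yz / X.shape[0]                 # unit-by-unit Pearson correlations
    K = np.exp((C - 1.0) / reg)                # Gibbs kernel for the cost 1 - C
    a = np.full(C.shape[0], 1.0 / C.shape[0])  # uniform mass on X's units
    b = np.full(C.shape[1], 1.0 / C.shape[1])  # uniform mass on Y's units
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                        # match row marginals
        v = b / (K.T @ u)                      # match column marginals
    P = u[:, None] * K * v[None, :]            # approximate transport plan (sums to 1)
    return float((P * C).sum())
```

RSA needs no fitted mapping, while soft matching only allows a mass-preserving reassignment of units, so neither can absorb arbitrary linear distortions. For example, `soft_matching_score(X, X[:, perm])` is close to 1 for any column permutation `perm`.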

Abstract

Representational similarity metrics are fundamental tools in neuroscience and AI, yet we lack systematic comparisons of their discriminative power across model families. We introduce a quantitative framework to evaluate representational similarity measures based on their ability to separate model families across architectures (CNNs, Vision Transformers, Swin Transformers, ConvNeXt) and training regimes (supervised vs. self-supervised). Using three complementary separability measures (d-prime from signal detection theory, silhouette coefficients, and ROC-AUC), we systematically assess the discriminative capacity of commonly used metrics, including RSA, linear predictivity, Procrustes, and soft matching. We show that separability systematically increases as metrics impose more stringent alignment constraints. Among mapping-based approaches, soft matching achieves the highest separability, followed by Procrustes alignment and then linear predictivity. Non-fitting methods such as RSA also yield strong separability across families. These results provide the first systematic comparison of similarity metrics through a separability lens, clarifying their relative sensitivity and guiding metric choice for large-scale model and brain comparisons.
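The two remaining mapping-based metrics differ in how much freedom the fitted map has: Procrustes allows only a rotation/reflection, while linear predictivity allows any (ridge-regularised) linear map. The NumPy sketch below illustrates that difference for activation matrices X and Y (stimuli × units). The function names, the ridge penalty `alpha`, the single random train/test split, and the mean-Pearson-r summary are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np


def _center_unit_norm(A):
    """Center each unit (column) and scale the whole matrix to unit Frobenius norm."""
    A = A - A.mean(axis=0)
    return A / np.linalg.norm(A)


def procrustes_score(X, Y):
    """Orthogonal Procrustes similarity in [0, 1].

    max over orthogonal Q of trace(Q^T Xc^T Yc) equals the nuclear norm of Xc^T Yc;
    this also holds when X and Y have different unit counts (implicit zero-padding).
    The matching Euclidean distance is sqrt(2 - 2 * score).
    """
    Xc, Yc = _center_unit_norm(X), _center_unit_norm(Y)
    return float(np.linalg.svd(Xc.T @ Yc, compute_uv=False).sum())


def linear_predictivity(X, Y, alpha=1.0, train_frac=0.8, seed=0):
    """Ridge-regress Y on X; return the mean held-out Pearson r across Y's units."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])
    n_train = int(train_frac * len(idx))
    tr, te = idx[:n_train], idx[n_train:]
    mx, my = X[tr].mean(axis=0), Y[tr].mean(axis=0)
    Xtr, Ytr = X[tr] - mx, Y[tr] - my
    # Closed-form ridge solution W = (X^T X + alpha I)^{-1} X^T Y
    W = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(X.shape[1]), Xtr.T @ Ytr)
    pred = (X[te] - mx) @ W + my
    p, t = pred - pred.mean(axis=0), Y[te] - Y[te].mean(axis=0)
    r = (p * t).sum(axis=0) / (np.linalg.norm(p, axis=0) * np.linalg.norm(t, axis=0))
    return float(r.mean())
```

Because a general linear map can absorb any invertible transformation of X, linear predictivity can score two differently organised representations as nearly identical. This is one intuition for why the looser metric separates families less well.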

Paper Structure

This paper contains 25 sections and 3 figures.

Figures (3)

  • Figure 1: (Top) Heatmaps showing d-prime separability scores for pairwise comparisons between six model families: CNN (sup.), CNN (unsup.), ConvNeXt (sup.), Swin (sup.), Tran (sup.), and Tran (unsup.), where Tran denotes Vision Transformers and unsup. denotes self-supervised training. Each panel corresponds to a different representational similarity metric: RSA, Soft Matching, Procrustes, and Linear Predictivity. Higher d-prime values (darker colors) indicate better separation between families, with d' > 2 conventionally considered strong discrimination. Each cell shows the bidirectional d-prime between two model families, averaged over both directions. Diagonal entries are undefined (a family compared with itself) and shown in white.
  • Figure 2: Same as Figure 1, but using silhouette scores instead of d-prime as the separability measure. Silhouette scores range from -1 to 1: positive values (darker colors) indicate that models are well clustered within their families and separated from other families, values near 0 indicate models lying at family boundaries, and negative values indicate models that are, on average, closer to another family than to their own.
  • Figure 3: Global ROC curves comparing the discriminability of all representational similarity metrics. Each curve reflects the trade-off between true- and false-positive rates when distinguishing within-family from between-family pairs.
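The three separability measures behind Figures 1 to 3 can be sketched from a model-by-model similarity matrix S (any of the similarity metrics named in the abstract, computed between every pair of models) and a list of family labels. The exact conventions are assumptions of this sketch: d'(a to b) is taken as mean within-family-a similarity minus mean a-vs-b similarity, divided by the pooled standard deviation, and the two directions are averaged. The silhouette uses 1 - similarity as the distance, and each Figure 2 cell is restricted to the two families being compared. The global ROC treats within-family pairs as positives. Each family needs at least three models so that the within-family variance is defined.

```python
import numpy as np


def _family_pairs(S, labels, a, b):
    """Within-family-a similarities and a-vs-b similarities from a model-by-model S."""
    labels = np.asarray(labels)
    ia, ib = np.flatnonzero(labels == a), np.flatnonzero(labels == b)
    within = S[np.ix_(ia, ia)][np.triu_indices(len(ia), k=1)]
    between = S[np.ix_(ia, ib)].ravel()
    return within, between


def bidirectional_dprime(S, labels, a, b):
    """Average of d'(a -> b) and d'(b -> a): one heatmap cell in Figure 1."""
    ds = []
    for x, y in ((a, b), (b, a)):
        w, bt = _family_pairs(S, labels, x, y)
        pooled_sd = np.sqrt(0.5 * (w.var(ddof=1) + bt.var(ddof=1)))
        ds.append((w.mean() - bt.mean()) / pooled_sd)
    return float(np.mean(ds))


def silhouette(S, labels):
    """Mean silhouette coefficient with distance = 1 - similarity (needs >= 2 families)."""
    D, labels = 1.0 - np.asarray(S, dtype=float), np.asarray(labels)
    scores = []
    for i, fam in enumerate(labels):
        own = labels == fam
        own[i] = False                      # exclude the model itself
        if not own.any():                   # singleton family: silhouette set to 0
            scores.append(0.0)
            continue
        a = D[i, own].mean()                # mean distance to own family
        b = min(D[i, labels == f].mean() for f in np.unique(labels) if f != fam)
        scores.append((b - a) / max(a, b))  # b = mean distance to nearest other family
    return float(np.mean(scores))


def pairwise_silhouette(S, labels, a, b):
    """Silhouette restricted to models of families a and b: one Figure 2 cell."""
    labels = np.asarray(labels)
    keep = np.flatnonzero((labels == a) | (labels == b))
    return silhouette(np.asarray(S)[np.ix_(keep, keep)], labels[keep])


def global_roc(S, labels):
    """ROC points and AUC for within- vs between-family pairs, scored by similarity."""
    labels = np.asarray(labels)
    iu = np.triu_indices(len(labels), k=1)
    sims, same = np.asarray(S)[iu], labels[iu[0]] == labels[iu[1]]
    order = np.argsort(-sims, kind="stable")  # sweep the threshold from high to low
    tpr = np.concatenate([[0.0], np.cumsum(same[order]) / same.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(~same[order]) / (~same).sum()])
    diff = sims[same][:, None] - sims[~same][None, :]
    auc = float((diff > 0).mean() + 0.5 * (diff == 0).mean())  # Mann-Whitney form, ties count 1/2
    return fpr, tpr, auc
```

With no tied similarity values, the trapezoidal area under `(fpr, tpr)` equals the Mann-Whitney AUC. scikit-learn's `silhouette_score` with `metric="precomputed"`, applied to 1 - S with its diagonal set to zero, should give the same mean silhouette as `silhouette` above.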