Representational Similarity via Interpretable Visual Concepts

Neehar Kondapaneni, Oisin Mac Aodha, Pietro Perona

TL;DR

This work tackles the challenge of not only measuring how similarly two networks represent information but also exposing what visual concepts underlie their similarities and differences. It introduces Representational Similarity via Interpretable Visual Concepts (RSVC), which decomposes layer activations into concept dictionaries and coefficients, then assesses cross-model concept alignment via regression-based mappings and correlation metrics. The approach enables interpretation of both shared and unique concepts, supports a replacement test to connect representational changes to model decisions, and provides visualizations for low-similarity concepts. Extensive experiments across CNNs and ViTs, with varying training protocols and even an LLVM-assisted qualitative analysis, demonstrate RSVC's generality, its ability to reveal meaningful differences, and its potential to inform debugging and fairness considerations in vision models.

Abstract

How do two deep neural networks differ in how they arrive at a decision? Measuring the similarity of deep networks has been a long-standing open question. Most existing methods provide a single number to measure the similarity of two networks at a given layer, but give no insight into what makes them similar or dissimilar. We introduce an interpretable representational similarity method (RSVC) to compare two networks. We use RSVC to discover shared and unique visual concepts between two models. We show that some aspects of model differences can be attributed to unique concepts discovered by one model that are not well represented in the other. Finally, we conduct extensive evaluation across different vision model architectures and training protocols to demonstrate its effectiveness.
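
The pipeline summarized above (factorize one model's activations into concepts, regress those concept coefficients from the other model's activations, then correlate predicted and real coefficients) can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the use of NMF with multiplicative updates, the ridge regression, the in-sample evaluation, the toy data, and the names `nmf` and `concept_similarity` are all assumptions made for this example.

```python
import numpy as np


def nmf(A, k, n_iter=200, seed=0, eps=1e-9):
    """Factorize non-negative A (n x d) as U (n x k) @ W (k x d).

    Uses multiplicative updates; the choice of factorization is an
    assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    U = rng.random((n, k)) + eps
    W = rng.random((k, d)) + eps
    for _ in range(n_iter):
        U *= (A @ W.T) / (U @ W @ W.T + eps)
        W *= (U.T @ A) / (U.T @ U @ W + eps)
    return U, W


def concept_similarity(A2, U1, ridge=1e-3):
    """Predict M1's concept coefficients U1 from M2's activations A2.

    Fits a ridge regression with an intercept (an assumed regressor) and
    returns the per-concept Pearson correlation between predicted and
    real coefficients, together with the predictions.
    """
    X = np.hstack([A2, np.ones((A2.shape[0], 1))])  # add intercept column
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    W_map = np.linalg.solve(gram, X.T @ U1)         # plays the role of W*_{2->1}
    U_pred = X @ W_map                              # plays the role of U_{2->1}
    Uc = U1 - U1.mean(axis=0)
    Pc = U_pred - U_pred.mean(axis=0)
    denom = np.linalg.norm(Uc, axis=0) * np.linalg.norm(Pc, axis=0) + 1e-12
    return (Uc * Pc).sum(axis=0) / denom, U_pred


# Toy data (hypothetical): M1 encodes two concepts, M2 only the second one.
rng = np.random.default_rng(0)
n = 200
u_a = rng.random(n)                    # concept unique to M1
u_b = rng.random(n)                    # concept shared by both models
U1 = np.stack([u_a, u_b], axis=1)
A1 = U1 @ rng.random((2, 6))           # M1 activations, 6 features
A2 = np.outer(u_b, rng.random(8)) + 0.01 * rng.random((n, 8))  # M2 activations

U1_hat, W1_hat = nmf(A1, k=2)                  # concept extraction
sim, U_2to1 = concept_similarity(A2, U1_hat)   # concept regression + similarity
print("per-concept similarity:", sim)
```

Because such a factorization is not unique, the concepts it recovers may mix the two toy factors, so the printed similarities are only indicative. Calling `concept_similarity(A2, U1)` with the ground-truth coefficients gives the cleaner contrast between a shared and a unique concept.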

Paper Structure

This paper contains 35 sections, 7 equations, 19 figures, 4 tables.

Figures (19)

  • Figure 1: Representational Similarity via Interpretable Visual Concepts (RSVC). (Concept Extraction): First, activations for a set of image patches, $\mathcal{I}^c$, are computed for each model ($M_1$ and $M_2$). Second, the activation matrix for $M_1$ is factorized into the concept coefficient matrix $\mathbf{U}_1$ and the concept basis $\mathbf{W}_1$, i.e., $\mathbf{A}_1 \approx \mathbf{U}_1 \mathbf{W}_1$. Each entry in a column vector of the coefficient matrix $\mathbf{U}_1$ represents the strength of a concept in an image. Concepts are visualized by the image patches that correspond to the top $n$ coefficients. Here, we highlight only two concepts, $u^a_1$ and $u^b_1$. The top four images for these concepts indicate that $u^a_1$ represents bluejay tail and $u^b_1$ represents sky background. (Concept Regression): To measure concept similarity, we learn a weight matrix $\mathbf{W}^*_{2\rightarrow 1}$ to map $\mathbf{A}_2$ to the concept coefficient matrix $\mathbf{U}_1$. We denote the predicted coefficient matrix as $\mathbf{U}_{2 \rightarrow 1}$. (Concept Similarity): Finally, we compute the correlation between columns of $\mathbf{U}_{2 \rightarrow 1}$ and $\mathbf{U}_{1}$. If $\mathbf{A}_2$ contains a concept in $\mathbf{U}_1$, then the predicted coefficient vector should be highly correlated with the real coefficient vector. In this example, we see that the bluejay tail concept is poorly represented in $M_2$, but both models share the sky background concept.
  • Figure 2: Adding and Discovering a Toy Concept. Here we train two ResNet-18 models, $M_{ps}$ and $M_{nc}$. $M_{ps}$ is trained to associate a pink square (i.e., Concept 1) with the Common Eider class, while $M_{nc}$ is trained to be invariant to the pink square concept. We find that the similarity score from $M_{nc} \rightarrow M_{ps}$ for Concept 1 is $\sim0.0$, indicating that $M_{nc}$ is unable to predict the Concept 1 coefficients of $M_{ps}$. To understand various aspects of the differences between the two models, RSVC inspects three distinct regions of the predicted vs. real coefficient scatter plot (\ref{sec:interp_low_sim}). (Green): RSVC visualizes images corresponding to the top-10 $M_{ps}$ target concept coefficients. This allows the user to understand what the target concept is encoding. This concept clearly reacts strongly to the pink square visual feature. (Blue): RSVC visualizes the image patches with the largest $M_{nc}$ under-predicted coefficients. $M_{nc}$ under-reacts to the pink square when compared to $M_{ps}$. (Orange): RSVC visualizes the image patches corresponding to the top-10 $M_{nc}$ over-predicted coefficients. The over-predicted patches show that $M_{nc}$ cannot distinguish between background and the pink square.
  • Figure 3: Concept Similarity vs. Concept Importance. We compare four pairs of models using CMCS: (A) RN18 vs. RN50, (B) RN50 vs. ViT-S, (C) ViT-S vs. ViT-L, and (D) DINO vs. MAE. The y-axis represents the concept importance (CI) measured using concept integrated gradients. Warmer colors represent the density of points in a region. We highlight several regions in the plots: (1) low similarity and low importance concepts that are unique to a model but contribute little to its decisions, (2) high importance and high similarity concepts that are shared across both models and also contribute greatly to decision making, (3) low similarity, high importance concepts that only one model has discovered, but are very important to that model's decisions.
  • Figure 4: Replacement Test. We determine whether poorly predicted coefficients for concepts actually impact model behavior (\ref{sec:rep_test_results}). We use color to represent the concept importance (warmer is higher importance). When ignoring low importance concepts, we observe expected trends, i.e., decreases in similarity ($\Delta \text{Pearson}$) result in increases in the $l_2$-distance, increases in KL-divergence on the classifier logits, and decreases in model accuracy. The effect also seems to be scaled by importance; for example, changes to low importance concepts (black) have no impact on $\Delta$KL.
  • Figure 5: Interpreting Low Similarity Concepts. In this example, we find an RN50 concept for the barbell class that the ViT-S is not able to predict. (Green): The RN50 concept reacts to images of hands lifting barbells. Additionally, many images contain vertical supports for a squat rack. We train a regression model on the ViT-S activations to predict the RN50 concept coefficients. (Blue): The ViT-S regression model under-reacts to images containing hands, people, and squat racks. (Orange): It over-reacts to images that have a greater focus on weight plates. These results suggest that the specific concept of hands lifting barbells is not represented in the ViT-S. In \ref{sec:llvm_analysis_main} we use an LLVM to analyze the image collages (IC1 and IC2) and find that it detects similar differences in the visualizations.
  • ...and 14 more figures
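
The three regions of the predicted vs. real coefficient scatter plot described in the captions for Figures 2 and 5 (top target coefficients, under-predicted patches, over-predicted patches) could be selected roughly as below. This is a sketch under stated assumptions: the helper name `inspect_regions`, the definition of the residual as predicted minus real, and the toy numbers are illustrative and not taken from the paper.

```python
def inspect_regions(real, pred, top_n=10):
    """Return patch indices for the three regions of the scatter plot.

    real: target-model concept coefficients, one per image patch.
    pred: coefficients predicted from the other model's activations.
    The residual (pred - real) used for ranking is an assumption here.
    """
    idx = range(len(real))
    residual = [p - r for r, p in zip(real, pred)]
    return {
        # Green: patches with the largest real (target) coefficients.
        "target": sorted(idx, key=lambda i: real[i], reverse=True)[:top_n],
        # Blue: patches where the prediction falls furthest below the real value.
        "under": sorted(idx, key=lambda i: residual[i])[:top_n],
        # Orange: patches where the prediction most exceeds the real value.
        "over": sorted(idx, key=lambda i: residual[i], reverse=True)[:top_n],
    }


# Hypothetical coefficients for five image patches.
real = [0.9, 0.8, 0.1, 0.0, 0.2]
pred = [0.1, 0.7, 0.3, 0.6, 0.2]
regions = inspect_regions(real, pred, top_n=2)
print(regions)
```

The returned indices would then be used to look up and display the corresponding image patches for each region.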