When Does Closeness in Distribution Imply Representational Similarity? An Identifiability Perspective

Beatrix M. G. Nielsen, Emanuele Marconato, Andrea Dittadi, Luigi Gresele

TL;DR

This work develops a principled theory, based on identifiability concepts, of when closeness of model output distributions implies similarity of internal representations. It shows that standard distributional distances such as the KL divergence can be misleading: two models can have near-identical likelihoods yet highly dissimilar representations. To remedy this, it introduces a log-likelihood variance distance $d^{\lambda}_{\mathrm{LLV}}$ and a representation distance $d_{\mathbf{f},\mathbf{g}}$ based on PLS-SVD, together with a bound that links the two under suitable diversity and invertibility conditions. Empirically, CIFAR-10 experiments reveal dissimilar representations despite similar performance, in line with the claim that closeness in KL divergence does not guarantee representational similarity, while synthetic experiments show that wider networks yield distributions that are closer in $d^{\lambda}_{\mathrm{LLV}}$ and representations that are more similar. Overall, the paper clarifies the nuanced relationship between distributional proximity and representational similarity and offers a concrete framework for analyzing it.

Abstract

When and why representations learned by different deep neural networks are similar is an active research topic. We choose to address these questions from the perspective of identifiability theory, which suggests that a measure of representational similarity should be invariant to transformations that leave the model distribution unchanged. Focusing on a model family which includes several popular pre-training approaches, e.g., autoregressive language models, we explore when models which generate distributions that are close have similar representations. We prove that a small Kullback--Leibler divergence between the model distributions does not guarantee that the corresponding representations are similar. This has the important corollary that models with near-maximum data likelihood can still learn dissimilar representations -- a phenomenon mirrored in our experiments with models trained on CIFAR-10. We then define a distributional distance for which closeness implies representational similarity, and in synthetic experiments, we find that wider networks learn distributions which are closer with respect to our distance and have more similar representations. Our results thus clarify the link between closeness in distribution and representational similarity.
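The invariance requirement in the abstract can be checked numerically. The sketch below assumes a softmax-style model $p(y \mid \mathbf{x}) \propto \exp(\mathbf{f}(\mathbf{x})^\top \mathbf{g}(y))$, which is an assumption made for illustration and is not spelled out on this page; the matrix `A`, the array names, and the sizes are likewise hypothetical. It shows that replacing the embeddings by $\mathbf{A}\mathbf{f}(\mathbf{x})$ and the unembeddings by $\mathbf{A}^{-\top}\mathbf{g}(y)$ changes the representations but not the model distribution.

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax, shifted for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

rng = np.random.default_rng(0)
n_inputs, n_labels, dim = 5, 4, 3

# Hypothetical embeddings f(x) (rows of F) and unembeddings g(y) (rows of G).
F = rng.normal(size=(n_inputs, dim))
G = rng.normal(size=(n_labels, dim))

# An arbitrary invertible matrix (its determinant is 7).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])

# f'(x) = A f(x) and g'(y) = A^{-T} g(y), written row-wise.
F_transformed = F @ A.T
G_transformed = G @ np.linalg.inv(A)

# Assumed model form: log p(y | x) = log_softmax over y of f(x)^T g(y).
logp = log_softmax(F @ G.T)
logp_transformed = log_softmax(F_transformed @ G_transformed.T)

print("same distribution:  ", np.allclose(logp, logp_transformed))
print("same representation:", np.allclose(F, F_transformed))
```

Under this assumed model form, the logits $\mathbf{f}(\mathbf{x})^\top \mathbf{g}(y)$ are unchanged by the transformation, which is why a similarity measure that penalizes invertible linear maps would separate models that define the same distribution.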

Paper Structure

This paper contains 52 sections, 19 theorems, 188 equations, 18 figures, 2 tables, 1 algorithm.

Key Result

Theorem 2.2

Let $(\bm{\mathrm{f}}, \bm{\mathrm{g}}), (\bm{\mathrm{f}}', \bm{\mathrm{g}}') \in \Theta$, and let $(\bm{\mathrm{f}}, \bm{\mathrm{g}})$ satisfy the diversity condition (Definition 2.1). Let $\mathbf{L}$ (resp. $\mathbf{L}'$) be the matrix with columns $\mathbf{g}_0(y)$ (resp. $\mathbf{g}'_0(y)$), and let [...] where the equivalence relation $\sim_L$ is defined by [...]

Figures (18)

  • Figure 1: When closeness in distribution does and does not imply representational similarity. On the left, we show two distributions $p_{\mathbf{f}, \mathbf{g}}, p_{\mathbf{f}', \mathbf{g}'} \in \mathcal{P}_\Theta$ which are closer than $\epsilon$ w.r.t. the distance $d^\lambda_{\mathrm{LLV}}$ (Definition \ref{def:d_prob}), as illustrated by the shaded blue ball. We use $[(\bm{\mathrm{f}}, \bm{\mathrm{g}})] \in \Theta / \sim_L$ to denote the identifiability class (footnote \ref{fn:iclass}) of a $\sim_L$-identifiable model with embedding $\bm{\mathrm{f}}$ and unembedding $\bm{\mathrm{g}}$ (Section \ref{sec:preliminaries}). Theorem \ref{theorem:main_bounds_d_prob} implies that the identifiability classes $[(\bm{\mathrm{f}},\bm{\mathrm{g}})]$ and $[(\bm{\mathrm{f}}', \bm{\mathrm{g}}')]$, within the shaded orange area, will have similar representations, i.e., their dissimilarity under $d_{\bm{\mathrm{f}}, \bm{\mathrm{g}}}$ (Definition \ref{def:model_rep_dissim}) is bounded above by $2M\epsilon$. We also consider a third distribution $p_{\tilde{\bm{\mathrm{f}}}, \tilde{\bm{\mathrm{g}}}} \in \mathcal{P}_\Theta$ which, while $\epsilon$-close in $d_{\mathrm{KL}}$ (in magenta), falls outside the blue region and has representations that are very dissimilar from those of $[(\bm{\mathrm{f}}, \bm{\mathrm{g}})]$ and $[(\bm{\mathrm{f}}', \bm{\mathrm{g}}')]$, as described by Theorem \ref{theorem:small_KL_dissimilar_reps}. On the right, we plot the three model embeddings: taking $\bm{\mathrm{f}}$ as reference, we find the best linear fit to $\bm{\mathrm{f}}'$ and $\tilde{\bm{\mathrm{f}}}$, and then color each point according to its residual error (brighter colors denote larger errors). The embeddings $\bm{\mathrm{f}}'$ are nearly a linear transformation of $\bm{\mathrm{f}}$, while $\tilde{\bm{\mathrm{f}}}$ deviates substantially and is visibly farther from being linearly related.
  • Figure 2: Two models with small KL divergence but highly dissimilar representations. We construct two models $(\bm{\mathrm{f}}, \bm{\mathrm{g}}), (\bm{\mathrm{f}}', \bm{\mathrm{g}}') \in \Theta$ whose representations are related by a non-linear transformation: the embeddings and unembeddings of the $(\bm{\mathrm{f}}', \bm{\mathrm{g}}')$ model are constructed by permuting the embedding clusters and the corresponding unembedding vectors of the $(\bm{\mathrm{f}}, \bm{\mathrm{g}})$ model. As a result, the nearest unembedding vectors in $\bm{\mathrm{g}}(\cdot)$ (dashed lines) are mapped away from each other in $\bm{\mathrm{g}}'(\cdot)$. In Theorem \ref{theorem:small_KL_dissimilar_reps}, we show that as the norm of the unembedding vectors $\rho$ grows for both models, their distributions $p_{\bm{\mathrm{f}}, \bm{\mathrm{g}}}$ and $p_{\bm{\mathrm{f}}', \bm{\mathrm{g}}'}$ become closer in KL divergence, whereas their representations remain dissimilar, i.e., far from being equal up to a linear transformation; see also Table \ref{table:kl_to_zero}.
  • Figure 3: (Left) Embedding representations of two models trained on CIFAR-10. Representations for some of the labels are permuted, and $m_{\mathrm{CCA}} (\mathbf{f}(\mathbf{x}), \mathbf{f}'(\mathbf{x})) = 0.55$. (Right) Mean $d^\lambda_{\mathrm{LLV}}$ and $d_{\mathbf{f},\mathbf{g}}$ vs network width. Shaded area is standard deviation. Both mean and standard deviation decrease as the network width increases.
  • Figure 4: Illustration of training data for 6 classes. Each color represents a different class label.
  • Figure 5: Difference in test loss vs $m_\mathrm{CCA}$ of the embeddings, for models trained on CIFAR-10 with representational dimensions of 2, 3 and 5. A small difference in loss can coincide with a large difference in representations, and a larger difference in loss with a smaller difference in representations.
  • ...and 13 more figures
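The mechanism described in the Figure 2 caption can be imitated with a toy example. This is only a sketch in the spirit of that caption, not the paper's construction: it assumes a softmax model $p(y \mid \mathbf{x}) \propto \exp(\mathbf{f}(\mathbf{x})^\top \mathbf{g}(y))$, six clusters placed on the unit circle, a swap of two clusters as the permutation, and a plain least-squares fit as the measure of linear relatedness (as in the right panel of Figure 1). All names and constants are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax, shifted for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def mean_kl(logp, logq):
    """KL(p || q) per input (row), averaged over inputs."""
    return float(np.mean(np.sum(np.exp(logp) * (logp - logq), axis=1)))

def linear_fit_residual(F, F2):
    """Fraction of the squared norm of F2 not explained by the best linear map from F."""
    W, *_ = np.linalg.lstsq(F, F2, rcond=None)
    return float(np.sum((F2 - F @ W) ** 2) / np.sum(F2 ** 2))

rng = np.random.default_rng(0)
n_classes, per_class = 6, 50
angles = 2 * np.pi * np.arange(n_classes) / n_classes
U = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # unit directions, one per class
perm = np.array([1, 0, 2, 3, 4, 5])                     # assumed permutation: swap clusters 0 and 1

labels = np.repeat(np.arange(n_classes), per_class)
noise = 0.05 * rng.normal(size=(labels.size, 2))
F = U[labels] + noise          # model 1: embeddings clustered around U[label]
F2 = U[perm[labels]] + noise   # model 2: the same clusters, moved by the permutation

def kl_for_scale(rho):
    G = rho * U         # model 1 unembeddings, all of norm rho
    G2 = rho * U[perm]  # model 2: each unembedding moves together with its cluster
    return mean_kl(log_softmax(F @ G.T), log_softmax(F2 @ G2.T))

kl_small_rho = kl_for_scale(2.0)
kl_large_rho = kl_for_scale(20.0)
residual = linear_fit_residual(F, F2)

print(f"mean KL at rho=2:  {kl_small_rho:.4f}")
print(f"mean KL at rho=20: {kl_large_rho:.6f}")
print(f"relative residual of best linear fit: {residual:.3f}")
```

In this toy setting both models concentrate their mass on the same label as $\rho$ grows, so the KL divergence shrinks, while the residual of the linear fit does not depend on $\rho$ and stays clearly away from zero because swapping two clusters while fixing the others is not a linear map.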

Theorems & Definitions (42)

  • Definition 2.1: Diversity condition
  • Theorem 2.2: Linear Identifiability (Khemakhem et al., 2020; Roeder et al., 2021; Lachapelle et al., 2023)
  • Theorem 3.1: Informal
  • Proof: Proof sketch
  • Corollary 3.2: Informal
  • Lemma 4.1
  • Remark 4.2
  • Definition 4.3: Log-likelihood variance distance between distributions
  • Definition 4.3: PLS SVD distance between vectors
  • Definition 4.3: Representational dissimilarity measure $d_{\mathbf{f},\mathbf{g}}$
  • ...and 32 more