When Does Closeness in Distribution Imply Representational Similarity? An Identifiability Perspective
Beatrix M. G. Nielsen, Emanuele Marconato, Andrea Dittadi, Luigi Gresele
TL;DR
This work uses identifiability theory to characterize when closeness of model output distributions implies similarity of internal representations. It shows that a small Kullback-Leibler (KL) divergence between model distributions does not guarantee similar representations: two models can have near-identical likelihoods yet highly dissimilar representations. To remedy this, it introduces a log-likelihood variance distance $d^{\lambda}_{\mathrm{LLV}}$ and a representation distance $d_{\mathbf{f},\mathbf{g}}$ based on partial least squares SVD (PLS-SVD), together with a bound that links the two under suitable diversity and invertibility conditions. Empirically, CIFAR-10 experiments reveal dissimilar representations despite similar performance, supporting the claim that near-identical likelihoods alone do not guarantee representational similarity. Synthetic experiments, in turn, show that wider networks learn distributions that are closer under the proposed distance and have more similar representations. Overall, the paper clarifies the relationship between distributional proximity and representational similarity and offers a concrete framework for analyzing it.
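A minimal numerical sketch of the first claim, under loudly stated assumptions. We assume the model family has the form $p(y \mid x) \propto \exp(\mathbf{f}(x)^\top \mathbf{g}(y))$, with representations $\mathbf{f}$ (inputs) and $\mathbf{g}$ (labels), and we use a hypothetical toy of our own rather than the construction from the paper. Every label vector $\mathbf{g}(y)$ barely loads on one coordinate, so a second model can replace that coordinate of $\mathbf{f}(x)$ with unrelated noise. The average KL divergence stays tiny, yet no affine map of the first model's representations explains that coordinate of the second model's. The names `log_softmax` and `eps` and all sizes are illustrative choices.

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax, stabilised by subtracting the row maximum."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

# Hypothetical toy (our construction, not the paper's), assuming
# p(y | x) ∝ exp(f(x)^T g(y)).
rng = np.random.default_rng(0)
n_inputs, n_labels, d = 500, 5, 3
eps = 1e-4  # hypothetical: how weakly the labels load on coordinate 2

# Model 1: rows of F are f(x), rows of G are g(y).
F = rng.normal(size=(n_inputs, d))
G = rng.normal(size=(n_labels, d))
G[:, 2] *= eps  # every g(y) is almost orthogonal to coordinate 2

# Model 2: same g, but coordinate 2 of f(x) is replaced by unrelated noise.
F2 = F.copy()
F2[:, 2] = 10.0 * rng.normal(size=n_inputs)

logp = log_softmax(F @ G.T)
logq = log_softmax(F2 @ G.T)
# Average over inputs of KL(p(. | x) || q(. | x)).
kl = np.mean(np.sum(np.exp(logp) * (logp - logq), axis=1))

# Best affine prediction of F2 from F, scored per coordinate by R^2.
X = np.hstack([F, np.ones((n_inputs, 1))])
M, *_ = np.linalg.lstsq(X, F2, rcond=None)
resid = F2 - X @ M
r2 = 1.0 - resid.var(axis=0) / F2.var(axis=0)

print(kl)  # very small: the two distributions nearly coincide
print(r2)  # close to 1 for coordinates 0 and 1, close to 0 for coordinate 2
```

Here per-coordinate $R^2$ is only a crude stand-in for a representational similarity measure, chosen for simplicity; it is not the paper's $d_{\mathbf{f},\mathbf{g}}$.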
Abstract
When and why representations learned by different deep neural networks are similar is an active research topic. We choose to address these questions from the perspective of identifiability theory, which suggests that a measure of representational similarity should be invariant to transformations that leave the model distribution unchanged. Focusing on a model family which includes several popular pre-training approaches, e.g., autoregressive language models, we explore when models which generate distributions that are close have similar representations. We prove that a small Kullback-Leibler divergence between the model distributions does not guarantee that the corresponding representations are similar. This has the important corollary that models with near-maximum data likelihood can still learn dissimilar representations, a phenomenon mirrored in our experiments with models trained on CIFAR-10. We then define a distributional distance for which closeness implies representational similarity, and in synthetic experiments, we find that wider networks learn distributions which are closer with respect to our distance and have more similar representations. Our results thus clarify the link between closeness in distribution and representational similarity.
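The identifiability argument can be made concrete with a small check. Assuming a model of the form $p(y \mid x) \propto \exp(\mathbf{f}(x)^\top \mathbf{g}(y))$ (our assumption about the family; names and sizes below are illustrative), mapping $\mathbf{f} \mapsto A\mathbf{f}$ and $\mathbf{g} \mapsto A^{-\top}\mathbf{g}$ for any invertible matrix $A$ leaves every logit unchanged, since $(A\mathbf{f})^\top (A^{-\top}\mathbf{g}) = \mathbf{f}^\top A^\top A^{-\top} \mathbf{g} = \mathbf{f}^\top \mathbf{g}$. The distribution is therefore identical while the representations differ, so a similarity measure with the invariance described above should rate the two models as equivalent. In this sketch a least-squares linear fit recovers $A$ exactly.

```python
import numpy as np

def softmax_probs(F, G):
    """Rows of p(y | x) ∝ exp(f(x)^T g(y)) for every input x.

    F: (n_inputs, d) with rows f(x); G: (n_labels, d) with rows g(y).
    """
    logits = F @ G.T
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_inputs, n_labels, d = 10, 6, 4
F = rng.normal(size=(n_inputs, d))
G = rng.normal(size=(n_labels, d))

# A random Gaussian matrix is invertible with probability 1; the 3*I shift is
# a hypothetical choice intended to make it better conditioned.
A = rng.normal(size=(d, d)) + 3.0 * np.eye(d)
F2 = F @ A.T               # row i is (A f(x_i))^T
G2 = G @ np.linalg.inv(A)  # row j is (A^{-T} g(y_j))^T

P1, P2 = softmax_probs(F, G), softmax_probs(F2, G2)
print(np.allclose(P1, P2))  # True: same model distribution
print(np.allclose(F, F2))   # False: different representations

# A linear fit F2 ≈ F @ M recovers M = A^T: the representations agree up to A.
M, *_ = np.linalg.lstsq(F, F2, rcond=None)
print(np.allclose(M, A.T))  # True
```

This is only one class of distribution-preserving transformations, not necessarily the full set; for instance, adding the same vector $\mathbf{c}$ to every $\mathbf{g}(y)$ shifts all logits for a given $x$ by $\mathbf{f}(x)^\top \mathbf{c}$, which the softmax also ignores.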
