Intrinsic Dimension Correlation: uncovering nonlinear connections in multimodal representations

Lorenzo Basile; Santiago Acevedo; Luca Bortolussi; Fabio Anselmi; Alex Rodriguez

TL;DR

The paper tackles the challenge of detecting nonlinear correlations between high-dimensional data manifolds, especially across multimodal latent spaces, where traditional correlation metrics often fail. It introduces Intrinsic Dimension Correlation ($I_d$Cor), a mutual-information-like coefficient that uses intrinsic dimension estimates to measure the information shared between representations, together with a permutation-based $p$-value to assess significance. Through synthetic experiments, large-scale ImageNet evaluations, and multimodal datasets, $I_d$Cor robustly detects nonlinear dependencies that baselines such as CCA and Distance Correlation miss, including cross-modal correlations in vision-language models. The results position $I_d$Cor as a scalable, informative tool for understanding latent-space geometry and guiding representation learning across diverse domains, while highlighting limitations tied to $I_d$ estimation and opportunities for local-manifold extensions and total-correlation analyses.
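Following the mutual-information analogy, one way to turn intrinsic dimensions into a dependence score is to compare $I_d(X) + I_d(Y)$ with the intrinsic dimension of the concatenated representation $I_d(X \oplus Y)$: when the two representations share structure, the joint manifold has lower dimension than the sum of the parts. The sketch below is illustrative only. It assumes the normalization $(I_d(X) + I_d(Y) - I_d(X \oplus Y)) / \max(I_d(X), I_d(Y))$, chosen because it reproduces the values quoted in the Figure 1 caption, and the maximum-likelihood form of the TwoNN estimator for $I_d$. The estimator, the helper names `two_nn_id` and `id_cor_test`, and the number of permutations are assumptions, not the authors' implementation.

```python
import numpy as np


def two_nn_id(X, chunk=256):
    """Maximum-likelihood TwoNN estimate of the intrinsic dimension of the rows of X.

    Assumes no duplicate rows (a zero nearest-neighbour distance breaks the ratio).
    """
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    n = len(X)
    log_mu = np.empty(n)
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        # Exact squared distances from a chunk of points to every point.
        d2 = ((X[start:stop, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        d2[np.arange(stop - start), np.arange(start, stop)] = np.inf  # ignore self
        sq_nn = np.partition(d2, 1, axis=1)[:, :2]  # squared 1st and 2nd NN distances
        # log(mu) = log(r_2 / r_1); halved because sq_nn holds squared distances.
        log_mu[start:stop] = 0.5 * np.log(sq_nn[:, 1] / sq_nn[:, 0])
    # Under the TwoNN model mu is Pareto with exponent d, whose MLE is n / sum(log mu).
    return n / log_mu.sum()


def id_cor_test(X, Y, n_perm=100, seed=0):
    """Hypothetical I_dCor score and permutation p-value for paired samples X, Y."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    # Marginal I_d estimates do not change when the rows of Y are shuffled.
    d_x, d_y = two_nn_id(X), two_nn_id(Y)

    def score(Y_rows):
        # Assumed normalization: (I_d(X) + I_d(Y) - I_d([X, Y])) / max(I_d(X), I_d(Y)).
        d_xy = two_nn_id(np.hstack([X, Y_rows]))
        return (d_x + d_y - d_xy) / max(d_x, d_y)

    rng = np.random.default_rng(seed)
    observed = score(Y)
    # Shuffling Y breaks the pairing, sampling the score under independence.
    null = np.array([score(Y[rng.permutation(len(Y))]) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value


# Usage: y = x**2 is fully determined by x but has almost no linear correlation with it.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=1000)
y = x ** 2
pearson = np.corrcoef(x, y)[0, 1]
observed, p_value = id_cor_test(x, y, n_perm=20)
print(f"Pearson r = {pearson:.3f}, IdCor = {observed:.2f}, p = {p_value:.3f}")
```

In this toy case the pairs $(x, x^2)$ lie on a 1D curve, so the score should come out close to 1 while the Pearson coefficient stays near 0, and the $p$-value should reach its smallest attainable value, $1/(1+n_{\text{perm}})$. Only the joint estimate is recomputed for each permutation, which keeps the test cost proportional to the number of permutations.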

Abstract

To gain insight into the mechanisms behind machine learning methods, it is crucial to establish connections among the features describing data points. However, these correlations often exhibit a high-dimensional and strongly nonlinear nature, which makes them challenging to detect using standard methods. This paper exploits the entanglement between intrinsic dimensionality and correlation to propose a metric that quantifies the (potentially nonlinear) correlation between high-dimensional manifolds. We first validate our method on synthetic data in controlled environments, showcasing its advantages and drawbacks compared to existing techniques. Subsequently, we extend our analysis to large-scale applications in neural network representations. Specifically, we focus on latent representations of multimodal data, uncovering clear correlations between paired visual and textual embeddings, whereas existing methods have significant difficulty detecting similarity. Our results indicate the presence of highly nonlinear correlation patterns between latent manifolds.
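For reference, Distance Correlation (dCor), the baseline shown alongside $I_d$Cor in Figures 3 and 4, is computed from double-centered pairwise distance matrices of the two paired samples. The minimal NumPy sketch below implements the standard biased (V-statistic) sample dCor; the function names are hypothetical, it uses $O(N^2)$ memory, and it is not the code used for the paper's experiments.

```python
import numpy as np


def double_centered_distances(Z):
    """Pairwise Euclidean distances between the rows of Z, double-centered."""
    Z = np.asarray(Z, dtype=float).reshape(len(Z), -1)
    d = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1))
    # Subtract row and column means, add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(X, Y):
    """Sample (V-statistic) distance correlation between paired samples X and Y."""
    A, B = double_centered_distances(X), double_centered_distances(Y)
    dcov2 = (A * B).mean()                   # squared distance covariance
    dvar2 = (A * A).mean() * (B * B).mean()  # product of squared distance variances
    if dvar2 <= 0.0:  # at least one input is constant
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=500)
noise = rng.uniform(-1.0, 1.0, size=500)
print(f"dCor(x, x**2) = {distance_correlation(x, x ** 2):.2f}")
print(f"dCor(x, noise) = {distance_correlation(x, noise):.2f}")
```

Because the statistic only needs distances within each representation, the two inputs may have different widths; the pairwise distance matrices are the main cost as $N$ grows.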

Paper Structure

This paper contains 32 sections, 7 equations, 15 figures, and 7 tables.

Introduction
Background
Correlation in latent representations
Multimodal latent space alignment
Intrinsic dimension
Correlation through Intrinsic Dimension
Results
Synthetic experiments
A motivating example on neural representations
ImageNet representations
Coarse alignment
Multimodal representations
Computational resources
Discussion
Limitations
...and 17 more sections

Figures (15)

Figure 1: Example usage of $I_d$Cor. We consider a $3$D dataset whose points lie on the surface of a cylinder, so its intrinsic dimension ($I_d$) is $2$. We want to assess the correlation between the $2$D set of coordinates $xy$ (which also has $I_d=2$, as shown in the top-right panel) and the $1$D set $z$. Intuitively, since $x$ and $z$ describe a circle, knowledge of $z$ (encoded by color) is very informative about the $x$ coordinate (e.g., a yellow point is sure to lie in the central region of the $x$ axis) but not about $y$; hence, the correlation coefficient is $0.5$, according to equation \ref{idcor_eq}. Conversely, when estimating the correlation between $xz$ (whose $I_d$ is $1$, bottom-right panel) and $y$ (now represented by color), knowing $y$ gives no information on the value of either $x$ or $z$, hence the correlation is $0$.
Figure 2: Average correlation, measured with different methods, between MNIST data and their final representations computed by a randomly initialized MLP, as the degree of activation nonlinearity increases along the $x$ axis. The shaded area represents the standard deviation over 10 runs with independent random initializations of the MLP weights.
Figure 3: Correlation results on ImageNet representations, obtained with Distance Correlation (dCor, left) and $I_d$Cor (ours, right). Both methods are able to detect non-negligible correlation. More baseline results are reported in the Appendix.
Figure 4: Correlation results on N24News representations, obtained with Distance Correlation (dCor, left) and $I_d$Cor (ours, right). Like all other baselines we evaluate, dCor only detects correlations between encoders of the same modality, whereas $I_d$Cor reveals significant correlation for all model pairs.
Figure 5: Synthetic datasets, each associated with its intrinsic dimension.
...and 10 more figures
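The cylinder example of Figure 1 can be checked numerically. The sketch below samples points on a unit-radius cylinder whose axis is $y$, so that $x$ and $z$ trace a circle, and evaluates the score for the two splits discussed in the caption. It rests on assumptions that are not taken from the paper: the maximum-likelihood TwoNN estimator for $I_d$, the normalization $(I_d(X) + I_d(Y) - I_d(X \oplus Y)) / \max(I_d(X), I_d(Y))$, the hypothetical helper names `two_nn_id` and `id_cor`, and an arbitrary sample size and cylinder height. Finite-sample estimates should land near, not exactly on, the caption's values of $0.5$ and $0$.

```python
import numpy as np


def two_nn_id(X, chunk=256):
    """Maximum-likelihood TwoNN intrinsic-dimension estimate (assumes no duplicate rows)."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    n = len(X)
    log_mu = np.empty(n)
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        d2 = ((X[start:stop, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        d2[np.arange(stop - start), np.arange(start, stop)] = np.inf  # ignore self
        sq_nn = np.partition(d2, 1, axis=1)[:, :2]  # squared 1st and 2nd NN distances
        log_mu[start:stop] = 0.5 * np.log(sq_nn[:, 1] / sq_nn[:, 0])
    return n / log_mu.sum()


def id_cor(X, Y):
    """Assumed I_dCor: (I_d(X) + I_d(Y) - I_d([X, Y])) / max(I_d(X), I_d(Y))."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    d_x, d_y = two_nn_id(X), two_nn_id(Y)
    d_xy = two_nn_id(np.hstack([X, Y]))
    return (d_x + d_y - d_xy) / max(d_x, d_y)


# Points on a unit-radius cylinder whose axis is y, so (x, z) lies on a circle.
rng = np.random.default_rng(0)
n = 3000
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
x, z = np.cos(theta), np.sin(theta)
y = rng.uniform(0.0, 2.0, size=n)  # free coordinate along the axis

r_xy_z = id_cor(np.column_stack([x, y]), z)  # ideal value from Figure 1: 0.5
r_xz_y = id_cor(np.column_stack([x, z]), y)  # ideal value from Figure 1: 0
print(f"IdCor(xy, z) = {r_xy_z:.2f}, IdCor(xz, y) = {r_xz_y:.2f}")
```

In both splits the concatenated data recover the full cylinder, with $I_d \approx 2$; the difference lies in the marginals ($2$ and $1$ for $xy$ and $z$, $1$ and $1$ for $xz$ and $y$), which is what moves the score from about $0.5$ to about $0$.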
