Intrinsic Dimension Correlation: uncovering nonlinear connections in multimodal representations
Lorenzo Basile, Santiago Acevedo, Luca Bortolussi, Fabio Anselmi, Alex Rodriguez
TL;DR
The paper tackles the challenge of detecting nonlinear correlations between high-dimensional data manifolds, especially across multimodal latent spaces, where traditional correlation metrics often fail. It introduces Intrinsic Dimension Correlation ($I_d$Cor), a mutual-information-like coefficient that uses intrinsic dimension ($I_d$) estimates to measure the information shared between two representations, together with a permutation-based $p$-value to assess significance. Across synthetic experiments, large-scale ImageNet evaluations, and experiments on multimodal datasets, $I_d$Cor robustly detects nonlinear dependencies that baselines such as Canonical Correlation Analysis (CCA) and Distance Correlation miss, including cross-modal correlations in vision-language models. The results suggest that $I_d$Cor is a scalable, informative tool for understanding latent-space geometry and guiding representation learning across diverse domains. Its main limitations stem from the reliability of $I_d$ estimation, and natural extensions include local-manifold variants and total-correlation analyses.
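The coefficient and its $p$-value are described above only in words. The following is a minimal NumPy sketch under loudly stated assumptions, not the authors' implementation: (i) $I_d$ is estimated with a maximum-likelihood form of the TwoNN estimator (Facco et al., 2017); (ii) each representation is z-scored per feature before concatenation; (iii) by analogy with mutual information, $I(X;Y) = H(X) + H(Y) - H(X,Y)$, the score is taken to be $(I_d(X) + I_d(Y) - I_d([X, Y])) / \max(I_d(X), I_d(Y))$. The normalization, the estimator and the names `twonn_id` and `id_cor_test` are hypothetical choices for illustration. The $p$-value shuffles the rows of $Y$, which breaks the pairing with $X$ while keeping both marginals, and counts how often the shuffled score reaches the observed one.

```python
import numpy as np


def twonn_id(data: np.ndarray) -> float:
    """Intrinsic dimension via a maximum-likelihood form of TwoNN (assumed estimator).

    Assumes no duplicate points (a zero nearest-neighbour distance would
    make the distance ratio undefined).
    """
    # Brute-force Euclidean distances: O(N^2 * D) memory, fine for a sketch.
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt(np.sum(diff**2, axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    # After partitioning, column 0 holds r1 and column 1 holds r2 for each row.
    r = np.partition(dist, 1, axis=1)[:, :2]
    mu = r[:, 1] / r[:, 0]
    # Under the TwoNN model mu follows a Pareto law with exponent d,
    # whose maximum-likelihood estimate is N / sum(log mu).
    return float(len(mu) / np.sum(np.log(mu)))


def _standardize(a: np.ndarray) -> np.ndarray:
    # Z-score each feature so features on different scales contribute comparably.
    return (a - a.mean(axis=0)) / a.std(axis=0)


def id_cor_test(x, y, n_perm=100, seed=0, estimator=twonn_id):
    """Return (score, p_value) for a hypothetical Id-based correlation.

    score = (Id(X) + Id(Y) - Id([X, Y])) / max(Id(X), Id(Y)), where row i of
    x and row i of y describe the same sample.
    """
    x, y = _standardize(np.asarray(x, float)), _standardize(np.asarray(y, float))
    id_x, id_y = estimator(x), estimator(y)
    norm = max(id_x, id_y)

    def score(y_rows):
        # Only the joint Id changes when the pairing between rows changes.
        return (id_x + id_y - estimator(np.hstack([x, y_rows]))) / norm

    observed = score(y)
    rng = np.random.default_rng(seed)
    # Shuffling rows of y breaks the pairing but keeps both marginals intact.
    null = np.array([score(y[rng.permutation(len(y))]) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return float(observed), float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = rng.uniform(0.0, 2.0 * np.pi, 800)
    s = rng.uniform(0.0, 2.0 * np.pi, 800)
    x = np.column_stack([np.cos(t), np.sin(t)])              # points on a circle
    y_dep = np.column_stack([np.cos(3 * t), np.sin(3 * t)])  # nonlinear in t
    y_ind = np.column_stack([np.cos(s), np.sin(s)])          # independent of t
    print("dependent:  ", id_cor_test(x, y_dep, n_perm=30))
    print("independent:", id_cor_test(x, y_ind, n_perm=30))
```

In the toy example, every coordinate of `y_dep` has zero linear correlation with every coordinate of `x` in expectation (e.g. $\cos t \cos 3t$ integrates to zero over a full period), yet the pairs lie on a one-dimensional curve, so the joint $I_d$ stays near 1 and the score approaches 1; the pairs built from `y_ind` fill a two-dimensional torus and the score stays near 0. With `n_perm` shuffles the smallest attainable $p$-value is $1/(n_\text{perm}+1)$, and the brute-force distance matrix would have to be replaced by a nearest-neighbour index for large datasets.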
Abstract
To gain insight into the mechanisms behind machine learning methods, it is crucial to establish connections among the features describing data points. However, these correlations are often high-dimensional and strongly nonlinear, which makes them difficult to detect with standard methods. This paper exploits the entanglement between intrinsic dimensionality and correlation to propose a metric that quantifies the (potentially nonlinear) correlation between high-dimensional manifolds. We first validate our method on synthetic data in controlled settings, showing its advantages and drawbacks compared to existing techniques. We then extend our analysis to large-scale applications involving neural network representations. Specifically, we focus on latent representations of multimodal data and uncover clear correlations between paired visual and textual embeddings, whereas existing methods struggle to detect any similarity. Our results indicate the presence of highly nonlinear correlation patterns between latent manifolds.
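One way to read the "entanglement between intrinsic dimensionality and correlation" is through dimension counting. The derivation below is an illustrative sketch under idealized assumptions, not the paper's formal argument: $X$ and $Y$ are sampled from smooth manifolds $\mathcal{M}_X$ and $\mathcal{M}_Y$ of dimensions $d_X$ and $d_Y$, and $d_{XY}$ denotes the dimension of the support of the pairs $(x_i, y_i)$. Projecting the joint support onto either factor cannot increase dimension, and the joint support is contained in $\mathcal{M}_X \times \mathcal{M}_Y$, so

$$
\max(d_X, d_Y) \;\le\; d_{XY} \;\le\; d_X + d_Y .
$$

If $X$ and $Y$ are independent, the joint support is the full product and $d_{XY} = d_X + d_Y$. If instead $Y = f(X)$ for a smooth $f$, the pairs lie on the graph $\{(x, f(x))\}$ and $d_{XY} = d_X = \max(d_X, d_Y)$. The dimension deficit

$$
\Delta \;=\; d_X + d_Y - d_{XY} \;\in\; \bigl[\,0,\; \min(d_X, d_Y)\,\bigr]
$$

is therefore zero under independence and maximal under this kind of functional dependence, whether or not $f$ is linear, which is what makes a normalized version of $\Delta$ usable as a correlation measure.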
