On Convolutions, Intrinsic Dimension, and Diffusion Models
Kin Kwan Leung, Rasa Hosseinzadeh, Gabriel Loaiza-Ganem
TL;DR
This paper addresses how the local intrinsic dimension $\mathrm{LID}(x)$ of a datum can be rigorously inferred from how its diffusion-model density changes under added noise, extending the prior FLIPD theory beyond affine submanifolds. Writing $D$ for the ambient dimension and $d$ for the dimension of the submanifold containing $x$, it proves that, under general conditions, $\mathrm{LID}(x)=D+\bigl.\frac{\mathrm{d}}{\mathrm{d}\tau}\bigl[\log\rho_{\mathrm{N}}(x,\tau)\bigr]\bigr|_{\tau\to-\infty}$ holds for disjoint unions of general submanifolds, not just affine ones, where $\rho_{\mathrm{N}}$ is the Gaussian-convolved density. It also provides a parallel result for uniform noise: $\frac{\mathrm{d}}{\mathrm{d}\tau}\bigl[\log\rho_{\mathrm{U}}(x,\tau)\bigr]\to d-D$ as $\tau\to-\infty$. In the Gaussian case the derivative converges to exactly $d-D$ on each component, and corollaries extend this to unions, which justifies FLIPD in realistic settings; the uniform-case result connects $\mathrm{LID}$ to ball probabilities and offers additional theoretical insight. Together, these results solidify the theoretical foundation for using FLIPD with state-of-the-art diffusion models and motivate future work on broader noise models and on error bounds under imperfect score estimation.
Abstract
The manifold hypothesis asserts that data of interest in high-dimensional ambient spaces, such as image data, lies on unknown low-dimensional submanifolds. Diffusion models (DMs) -- which operate by convolving data with progressively larger amounts of Gaussian noise and then learning to reverse this process -- have risen to prominence as the most performant generative models, and are known to be able to learn distributions with low-dimensional support. For a given datum in one of these submanifolds, we should thus intuitively expect DMs to have implicitly learned its corresponding local intrinsic dimension (LID), i.e. the dimension of the submanifold it belongs to. Kamkari et al. (2024b) recently showed that this is indeed the case by linking this LID to the rate of change of the log marginal densities of the DM with respect to the amount of added noise, resulting in an LID estimator known as FLIPD. LID estimators such as FLIPD have a plethora of uses: among others, they quantify the complexity of a given datum and can be used to detect outliers, adversarial examples, and AI-generated text. FLIPD achieves state-of-the-art performance at LID estimation, yet its theoretical underpinnings are incomplete, since Kamkari et al. (2024b) only proved its correctness under the highly unrealistic assumption of affine submanifolds. In this work we bridge this gap by formally proving the correctness of FLIPD under realistic assumptions. Additionally, we show that an analogous result holds when Gaussian convolutions are replaced with uniform ones, and discuss the relevance of this result.
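The link between LID and the rate of change of the log marginal density can be checked numerically in the simplest (affine) setting. The sketch below is illustrative only and is not the FLIPD estimator itself: it assumes the data distribution is a standard Gaussian supported on the first $d$ coordinates of $\mathbb{R}^D$, assumes the noise level is parameterized as $\tau=\log\sigma$ (a convention not stated in the text above), and uses the closed-form convolved density rather than a trained diffusion model. The function names `log_rho_gaussian` and `lid_estimate` are hypothetical.

```python
import numpy as np


def log_rho_gaussian(x, d, tau):
    """Log density at x of N(0, I_d), supported on the first d coordinates
    of R^D, after convolution with N(0, sigma^2 I_D), sigma = exp(tau).

    Assumption: tau = log(sigma); this toy model admits a closed form.
    """
    D = x.shape[0]
    sigma2 = np.exp(2.0 * tau)
    # On-manifold directions have variance 1 + sigma^2, the rest sigma^2.
    var = np.concatenate([np.full(d, 1.0 + sigma2), np.full(D - d, sigma2)])
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + x**2 / var)


def lid_estimate(x, d, tau, h=1e-4):
    """D plus a central finite-difference estimate of d/dtau log rho(x, tau)."""
    D = x.shape[0]
    slope = (log_rho_gaussian(x, d, tau + h)
             - log_rho_gaussian(x, d, tau - h)) / (2.0 * h)
    return D + slope


D, d = 10, 3
x = np.zeros(D)
x[:d] = [0.5, -1.0, 0.25]  # a point lying on the d-dimensional subspace

for tau in (-1.0, -3.0, -6.0):
    print(f"tau={tau:+.1f}  D + d/dtau log rho = {lid_estimate(x, d, tau):.5f}")
```

In this toy model the derivative equals $-(D-d)$ plus terms proportional to $e^{2\tau}$, so the printed values should approach $d=3$ as $\tau$ decreases. The true dimension `d` is passed in only to build the synthetic density; the estimate itself uses nothing but the log density and the ambient dimension.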
