A Geometric View of Data Complexity: Efficient Local Intrinsic Dimension Estimation with Diffusion Models

Hamidreza Kamkari, Brendan Leigh Ross, Rasa Hosseinzadeh, Jesse C. Cresswell, Gabriel Loaiza-Ganem

TL;DR

This work shows that the Fokker-Planck equation associated with a diffusion model (DM) yields an estimator of local intrinsic dimension (LID). The resulting estimator, called FLIPD, is orders of magnitude faster than other LID estimators and is the first to be tractable at the scale of Stable Diffusion.

Abstract

High-dimensional data commonly lies on low-dimensional submanifolds, and estimating the local intrinsic dimension (LID) of a datum -- i.e. the dimension of the submanifold it belongs to -- is a longstanding problem. LID can be understood as the number of local factors of variation: the more factors of variation a datum has, the more complex it tends to be. Estimating this quantity has proven useful in contexts ranging from generalization in neural networks to detection of out-of-distribution data, adversarial examples, and AI-generated text. The recent successes of deep generative models present an opportunity to leverage them for LID estimation, but current methods based on generative models produce inaccurate estimates, require more than a single pre-trained model, are computationally intensive, or do not exploit the best available deep generative models: diffusion models (DMs). In this work, we show that the Fokker-Planck equation associated with a DM can provide an LID estimator which addresses the aforementioned deficiencies. Our estimator, called FLIPD, is easy to implement and compatible with all popular DMs. Applying FLIPD to synthetic LID estimation benchmarks, we find that DMs implemented as fully-connected networks are highly effective LID estimators that outperform existing baselines. We also apply FLIPD to natural images where the true LID is unknown. Despite being sensitive to the choice of network architecture, FLIPD estimates remain a useful measure of relative complexity; compared to competing estimators, FLIPD exhibits a consistently higher correlation with image PNG compression rate and better aligns with qualitative assessments of complexity. Notably, FLIPD is orders of magnitude faster than other LID estimators, and the first to be tractable at the scale of Stable Diffusion.
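To make the notion of LID concrete, the sketch below estimates it for points sampled from a $d$-dimensional affine subspace of $\mathbb{R}^D$, the same linear setting in which FLIPD is proven sound. It uses classical local PCA (counting the non-negligible singular values of a centred neighbourhood), which is not FLIPD and does not involve a diffusion model. The function name `local_pca_lid`, the neighbourhood size `k`, and the tolerance `rel_tol` are illustrative assumptions, not anything taken from the paper.

```python
import numpy as np


def local_pca_lid(data, query, k=50, rel_tol=1e-6):
    """Estimate LID at `query` as the numerical rank of its k nearest neighbours.

    Hypothetical helper for illustration only: a classical local-PCA
    baseline, not the FLIPD estimator.
    """
    # Find the k points closest to the query in Euclidean distance.
    dists = np.linalg.norm(data - query, axis=1)
    neighbours = data[np.argsort(dists)[:k]]
    # Centre the neighbourhood so the affine offset does not count as a direction.
    centred = neighbours - neighbours.mean(axis=0)
    # Singular values measure how far the neighbourhood spreads along each direction.
    singular_values = np.linalg.svd(centred, compute_uv=False)
    # Count the directions whose spread is non-negligible relative to the largest.
    return int(np.sum(singular_values > rel_tol * singular_values[0]))


rng = np.random.default_rng(0)
D, d, n = 10, 3, 2000                  # ambient dimension, true LID, sample size
basis = rng.standard_normal((d, D))    # spans a random d-dimensional subspace
offset = rng.standard_normal(D)        # shifts it to an affine subspace
latent = rng.standard_normal((n, d))   # the d local factors of variation
data = latent @ basis + offset

lid_estimate = local_pca_lid(data, data[0])
print(lid_estimate)
```

On noiseless linear data the count recovers `d` exactly. On curved or noisy manifolds the answer depends heavily on `k` and `rel_tol`, which is one reason model-based estimators are attractive.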

Paper Structure (54 sections, 3 theorems, 60 equations, 26 figures, 9 tables)

Key Result

Theorem 3.1

Let $\mathcal{L}$ be an embedded submanifold of $\mathbb{R}^D$ given by a $d$-dimensional affine subspace. If $p(\cdot, 0)$ is supported on $\mathcal{L}$, continuous, and with finite second moments, then for any $x \in \mathcal{L}$ with $p(x, 0)>0$, we have:

Figures (26)

  • Figure 1: (Left) A cartoon illustration showing that LID is a natural measure of relative complexity. We depict two manifolds of MNIST digits, corresponding to 1s and 8s, as 1-dimensional and 2-dimensional submanifolds of $\mathbb{R}^3$, respectively. The relatively simpler manifold of 1s exhibits a single factor of variation ("tilt"), whereas 8s have an additional factor of variation ("disproportionality"). (Right) The 4 lowest- and highest-LID datapoints from a subsample of LAION-Aesthetics, as measured by our method, FLIPD, applied to Stable Diffusion v1.5. FLIPD scales efficiently to large models on high-dimensional data, and aligns closely with subjective complexity.
  • Figure 2: FLIPD curves with knees at the true LID.
  • Figure 3: "String within a doughnut" manifolds, and corresponding FLIPD estimates for different values of $t_0$ ($t_0=0.05$ on top and $t_0=0.65$ on bottom). These results highlight the multiscale nature of FLIPD.
  • Figure 4: Overview of image LID: (a) shows the FLIPD curves that are used to estimate average LID for MNIST and FMNIST when using MLP backbones; (b) compares images with small and large FLIPD estimates from FMNIST, MNIST, SVHN, and CIFAR10 when using UNet backbones; and (c) compares LAION images with small and large FLIPD estimates using Stable Diffusion (top, $t_0 = 0.3$) and PNG compression sizes (bottom).
  • Figure 5: The FLIPD estimates on a Lollipop dataset from Tempczyk et al. (2022).
  • ...and 21 more figures

Theorems & Definitions (6)

  • Theorem 3.1: FLIPD Soundness: Linear Case
  • proof
  • Lemma B.1
  • proof
  • Theorem B.1: FLIPD Soundness: Linear Case
  • proof