Generative Learning of Densities on Manifolds

Dimitris G. Giovanis, Ellis Crabtree, Roger G. Ghanem, Ioannis G. Kevrekidis

TL;DR

This work addresses sampling high-dimensional data densities constrained to unknown low-dimensional manifolds. It combines diffusion-model-based generative methods with manifold learning by using Diffusion Maps to uncover latent coordinates and Double Diffusion Maps to lift samples back to the ambient space. Two latent-space sampling strategies are proposed: a score-based diffusion model (m-SGM) and a probabilistic Itô SDE/PLoM approach (m-PLoM). Through a synthetic S-shaped dataset and a multiscale material system, the framework demonstrates efficient, manifold-respecting density estimation and realistic realizations suitable for applications in digital twins and uncertainty quantification.

Abstract

A generative modeling framework is proposed that combines diffusion models and manifold learning to efficiently sample data densities on manifolds. The approach utilizes Diffusion Maps to uncover possible low-dimensional underlying (latent) spaces in the high-dimensional data (ambient) space. Two approaches for sampling from the latent data density are described. The first is a score-based diffusion model, which is trained to map a standard normal distribution to the latent data distribution using a neural network. The second involves solving an Itô stochastic differential equation in the latent space. Additional realizations of the data are generated by lifting the samples back to the ambient space using Double Diffusion Maps, a recently introduced technique typically employed in studying dynamical system reduction; here the focus lies in sampling densities rather than system dynamics. The proposed approaches enable sampling high-dimensional data densities restricted to low-dimensional, a priori unknown manifolds. The efficacy of the proposed framework is demonstrated through a benchmark problem and a material with multiscale structure.
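
To make the first stage of the pipeline concrete, the following is a minimal NumPy sketch of a standard Diffusion Maps embedding applied to a synthetic S-shaped point cloud. It is an illustration under stated assumptions, not the authors' implementation. The function name `diffusion_maps`, the median-distance bandwidth heuristic, the choice `alpha = 1`, and the S-curve parametrization are all assumptions made here. The selection of non-harmonic coordinates via the residual $r_k$ (Figure 4) is not implemented.

```python
import numpy as np


def diffusion_maps(X, n_coords=5, eps=None, alpha=1.0):
    """Return the leading n_coords + 1 Diffusion Maps eigenpairs of X.

    Index 0 is the trivial pair (eigenvalue 1, constant eigenvector).
    """
    # Pairwise squared Euclidean distances between all samples.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Kernel bandwidth: median squared distance (an assumed heuristic).
    if eps is None:
        eps = np.median(d2)
    K = np.exp(-d2 / eps)

    # Density normalization; alpha = 1 reduces the effect of non-uniform sampling.
    q = K.sum(axis=1)
    K = K / np.outer(q, q) ** alpha

    # Symmetric matrix similar to the row-stochastic Markov matrix D^{-1} K.
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))

    # Eigendecomposition, sorted by decreasing eigenvalue.
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]

    # Convert to right eigenvectors of the Markov matrix.
    phi = evecs / np.sqrt(d)[:, None]
    return evals[: n_coords + 1], phi[:, : n_coords + 1]


# Synthetic S-shaped data in 3D (hypothetical parametrization for illustration).
rng = np.random.default_rng(0)
n = 400
t = 3.0 * np.pi * (rng.random(n) - 0.5)
X = np.column_stack(
    [np.sin(t), 2.0 * rng.random(n), np.sign(t) * (np.cos(t) - 1.0)]
)

evals, phi = diffusion_maps(X, n_coords=5)
latent = phi[:, 1:]  # candidate latent coordinates, trivial eigenvector dropped
```

In this sketch `latent` holds five candidate coordinates. In the framework described above, a subset of such coordinates would parametrize the latent space in which sampling takes place, and a separate lifting step (Double Diffusion Maps in the paper) would map new latent samples back to the ambient space.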

Paper Structure

This paper contains 17 sections, 49 equations, 17 figures, 1 table, 4 algorithms.

Figures (17)

  • Figure 1: (a) Three-dimensional S-shaped data with 10,000 points. (b) 5,000 samples generated using the MCS-based SGM.
  • Figure 2: Comparison of the three SGM marginal densities (red dashed lines) against the ground truth densities (blue solid lines).
  • Figure 3: Two-dimensional projections of the original (blue) and the sampled (red) data.
  • Figure 4: (a) Diffusion Maps coordinates $\boldsymbol{\phi}_1, \boldsymbol{\phi}_5$ of the three-dimensional dataset. (b) The residual $r_k$ indicates that $\boldsymbol{\phi}_1, \boldsymbol{\phi}_5$ are the two non-harmonic coordinates.
  • Figure 5: 100,000 generated points (blue) in the latent space using: (a) m-SGM1 (same behavior is observed for m-SGM2). The black points correspond to diffusion maps coordinates of the original dataset. (b) 30,000 selected points (green) using KDTree with 10 nearest neighbors.
  • ...and 12 more figures