Lipschitz-regularized gradient flows and generative particle algorithms for high-dimensional scarce data

Hyemin Gu, Panagiota Birmpa, Yannis Pantazis, Luc Rey-Bellet, Markos A. Katsoulakis

TL;DR

As a highlighted result in data integration, the proposed algorithms are shown to correctly transport gene expression data points of dimension exceeding 54K, even though the sample size is typically only in the hundreds.

Abstract

We build a new class of generative algorithms capable of efficiently learning an arbitrary target distribution from possibly scarce, high-dimensional data and subsequently generating new samples. These generative algorithms are particle-based and are constructed as gradient flows of Lipschitz-regularized Kullback-Leibler or other $f$-divergences, where data from a source distribution can be stably transported as particles towards the vicinity of the target distribution. As a highlighted result in data integration, we demonstrate that the proposed algorithms correctly transport gene expression data points with dimension exceeding 54K, while the sample size is typically only in the hundreds.

Paper Structure

This paper contains 41 sections, 8 theorems, 53 equations, 17 figures, 5 tables, and 1 algorithm.

Key Result

Theorem 1

Assume $f$ is superlinear, strictly convex and $P,Q \in \mathcal{P}_1(\mathbb{R}^d)$. We define where the optimizer $\phi^{L,*} \in \Gamma_L$ exists, is defined on $\textrm{supp}(P)\cup \textrm{supp}(Q)$, and is unique up to a constant. Subsequently, we extend $\phi^{L,*}$ in all of $\mathbb{R}^d$ using eq:phi_extensions:sec2. Let $\rho$ be a signed measure of total mass $0$ and let $\rho=\rho Th

Figures (17)

  • Figure 1: Sierpinski carpet embedded in 3D. Source data (purple particles) are transported via GPA close to the target data (cyan particles). The target particles were sampled from a Sierpinski carpet of level $4$ by omitting all finer scales. See \ref{fig: Sierpinski carpet} for a related 2D demonstration and a comparison to GANs.
  • Figure 2: (2D Mixture of Gaussians) Kinetic energy of particles \ref{eq:Fisher_on_particles} for $(f_\text{KL}, \Gamma_L)$-GPA with different $L$'s. \ref{thm:dissipation} suggests that particles need to slow down and practically stop when they reach the “vicinity” of the target particles.
  • Figure 3: (2D Mixture of Gaussians) We empirically observe that the Lipschitz constant $L$ controls the propagation speed of $(f_\text{KL}, \Gamma_L)$-GPA. For $L < \infty$, the particles are propagated to the 4 wells. As $L$ gets larger, the algorithm becomes more unstable. For $L = \infty$ (unregularized KL), GPA fails to capture the target.
  • Figure 4: (Gaussian to Student-t with $\nu=0.5$ in 2D) We consider 200 initial samples from $N((10,10), 0.5^2I)$, transported towards 200 target samples from $\text{Student-}t(\nu)$ with $\nu=0.5$ using $(f, \Gamma_1)$-GPAs for $f=f_\text{KL}$ and $f=f_\alpha$ with $\alpha=2, 10$. (a) $(f, \Gamma_1)$-divergences are computed by the corresponding estimator in \ref{eq:GPA:sec3}. $(f_\text{KL}, \Gamma_1)$-GPA collapses at around $t=202$ because the function optimization step with $f_\text{KL}$ is numerically unstable on heavy-tailed data, while $(f_\alpha, \Gamma_1)$-GPA with $\alpha=2, 10$ propagates particles stably during the entire simulation window. See \ref{fig:2D student-t_additional} for details. However, GPA still appears to take a long time to transport particles deep into the heavy tails due to the speed restriction of the Lipschitz regularization. This stability comes at the cost of accuracy, as reflected in the relatively large values of the $\alpha$-divergences. (b) We observed that $(f_\alpha, \Gamma_1)$-GPA with $\alpha=10$ transports particles further and deeper into the tails than $(f_\alpha, \Gamma_1)$-GPA with $\alpha=2$.
  • Figure 5: (MNIST) GPA for image generation given scarce target data. (a) A subset of the $N=200$ target samples. Results in (b-c) are generated by $(f_\text{KL}, \Gamma_5)$-GPA based on the first two strategies in \ref{sec:generalization-overfit}. We report GPA results with $L=5$, which was empirically found to generate samples stably and in a reasonable amount of time. (b) $M=600$ initial particles from $\text{Unif}([0,1]^{784})$ were transported toward the target in the setting of $M \gg N$, which promotes sample diversity. See \ref{fig:mnist:si} for details. (c) A new set of 600 initial particles from $\text{Unif}([0,1]^{784})$ was transported through the previously learned vector fields. These transported samples are referred to as generated particles, as explained in \ref{sec:generalization-overfit}. Training time: 5000 time steps ($T=2500$) or 48 minutes in the setting of \ref{subsec:appendix:nn:architecture:comp:resources}.
  • ...and 12 more figures
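
The source and target samples described in the Figure 4 caption can be drawn with a few lines of NumPy. This is only a minimal sketch of the data setup, not the authors' code: the function names `sample_source` and `sample_student_t` are hypothetical, and the target sampler assumes the standard multivariate Student-t construction (a Gaussian vector divided by the square root of a scaled chi-square variable), which the caption does not specify.

```python
import numpy as np


def sample_source(n, rng, mean=(10.0, 10.0), std=0.5):
    """Draw n points from N(mean, std^2 I), as in the Figure 4 caption."""
    return rng.normal(loc=mean, scale=std, size=(n, len(mean)))


def sample_student_t(n, rng, nu=0.5, dim=2):
    """Draw n points from a multivariate Student-t with nu degrees of freedom.

    Assumption: the standard construction z / sqrt(w / nu), with
    z ~ N(0, I) and w ~ chi-square(nu) shared across coordinates.
    """
    z = rng.standard_normal((n, dim))
    w = rng.chisquare(nu, size=(n, 1))
    return z / np.sqrt(w / nu)


rng = np.random.default_rng(0)
source = sample_source(200, rng)     # 200 initial particles
target = sample_student_t(200, rng)  # 200 heavy-tailed target samples
```

With $\nu=0.5$ the target has neither a finite mean nor a finite variance, so a few target samples typically lie very far from the origin. This is consistent with the caption's remark that heavy-tailed data makes the $f_\text{KL}$ optimization step numerically unstable.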

Theorems & Definitions (18)

  • Theorem 1: First variation of Lipschitz-regularized $f$-divergences
  • Remark 1
  • Lemma 1
  • Proof 1: Proof of Theorem \ref{thm:first variation}
  • Remark 2: Algorithmic perspectives and related results
  • Theorem 2: Lipschitz-regularized dissipation
  • Remark 3: Formal asymptotics of Lipschitz-regularized gradient flows
  • Remark 4: Lipschitz regularization for GPA
  • Remark 5: Improved accuracy and higher-order schemes
  • Theorem 3: Autoencoder performance guarantees
  • ...and 8 more