Complex variational autoencoders admit Kähler structure

Andrew Gracyk

TL;DR

The paper investigates complex variational autoencoders (VAEs) and shows that complex latent spaces admit a Kähler geometric structure linked to the Fisher information metric. It derives the complex Fisher metric under a latent complex Gaussian and establishes that the KL divergence serves as a Kähler potential whose Hessian recovers that metric, enabling a principled geometric interpretation of latent representations. To make this geometry computationally practical, it introduces two Kähler potentials: an exact form tied to the KL divergence and a scalable log-sum-exp surrogate that preserves plurisubharmonicity. The author also proposes curvature-aware sampling and a regularization term involving the determinant of the metric, demonstrating smoother representations and fewer semantic outliers at the cost of some sample variation.

Abstract

It has been discovered that latent-Euclidean variational autoencoders (VAEs) admit, in various capacities, Riemannian structure. We adapt these arguments to complex VAEs with a complex latent stage, and we show that complex VAEs reveal, to some level, Kähler geometric structure. Our methods are tailored to decoder geometry. We derive the Fisher information metric in the complex case under a latent complex Gaussian with trivial relation matrix. It is well known from statistical information theory that the Fisher information coincides with the Hessian of the Kullback-Leibler (KL) divergence. Thus, the relation between the metric and its Kähler potential is achieved exactly under relative entropy. We propose a Kähler potential derived from complex Gaussian mixtures that acts as a rough proxy to the Fisher information metric while remaining faithful to the underlying Kähler geometry. Computing the metric via this potential, which is a valid plurisubharmonic (PSH) function, is efficient, and it displaces the large-scale computational burden of automatic differentiation to small scale. Our methods leverage the law of total covariance to bridge the behavior of our potential and the Fisher metric. We show that we can regularize the latent space with decoder geometry, and that we can sample in accordance with a weighted complex volume element. We demonstrate that these strategies, at the cost of sample variation, yield consistently smoother representations and fewer semantic outliers.
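To make the plurisubharmonicity claim concrete, here is a minimal NumPy sketch of a generic log-sum-exp potential and its complex Hessian. It is not the paper's exact potential. It assumes the illustrative form $\Psi(z) = \log \sum_k \exp(2\,\mathrm{Re}(a_k^{\dagger} z) + b_k)$, and the names `A`, `b`, `lse_potential`, and `lse_kahler_metric` are hypothetical. For this form, the complex Hessian $\partial_{\alpha} \partial_{\overline{\beta}} \Psi$ is a softmax-weighted covariance of the vectors $a_k$, so it is Hermitian and positive semidefinite, which is the PSH property.

```python
import numpy as np

def lse_potential(z, A, b):
    """Psi(z) = log sum_k exp(2 Re(a_k^dagger z) + b_k); A has shape (K, d)."""
    phi = 2.0 * np.real(A.conj() @ z) + b       # one exponent per component k
    m = phi.max()                               # shift for numerical stability
    return m + np.log(np.exp(phi - m).sum())

def lse_kahler_metric(z, A, b):
    """Complex Hessian H[alpha, beta] = d^2 Psi / (d z_alpha d conj(z_beta)).

    For the assumed potential this equals the softmax-weighted covariance
    sum_k w_k conj(a_k - m)_alpha (a_k - m)_beta, with m = sum_k w_k a_k.
    """
    phi = 2.0 * np.real(A.conj() @ z) + b
    w = np.exp(phi - phi.max())
    w /= w.sum()                                # softmax weights over components
    centered = A - w @ A                        # subtract the weighted mean
    return (centered.conj().T * w) @ centered   # shape (d, d), Hermitian PSD

# Illustrative usage with random (hypothetical) parameters.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3)) + 1j * rng.normal(size=(5, 3))
b = rng.normal(size=5)
z = rng.normal(size=3) + 1j * rng.normal(size=3)
H = lse_kahler_metric(z, A, b)
print(np.linalg.eigvalsh(H))                    # eigenvalues are nonnegative
```

Because the metric has a closed form in the softmax weights, no second-order automatic differentiation is needed in this toy setting. Whether the same simplification carries over to the paper's potential depends on its exact definition.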

Paper Structure

This paper contains 14 sections, 105 equations, 12 figures, 1 table, 1 algorithm.

Figures (12)

  • Figure 1: We illustrate a component reduction method to examine the effect of curvature on the sampling procedure for (left) a complex Gaussian-sampled latent space and (right) a metric-sampled latent space according to our method. We construct a Laplacian Eigenmaps embedding on a curvature-weighted graph based on the generalized eigenproblem $Lv = \lambda Dv$ (note that a Gaussian solves $\lambda = 0, v=1$). In particular, we plot the coordinate eigenfunctions $(v_1(i),v_2(i))$ that minimize the curvature-weighted energy problem subject to eigenconstraints. A Gaussian space is solved nearly trivially (the Gaussian plot looks like a line, as on the left, since one eigenfunction is constant), whereas curvature is conveyed via a curved manifold, as on the right. The difference between the two is notable.
  • Figure 2: We show distributional equivalence by plotting the cumulative distribution of the $(1,1)$ real element of the first term of the Fisher information metric, $(\partial_{\alpha} \mu)^{\dagger} \Sigma_j^{-1} ( \partial_{\overline{\beta}} \mu) + (\partial_{\overline{\beta}} \mu)^{\dagger} \Sigma_j^{-1} (\partial_{\alpha} \mu)$, along the maximum index versus computing the expectation $\mathbb{E}[ \partial_{\alpha} \partial_{\overline{\beta}} \Psi]$ using the softmax. The purpose of this plot is to convey the asymptotic equivalence of Lemma 1, i.e. the softmax will be dominated by a single weight.
  • Figure 3: We illustrate the runtime of constructing the Hermitian metrics via sampling. We choose $\alpha = \overline{\beta}=1$. We truncate the number of data points for the nearest-neighbors search case, since larger counts gave CUDA out-of-memory errors.
  • Figure 4: We present a cosine similarity between $\mathop{\mathrm{arg\,min}}\limits_i \Psi_i$ and $\mathop{\mathrm{arg\,max}}\limits_i \Psi_i$ on 50 posterior samples corresponding to $2 \times \text{latent dim}$ indices (so one real, one imaginary) of our Kähler metric, using pure weights of $w=1$ to replace the softmax function as in \ref{eqn:improved_weights}. The purpose of this plot is to convey that the metric elements among these indices are proportional, and thus the iterated expectation of \ref{eqn:iterated_expectation} holds reasonably well. This figure was created using our MNIST experiment.
  • Figure 5: We illustrate, through a cosine similarity, that the condition in Theorem 3, $\partial v \approx M \Sigma^{-1} v$, is reasonable. In particular, a similarity of $+1\in [ -1, +1]$ is most desirable. Ours is 0.793, which indicates reasonable similarity.
  • ...and 7 more figures
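
The generalized eigenproblem $Lv = \lambda Dv$ behind the Laplacian Eigenmaps embedding in the Figure 1 caption can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the symmetric weight matrix `W` is a hypothetical input standing in for the curvature-weighted graph, whose actual weighting is not specified here, and the function name `laplacian_eigenmaps` is likewise an assumption. The sketch solves the problem through the equivalent symmetric form $D^{-1/2} L D^{-1/2} u = \lambda u$ with $v = D^{-1/2} u$.

```python
import numpy as np

def laplacian_eigenmaps(W):
    """Solve L v = lambda D v for a symmetric, nonnegative weight matrix W.

    Returns eigenvalues in ascending order and the matching eigenvectors as
    columns. Assumes every node has strictly positive degree.
    """
    W = 0.5 * (W + W.T)                         # enforce symmetry
    deg = W.sum(axis=1)                         # diagonal of D
    L = np.diag(deg) - W                        # graph Laplacian L = D - W
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization turns the generalized problem into a standard one.
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    eigvals, U = np.linalg.eigh(L_sym)
    V = d_inv_sqrt[:, None] * U                 # map back: v = D^{-1/2} u
    return eigvals, V

# Illustrative usage: a Gaussian-kernel graph on random points (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq_dist)
np.fill_diagonal(W, 0.0)                        # no self-loops
eigvals, V = laplacian_eigenmaps(W)
embedding = V[:, 1:3]                           # (v_1(i), v_2(i)), skipping the constant v_0
```

The first eigenpair is the trivial solution $\lambda = 0$ with constant $v$, which is why the embedding uses the next two eigenvectors.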