On optimal solutions of classical and sliced Wasserstein GANs with non-Gaussian data

Yu-Jui Huang, Hsin-Hua Shen, Yu-Chih Huang, Wan-Yi Lin, Shih-Chun Lin

TL;DR

The paper extends population-WGAN analysis beyond the LQG setting to non-Gaussian data, deriving closed-form optimal parameters for one-dimensional WGANs with nonlinear generators. It then shows that in high dimensions, linear generators are asymptotically optimal for both original and unprojected sliced WGANs under $q=2$, by bounding with Gaussian projections and exploiting isotropy via Schur-convexity and the Carlson-R function. A rigorous set of proofs leverages optimal transport theory and case analyses (notably for ReLU activations), while an empirical study validates the theory and demonstrates computational advantages over r-PCA due to linear complexity. The unprojected sliced-WGAN variant preserves full marginal information and inherits the same asymptotic optimality, broadening the practical impact for large-scale, decentralized or resource-constrained settings.

Abstract

The generative adversarial network (GAN) aims to approximate an unknown distribution via a parameterized neural network (NN). While GANs have been widely applied in reinforcement and semi-supervised learning as well as computer vision tasks, selecting their parameters often needs an exhaustive search, and only a few selection methods have been proven to be theoretically optimal. One of the most promising GAN variants is the Wasserstein GAN (WGAN). Prior work on optimal parameters for population WGAN is limited to the linear-quadratic-Gaussian (LQG) setting, where the generator NN is linear, and the data is Gaussian. In this paper, we focus on the characterization of optimal solutions of population WGAN beyond the LQG setting. As a basic result, closed-form optimal parameters for one-dimensional WGAN are derived when the NN has non-linear activation functions, and the data is non-Gaussian. For high-dimensional data, we adopt the sliced Wasserstein framework and show that the linear generator can be asymptotically optimal. Moreover, the original sliced WGAN only constrains the projected data marginal instead of the whole one in classical WGAN, and thus, we propose another new unprojected sliced WGAN and identify its asymptotic optimality. Empirical studies show that compared to the celebrated r-principal component analysis (r-PCA) solution, which has cubic complexity to the data dimension, our generator for sliced WGAN can achieve better performance with only linear complexity.
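To make the sliced Wasserstein framework mentioned above concrete, the following is a minimal sketch of a Monte Carlo estimate of the squared sliced 2-Wasserstein distance between two equally sized samples. It is not the authors' implementation: the function name `sliced_w2_squared`, its parameters, and the Laplace/Gaussian example data are assumptions chosen for illustration. It relies only on the standard fact that, in one dimension, the optimal coupling between two empirical distributions with the same number of points pairs their sorted samples.

```python
import numpy as np


def sliced_w2_squared(x, y, n_projections=256, seed=0):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    x, y: arrays of shape (n, d) holding the same number of samples.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 2 or x.shape != y.shape:
        raise ValueError("x and y must both have shape (n, d)")
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Draw random directions and normalise them onto the unit sphere.
    directions = rng.standard_normal((n_projections, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Project both samples onto every direction: shape (n, n_projections).
    px = x @ directions.T
    py = y @ directions.T
    # In 1-D, the optimal coupling matches sorted samples to each other.
    px.sort(axis=0)
    py.sort(axis=0)
    # Average the squared 1-D transport cost over samples and directions.
    return float(np.mean((px - py) ** 2))


# Illustrative usage with synthetic data (assumed, not from the paper).
rng = np.random.default_rng(1)
data = rng.laplace(size=(2000, 8))           # non-Gaussian "data" sample
generated = rng.standard_normal((2000, 8))   # stand-in "generator" sample
print(sliced_w2_squared(data, generated))
```

Each projection costs one matrix-vector product plus a sort, so for a fixed number of projections the estimate scales linearly in the data dimension. The number of projections controls the Monte Carlo error and would need tuning in practice.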

Paper Structure

This paper contains 17 sections, 19 theorems, 183 equations, 4 figures, 1 table.

Key Result

Proposition 1

Assume CDF $F_\mu$ of $\mu$ and CDF $\Psi$ of $h(Z)$ in G are continuous and strictly increasing, and variance $\operatorname{Var}(h(Z))>0$. If the population WGAN eq_transportGANE with $q=2,d=1$ has a unique minimizer for $(\theta_1,\theta_2)\in\mathbb{R}\times\mathbb{R}$ as if Cov>0 is not met, $(\theta_1^*,\theta_2^*)$ is given by replacing $\theta^*_2$ in theta's by where $\mathbb{E}_g$ is
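For context, the role of the CDF assumptions can be read off a standard fact from one-dimensional optimal transport. It is stated here in generic notation and is not a reconstruction of the proposition's elided formula: $\nu$ denotes an arbitrary second distribution on $\mathbb{R}$, introduced only for this illustration. For $q \ge 1$,

$$W_q^q(\mu,\nu) = \int_0^1 \left| F_\mu^{-1}(u) - F_\nu^{-1}(u) \right|^q \, du ,$$

where $F_\mu^{-1}$ and $F_\nu^{-1}$ are the quantile functions. When a CDF is continuous and strictly increasing, as assumed for $F_\mu$ and $\Psi$, its quantile function is the ordinary inverse, so the distance is an explicit integral over $u\in(0,1)$.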

Figures (4)

  • Figure 1: Comparison of optimal parameters in \ref{eq_W2thetaOpt1} and \ref{optParaLinear1} with their estimates \ref{eq_empiricaltheta2} using synthetic data under $q=2, d=1$
  • Figure 2: Convergence of $\theta_1$ and $\theta_2$ from SGD using \ref{eq_transportGANE} for $q=1,d=1$
  • Figure 3: Comparison of sliced Wasserstein distance with our linear generator \ref{optimallinearparameter} and that from r-PCA [Tse20] when the data are i.i.d. Laplace
  • Figure 4: Comparison of sliced Wasserstein distance with our linear generator \ref{optimallinearparameter} and that from r-PCA [Tse20] when the data follow a correlated AR model

Theorems & Definitions (31)

  • Proposition 1
  • Theorem 1
  • Corollary 1
  • Corollary 2
  • Theorem 2
  • Lemma 1
  • Remark 1
  • Remark 2
  • Theorem 3
  • proof
  • ...and 21 more