On the Optimization Landscape of Maximum Mean Discrepancy

Itai Alon, Amir Globerson, Ami Wiesel

TL;DR

The paper addresses the theoretical gap for implicit generative models by analyzing the optimization landscape of Maximum Mean Discrepancy (MMD) learning in three Gaussian settings: unknown mean, low-rank covariance, and a symmetric two-Gaussian mixture. It shows that in each setting, the MMD objective has no spurious local minima and all non-global stationary points are strict saddles, enabling gradient-based methods to converge to the global optimum; these results are complemented by formal proofs and empirical validation. The experiments compare MMD, OS-MMD, MLE, and WGAN, demonstrating that MMD-based methods either match or outperform likelihood-based or adversarial baselines, particularly in ill-conditioned or near-singular covariance scenarios. The findings provide theoretical guarantees for likelihood-free training of generative models and suggest practical guidance on bandwidth choices and sample regimes to ensure reliable optimization in practice.

Abstract

Generative models have been successfully used for generating realistic signals. Because the likelihood function is typically intractable in most of these models, the common practice is to use "implicit" models that avoid likelihood calculation. However, it is hard to obtain theoretical guarantees for such models. In particular, it is not understood when they can globally optimize their non-convex objectives. Here we provide such an analysis for the case of Maximum Mean Discrepancy (MMD) learning of generative models. We prove several optimality results, including for a Gaussian distribution with low-rank covariance (where likelihood is inapplicable) and a mixture of Gaussians. Our analysis shows that the MMD optimization landscape is benign in these cases, and therefore gradient-based methods will globally minimize the MMD objective.

Paper Structure

This paper contains 21 sections, 8 theorems, 57 equations, 8 figures, 8 tables.

Key Result

Theorem 1

Let $P_\mu = \mathcal{N}(\mu,\Sigma)$, where the true mean parameter $\mu^*$ is unknown. The function $\text{MMD}(\mu^*, \mu)$ admits a closed-form expression. It is a quasi-convex function of $\mu$ and has a single stationary point at $\mu = \mu^*$, which is the global minimum.
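
As an illustrative check of this statement in one dimension, the sketch below evaluates a closed form for the squared MMD between $\mathcal{N}(\mu, s^2)$ and $\mathcal{N}(\mu^*, s^2)$ and verifies numerically that the loss decreases toward $\mu^*$ and increases away from it. The Gaussian kernel $k(x,y)=\exp(-(x-y)^2/(2\sigma^2))$, the scalar variance `s2`, and the function name `mmd2_gaussian_mean` are assumptions made for this example; the paper's kernel normalization may differ.

```python
import numpy as np

def mmd2_gaussian_mean(mu, mu_star, s2=1.0, sigma2=1.0):
    """Squared MMD between N(mu, s2) and N(mu_star, s2) on the real line.

    Assumes the kernel k(x, y) = exp(-(x - y)^2 / (2 * sigma2)).
    For Z ~ N(delta, v), E[exp(-Z^2 / (2 * sigma2))] equals
    sqrt(sigma2 / (sigma2 + v)) * exp(-delta^2 / (2 * (sigma2 + v))),
    applied here with v = 2 * s2 (difference of two independent draws).
    """
    delta = np.asarray(mu, dtype=float) - mu_star
    scale = np.sqrt(sigma2 / (sigma2 + 2.0 * s2))
    return 2.0 * scale * (1.0 - np.exp(-delta**2 / (2.0 * (sigma2 + 2.0 * s2))))

# Evaluate the loss on a symmetric grid around the assumed true mean mu_star = 0.
mus = np.linspace(-10.0, 10.0, 2001)
loss = mmd2_gaussian_mean(mus, mu_star=0.0)

i_min = int(np.argmin(loss))                       # index of the grid minimizer
decreasing_left = np.all(np.diff(loss[: i_min + 1]) < 0)
increasing_right = np.all(np.diff(loss[i_min:]) > 0)

print("minimizer:", mus[i_min])
print("monotone on each side of the minimizer:", decreasing_left and increasing_right)
```

The loss flattens far from $\mu^*$, so it is not convex, but it is monotone on each side of the minimizer, which is what quasi-convexity means in one dimension. Taking a square root to pass from the squared MMD to the MMD itself is a monotone transformation and does not change this property.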

Figures (8)

  • Figure 1: MMD loss of Gaussian over $\mathbb{R}$ with unknown mean. Different widths $\sigma^2$ and $\mu^*=0$.
  • Figure 2: MMD loss of Gaussian over $\mathbb{R}$ with unknown covariance, different widths $\sigma^2$ and $a^*=1$.
  • Figure 3: MMD loss of GMM over $\mathbb{R}$ with different widths $\sigma^2$ and $\mu^* = 25$. Note that for $\sigma^2 = 10^1$ there are inflection points around $\mu=\pm 8.5$. The derivatives at these points are small but non-zero.
  • Figure 4: Success rate in a Gaussian model with unknown mean as a function of $m$. OSMMD and MLE coincide.
  • Figure 5: Success rate in a 2-GMM model with unknown mean as a function of $m$. OSMMD and MLE coincide.
  • ...and 3 more figures

Theorems & Definitions (19)

  • Definition 1: lee2016gradient
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof
  • proof
  • proof
  • Lemma 4
  • ...and 9 more