On the Optimization Landscape of Maximum Mean Discrepancy
Itai Alon, Amir Globerson, Ami Wiesel
TL;DR
The paper addresses the lack of theoretical guarantees for implicit generative models by analyzing the optimization landscape of Maximum Mean Discrepancy (MMD) learning in three Gaussian settings: unknown mean, low-rank covariance, and a symmetric two-Gaussian mixture. It shows that in each setting the MMD objective has no spurious local minima and all non-global stationary points are strict saddles, so gradient-based methods can converge to the global optimum. These results are proved formally and validated empirically. The experiments compare MMD, OS-MMD, MLE, and WGAN, and show that MMD-based methods match or outperform the likelihood-based and adversarial baselines, particularly in ill-conditioned or near-singular covariance scenarios. The findings provide theoretical guarantees for likelihood-free training of generative models and suggest guidance on bandwidth choices and sample regimes for reliable optimization in practice.
Abstract
Generative models have been successfully used for generating realistic signals. Because the likelihood function is intractable in most of these models, the common practice is to use "implicit" models that avoid likelihood calculation. However, it is hard to obtain theoretical guarantees for such models. In particular, it is not understood when they can globally optimize their non-convex objectives. Here we provide such an analysis for the case of Maximum Mean Discrepancy (MMD) learning of generative models. We prove several optimality results, including for a Gaussian distribution with low-rank covariance (where likelihood is inapplicable) and a mixture of Gaussians. Our analysis shows that the MMD optimization landscape is benign in these cases, and therefore gradient-based methods will globally minimize the MMD objective.
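
For readers unfamiliar with the objective, the following is a minimal sketch of a sample-based squared MMD estimate in the unknown-mean Gaussian setting. It rests on assumptions not stated in this section: a Gaussian (RBF) kernel, the biased V-statistic estimator, and the illustrative names `gaussian_kernel`, `mmd2_biased`, and `bandwidth`. The kernel and estimator analyzed in the paper may differ.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    # Pairwise RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).
    # The kernel choice is an assumption made for illustration.
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd2_biased(x, y, bandwidth=1.0):
    # Biased (V-statistic) estimate of squared MMD:
    # mean k(x, x') + mean k(y, y') - 2 * mean k(x, y).
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy

# Unknown-mean setting: data ~ N(mu*, I), model samples = noise + mu.
rng = np.random.default_rng(0)
true_mean = 2.0  # hypothetical ground-truth mean, same in every coordinate
data = rng.normal(loc=true_mean, size=(500, 2))
noise = rng.normal(size=(500, 2))

mmd_far = mmd2_biased(data, noise + 0.0)         # model mean far from mu*
mmd_near = mmd2_biased(data, noise + true_mean)  # model mean equal to mu*
```

In this sketch the estimate is smaller when the model mean matches the data mean, which is the quantity a gradient-based method would drive down as it updates the mean parameter.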
