A Test of Relative Similarity For Model Selection in Generative Models
Wacha Bounliphone, Eugene Belilovsky, Matthew B. Blaschko, Ioannis Antonoglou, Arthur Gretton
TL;DR
This paper addresses model selection for deep generative networks when likelihoods are difficult to compute by introducing a non-parametric relative similarity test based on Maximum Mean Discrepancy. It derives the joint asymptotic distribution of two correlated MMD estimators to perform a significance test that decides which candidate better matches a reference dataset, using a π/4 rotation to obtain a one-dimensional p-value. The authors validate the method on synthetic data and apply it to Variational Auto-Encoders and Generative Moment Matching Networks, showing that the test’s rankings align with traditional metrics while providing statistical guarantees. The approach offers a principled, scalable tool for architecture and training regime selection in unsupervised deep learning, with practical guidance on kernel choice and bandwidth and publicly available code.
Abstract
Probabilistic generative models provide a powerful framework for representing data that avoids the expense of manual annotation typically needed by discriminative approaches. Model selection in this generative setting can be challenging, however, particularly when likelihoods are not easily accessible. To address this issue, we introduce a statistical test of relative similarity, which is used to determine which of two models generates samples that are significantly closer to a real-world reference dataset of interest. We use as our test statistic the difference in maximum mean discrepancies (MMDs) between the reference dataset and each model dataset, and derive a powerful, low-variance test based on the joint asymptotic distribution of the MMDs between each reference-model pair. In experiments on deep generative models, including the variational auto-encoder and generative moment matching network, the tests provide a meaningful ranking of model performance as a function of parameter and training settings.
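The test statistic described above, the difference between the MMDs of two reference-model pairs, can be illustrated with a minimal sketch. This is not the authors' released code. The Gaussian kernel, the fixed `bandwidth`, the function names, and the sign convention (negative means the first model is closer to the reference) are all assumptions made for illustration. The significance test itself also needs the joint asymptotic variance of the two MMD estimates, which is not reproduced here.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(x, y, bandwidth=1.0):
    """A standard unbiased estimate of the squared MMD between samples x and y."""
    m, n = len(x), len(y)
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    # Within-sample terms exclude the diagonal (i == j) to avoid bias.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * k_xy.mean()


def relative_mmd_statistic(x_ref, y_model, z_model, bandwidth=1.0):
    """Difference of squared MMD estimates: (reference vs. Y) minus (reference vs. Z).

    Under the sign convention assumed here, a negative value suggests that
    y_model is closer to the reference than z_model.
    """
    return (
        mmd2_unbiased(x_ref, y_model, bandwidth)
        - mmd2_unbiased(x_ref, z_model, bandwidth)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_ref = rng.normal(0.0, 1.0, size=(200, 2))    # reference samples
    y_model = rng.normal(0.0, 1.0, size=(200, 2))  # candidate matching the reference
    z_model = rng.normal(2.0, 1.0, size=(200, 2))  # candidate with a shifted mean
    print(relative_mmd_statistic(x_ref, y_model, z_model))
```

In this toy setup the first candidate is drawn from the same distribution as the reference, so the statistic should come out negative. Turning that raw difference into a p-value is the part that relies on the joint asymptotic distribution derived in the paper.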
