
Finite sample properties of parametric MMD estimation: robustness to misspecification and dependence

Badr-Eddine Chérief-Abdellatif, Pierre Alquier

TL;DR

This paper tackles the problem of universal estimation using the minimum distance estimator of Briol et al. (2019), which is based on the Maximum Mean Discrepancy (MMD), and shows that the estimator is robust both to dependence and to the presence of outliers in the dataset.

Abstract

Many works in statistics aim to design a universal estimation procedure, that is, an estimator that converges to the best approximation of the (unknown) data-generating distribution within a model, without any assumption on this distribution. This question is of major interest, in particular because the universality property leads to the robustness of the estimator. In this paper, we tackle the problem of universal estimation using the minimum distance estimator of Briol et al. (2019), which is based on the Maximum Mean Discrepancy (MMD). We show that the estimator is robust both to dependence and to the presence of outliers in the dataset. Finally, we provide a theoretical study of the stochastic gradient descent algorithm used to compute the estimator, and we support our findings with numerical simulations. ** The proof of Proposition 4.4 in the published version contains a mistake. The mistake is fixed here (and the bound is actually improved by a factor of 2). **
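For orientation, the objects named in the abstract can be written out in the standard way. The notation below is an assumption made for this sketch and may differ from the paper's: $k$ is a bounded positive-definite kernel, $\{P_\theta : \theta \in \Theta\}$ is the parametric model, and $\hat{P}_n$ is the empirical distribution of the observations $X_1, \dots, X_n$.

```latex
\[
\mathrm{MMD}^2(P, Q)
  = \mathbb{E}_{X, X' \sim P}\big[k(X, X')\big]
  - 2\, \mathbb{E}_{X \sim P,\; Y \sim Q}\big[k(X, Y)\big]
  + \mathbb{E}_{Y, Y' \sim Q}\big[k(Y, Y')\big],
\]
\[
\hat{\theta}_n \in \operatorname*{arg\,min}_{\theta \in \Theta}
  \mathrm{MMD}\big(P_\theta, \hat{P}_n\big),
\qquad
\hat{P}_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}.
\]
```

Because the kernel is bounded, a single observation changes $\hat{P}_n$ by at most an amount of order $1/n$ in MMD, which gives some intuition for why a minimum-MMD estimator can be insensitive to a small fraction of outliers.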

Paper Structure

This paper contains 39 sections, 16 theorems, 134 equations, 6 figures, 2 tables, 1 algorithm.

Key Result

Theorem 3.1

We have:

Figures (6)

  • Figure 1: Illustration of the behaviour of the MMD estimator in the high-dimensional Gaussian mean estimation problem. The true parameter $\theta_0$ and datapoints sampled from the true distribution $\mathcal{N}(\theta_0,I_d)$ are colored in blue. Outliers and the MMD estimator $\hat{\theta}_n$ are colored in red. We can see that outliers lying at a distance $\sqrt{d}$ are not detected and shift the mean by $\varepsilon\sqrt{d}$.
  • Figure 2: Mean square error as a function of the outliers ratio $\varepsilon$, for a dimension $d=10$, a sample size $n=5000$, and a Gaussian contamination $Q=\mathcal{N}(\mathbf{5},I_d)$. The error grows linearly as the ratio increases.
  • Figure 3: Mean square error as a function of the square root of the dimension $\sqrt{d}$, for an outlier ratio $\varepsilon=0.1$, a sample size $n=5000$, and two different contaminations: a "harmless" Gaussian $Q=\mathcal{N}(\mathbf{5},I_d)$ and a "worst-case" Dirac $Q=\delta_{\{\mathbf{1}\}}$. The error grows linearly in the Dirac case but is not affected by the dimension in the Gaussian case.
  • Figure 4: Plot of the estimated densities using different methods without outliers. The blue curve represents the true density, the red one the MMD density, the green one the CAVI density and the black one the EM density.
  • Figure 5: Plot of the estimated densities using different methods in the presence of one outlier at 100. The blue curve represents the true density, the red one the MMD density, the green one the CAVI density and the black one the EM density. The EM estimate has a small component at 100, while the CAVI estimate has only one component at 100.
  • ...and 1 more figure
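The contaminated Gaussian mean setting described in the captions above can be explored with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's algorithm or experimental setup: it uses a Gaussian kernel with bandwidth `gamma`, a pathwise (reparameterized) Monte Carlo gradient, a decaying step size with averaged iterates, and much smaller values of $d$ and $n$ than in the figures. All function names and default parameters are hypothetical.

```python
import numpy as np

def sample_contaminated(n, d, eps, rng):
    """Draw n points: roughly (1 - eps) from N(0, I_d), the rest from N(5 * 1, I_d)."""
    x = rng.standard_normal((n, d))
    is_outlier = rng.random(n) < eps
    x[is_outlier] += 5.0
    return x

def mmd_gradient(theta, x, gamma, m, rng):
    """Monte Carlo gradient in theta of MMD^2(N(theta, I_d), empirical measure of x)."""
    y = theta + rng.standard_normal((m, x.shape[1]))    # m pathwise samples from the model
    diff = y[:, None, :] - x[None, :, :]                # shape (m, n, d)
    k = np.exp(-(diff ** 2).sum(axis=-1) / gamma ** 2)  # Gaussian kernel values, shape (m, n)
    # For a location model and a translation-invariant kernel, the model-model
    # term of MMD^2 does not depend on theta, so only the cross term contributes.
    return (4.0 / gamma ** 2) * (k[:, :, None] * diff).mean(axis=(0, 1))

def mmd_estimate(x, gamma=2.0, n_steps=1000, m=10, lr=1.0, seed=0):
    """Averaged stochastic gradient descent on the MMD^2 criterion (illustrative settings)."""
    rng = np.random.default_rng(seed)
    theta = x.mean(axis=0)                  # start from the non-robust sample mean
    theta_avg = np.zeros_like(theta)
    for t in range(1, n_steps + 1):
        theta = theta - (lr / np.sqrt(t)) * mmd_gradient(theta, x, gamma, m, rng)
        theta_avg += (theta - theta_avg) / t  # running average of the iterates
    return theta_avg

# True parameter is theta_0 = 0, so the norm of an estimate is its error.
rng = np.random.default_rng(0)
x = sample_contaminated(n=1000, d=2, eps=0.1, rng=rng)
theta_mean = x.mean(axis=0)
theta_mmd = mmd_estimate(x)
print("error of sample mean  :", np.linalg.norm(theta_mean))
print("error of MMD estimate :", np.linalg.norm(theta_mmd))
```

In this toy configuration the outliers sit far from the bulk of the data relative to the kernel bandwidth, so their kernel weights are tiny and they barely influence the gradient, whereas the sample mean is shifted by roughly $5\varepsilon$ in every coordinate.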

Theorems & Definitions (37)

  • Definition 2.1
  • Theorem 3.1
  • Theorem 3.2
  • Remark 3.1: The i.i.d case
  • Remark 3.2: Connection between the MMD distance and the $L^2$ norm
  • Lemma 3.3
  • Corollary 3.4
  • Proposition 3.5
  • Proposition 4.1
  • Proposition 4.2
  • ...and 27 more