GROS: A General Robust Aggregation Strategy

Alejandro Cholaquidis, Emilien Joly, Leonardo Moreno

Abstract

A new, very general, robust procedure for combining estimators in metric spaces, called GROS, is introduced. The method is reminiscent of the well-known median of means, as described in \cite{devroye2016sub}. First, the sample is divided into $K$ groups. Next, an estimator is computed for each group. Finally, these $K$ estimators are combined using a robust procedure. We prove that the resulting estimator is sub-Gaussian and derive its breakdown point, in the sense of Donoho. The robust procedure involves a minimization problem over a general metric space, but we show that the same sub-Gaussianity (up to a constant) is obtained if the minimization is taken over the sample, making GROS feasible in practice. The performance of GROS is evaluated through five simulation studies: the first focuses on classification using $k$-means, the second on the multi-armed bandit problem, the third on the regression problem, and the fourth on set estimation under a noisy model. Lastly, we apply GROS to obtain a robust persistence diagram.
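The three steps described in the abstract (split, estimate per group, combine robustly) can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in this excerpt: the combination rule is taken to be "choose the candidate whose smallest ball containing more than $K/2$ of the group estimators has minimal radius", and "minimization over the sample" is read as restricting the candidates to the $K$ group estimators themselves. The function names `gros_over_sample` and `robust_mean` are hypothetical and do not come from the paper.

```python
import numpy as np


def gros_over_sample(estimates, dist):
    """Assumed aggregation rule (not taken verbatim from the paper):
    return the group estimator whose ball containing more than K/2
    of the group estimators has the smallest radius."""
    K = len(estimates)
    m = K // 2 + 1  # smallest integer strictly larger than K/2
    best_idx, best_radius = None, np.inf
    for i in range(K):
        # Sorted distances from candidate i to every estimator (itself included).
        d = sorted(dist(estimates[i], estimates[j]) for j in range(K))
        radius = d[m - 1]  # radius needed to cover the m nearest estimators
        if radius < best_radius:
            best_idx, best_radius = i, radius
    return estimates[best_idx], best_radius


def robust_mean(sample, K, rng):
    """Hypothetical helper: split the sample into K random groups, take the
    empirical mean of each group, then aggregate with gros_over_sample."""
    perm = rng.permutation(len(sample))
    groups = np.array_split(sample[perm], K)
    estimates = [g.mean(axis=0) for g in groups]
    euclid = lambda a, b: float(np.linalg.norm(a - b))
    return gros_over_sample(estimates, euclid)[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 95 clean observations at 0 and 5 gross outliers.
    sample = np.concatenate([np.zeros(95), np.full(5, 1e6)]).reshape(-1, 1)
    print("plain mean:", sample.mean(axis=0))
    print("aggregated:", robust_mean(sample, K=15, rng=rng))
```

Only the distance function `dist` depends on the space, so the same aggregation step can in principle be reused for estimators that are not vectors, provided a metric between them is available. The pairwise search costs on the order of $K^2$ distance evaluations.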

Paper Structure (17 sections, 6 theorems, 34 equations, 10 figures)

Key Result

Lemma 1

Assume that there exist $\eta \in \mathcal{M}$ and $I\subset [K]$ with $|I|>K/2$ such that $d(\mu_j,\eta)\le t$ for all $j\in I$. Then $d(\mu^*,\eta)\le 2t$.
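The definition of $\mu^*$ is not reproduced in this excerpt. As a sketch only, assume that $\mu^*$ minimizes over $\nu \in \mathcal{M}$ the radius $r(\nu)$ of the smallest ball centered at $\nu$ that contains more than half of the $\mu_j$, and that this minimum is attained. The symbols $r$, $I^*$ and $j_0$ below are introduced for illustration. Under that assumption, the bound follows from a majority-overlap argument and the triangle inequality:

```latex
\begin{align*}
r(\nu) &:= \min_{I' \subset [K],\, |I'| > K/2} \ \max_{j \in I'} d(\mu_j, \nu),
  \qquad \mu^* \in \operatorname*{arg\,min}_{\nu \in \mathcal{M}} r(\nu)
  \quad \text{(assumed definition)} \\
r(\mu^*) &\le r(\eta) \le \max_{j \in I} d(\mu_j, \eta) \le t
  \quad \Longrightarrow \quad
  \exists\, I^* \subset [K],\ |I^*| > K/2,\ \ d(\mu_j, \mu^*) \le t \ \ \forall j \in I^* \\
|I| + |I^*| &> K
  \quad \Longrightarrow \quad
  I \cap I^* \neq \emptyset, \ \text{so pick } j_0 \in I \cap I^* \\
d(\mu^*, \eta) &\le d(\mu^*, \mu_{j_0}) + d(\mu_{j_0}, \eta) \le t + t = 2t
\end{align*}
```

The key step is that two index sets, each containing more than half of $[K]$, must share at least one index.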

Figures (10)

  • Figure 1: Simulation of 1000 observations of the multivariate Student mixture \ref{student}. Observations are colored according to the mixture component from which they were drawn.
  • Figure 2: Box plot of classification errors, according to \ref{error}, of $K$-means, TClust, PAM and RobustKM over $1000$ replicates.
  • Figure 3: Cumulative gains over 500 replications, for $t=1, \ldots,750$. The red dotted horizontal line ($y=8$) is the maximum expected gain. The black dotted vertical line ($x=40$) indicates the number of random warm-up runs in the RUCB algorithm. The dashed lines depict the mean reward of the UCB (orange) and RUCB (blue) algorithms.
  • Figure 4: Box plot of classification errors (according to L2 distance) in $1000$ replicates. The different scenarios are obtained in the skew-normal Student distribution with $\sigma \in \{ 9,16\}$ and $\xi \in \{ 1,9\}$, fixed $\nu=3$ and $\kappa=0$.
  • Figure 5: Regression functions estimated with the RANW (orange), NW (black), ONL (light blue) and SBMB (blue) in one replicate. The true function is shown in red. The different scenarios are obtained in the skew-normal Student distribution with $\sigma \in \{ 9,16\}$ and $\xi \in \{ 1,9\}$, fixed $\mu=0$ and $\nu=3$.
  • ...and 5 more figures

Theorems & Definitions (13)

  • Remark
  • Definition 1
  • Lemma 1
  • proof
  • Theorem 2
  • Remark
  • Lemma 3
  • proof
  • Corollary 2.1
  • Lemma 4
  • ...and 3 more