
Robust Estimation in Finite Mixture Models

Alexandre Lecestre

TL;DR

This work tackles robust estimation in finite mixture models under misspecification and contamination by developing a non-asymptotic theory for $\rho$-estimators when emission densities lie in VC-subgraph classes. The authors derive exponential deviation bounds for the Hellinger risk, decompose it into approximation and complexity terms, and show robustness to model misspecification. They extend the framework to composite emission families and unknown component counts, delivering an oracle-type inequality that enables adaptive model selection and component-number determination. The approach is instantiated with Gaussian mixtures, establishing near-parametric rates and providing identifiability-assisted parameter recovery results, including in contamination-prone settings. Overall, the paper provides a principled, robust, and adaptable methodology for density estimation and parameter recovery in complex mixture models, with practical model-selection guarantees.
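As a concrete reference point for the Gaussian instance, the following Python sketch evaluates and samples a one-dimensional $K$-component Gaussian mixture $p(x)=\sum_{k=1}^K w_k\,\varphi_{\mu_k,\sigma_k}(x)$. It only illustrates the model class, not the paper's $\rho$-estimator. The function names (`gaussian_mixture_pdf`, `sample_gaussian_mixture`) and the parameter values are hypothetical choices made for this example.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x (works elementwise on arrays)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def gaussian_mixture_pdf(x, weights, means, sigmas):
    """Density of sum_k w_k N(mu_k, sigma_k^2) evaluated at x."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("mixture weights must be nonnegative and sum to 1")
    x = np.asarray(x, dtype=float)
    # One column per component, then the weighted sum over components.
    comps = gaussian_pdf(x[..., None], np.asarray(means, dtype=float),
                         np.asarray(sigmas, dtype=float))
    return comps @ weights

def sample_gaussian_mixture(n, weights, means, sigmas, rng):
    """Draw an n-sample: pick a component label, then draw from that Gaussian."""
    labels = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(np.asarray(means, dtype=float)[labels],
                      np.asarray(sigmas, dtype=float)[labels])

# Illustrative (hypothetical) two-component model.
weights, means, sigmas = [0.3, 0.7], [-2.0, 1.5], [1.0, 0.5]

# Riemann-sum check that the mixture density has total mass 1.
grid = np.linspace(-20.0, 20.0, 40001)
mass = gaussian_mixture_pdf(grid, weights, means, sigmas).sum() * (grid[1] - grid[0])

rng = np.random.default_rng(0)
xs = sample_gaussian_mixture(10_000, weights, means, sigmas, rng)
print(f"total mass ~ {mass:.6f}, sample mean ~ {xs.mean():.3f} (true mean 0.45)")
```

The true mean here is $0.3\cdot(-2)+0.7\cdot 1.5=0.45$, so the sample mean of a large draw should land near it.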

Abstract

We observe an $n$-sample whose distribution is assumed to belong, or at least to be close enough, to a given mixture model. We propose an estimator of this distribution that belongs to our model and possesses some robustness properties with respect to a possible misspecification of the model. We establish a non-asymptotic deviation bound for the Hellinger distance between the target distribution and its estimator when the model consists of mixtures of densities that belong to VC-subgraph classes. Under suitable assumptions and when the mixture model is well-specified, we derive risk bounds for the parameters of the mixture. Finally, we design a statistical procedure that selects from the data the number of components as well as suitable models for each of the densities involved in the mixture. These models are chosen among a collection of candidates, and we show that our selection rule combined with our estimation strategy results in an estimator that satisfies an oracle-type inequality.
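The abstract measures estimation error in Hellinger distance. The sketch below assumes the convention $h^2(P,Q)=\frac12\int(\sqrt{p}-\sqrt{q})^2$ (some authors omit the factor $1/2$) and approximates it for one-dimensional densities on a uniform grid. It also illustrates why this loss suits contamination: if $R=(1-\varepsilon)P+\varepsilon Q$, then $r\ge(1-\varepsilon)p$, hence $h^2(P,R)=1-\int\sqrt{pr}\le 1-\sqrt{1-\varepsilon}\le\varepsilon$ whatever $Q$ is. The helper names, grid, and parameter values are hypothetical, and the code computes the loss only, not the $\rho$-estimator.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated elementwise at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def hellinger(p_vals, q_vals, dx):
    """Hellinger distance with h^2 = 0.5 * int (sqrt p - sqrt q)^2,
    approximated by a Riemann sum of densities sampled on a uniform grid."""
    h2 = 0.5 * np.sum((np.sqrt(p_vals) - np.sqrt(q_vals)) ** 2) * dx
    return np.sqrt(h2)

grid = np.linspace(-30.0, 30.0, 60001)
dx = grid[1] - grid[0]

# Target P: a two-component Gaussian mixture (illustrative parameters).
p = 0.4 * normal_pdf(grid, -1.0, 1.0) + 0.6 * normal_pdf(grid, 2.0, 0.7)

# Contaminated R = (1 - eps) * P + eps * Q, with Q far from the bulk of P.
eps = 0.05
q_outlier = normal_pdf(grid, 15.0, 0.5)
r = (1.0 - eps) * p + eps * q_outlier

h = hellinger(p, r, dx)
print(f"h(P, R) = {h:.4f}, bound sqrt(eps) = {np.sqrt(eps):.4f}")
```

Because the outlier component barely overlaps with $P$, the computed distance comes out close to $\sqrt{1-\sqrt{1-\varepsilon}}$, which stays below $\sqrt{\varepsilon}$ even though the contaminating distribution is far away.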


Paper Structure

This paper contains 18 sections, 36 theorems, and 206 equations.

Key Result

Theorem 1

Let $\delta\in(0,1/K]$ and $\xi>0$. Assume that Assumptions hyp:countable and hyp:vc_density_model hold and set $\overline{V}=V_1+\dots+V_K$. Any $\rho$-estimator $\hat{P}_{\delta}$ on $\mathscr{Q}_{K,\delta}$ satisfies with probability at least $1-e^{-\xi}$, where $c_0=300$, $c_1=8.8\times 10^5$ and $c_2=5014$. In particular, for the choice $\delta=1$ for $K=1$ and $\delta=\frac{\overline{V}}{n(

Theorems & Definitions (37)

  • Theorem 1
  • Theorem 2
  • Corollary 1
  • Proposition 1
  • Theorem 3
  • Example 1
  • Theorem 4
  • Theorem 5
  • Proposition 2
  • Theorem 6
  • ...and 27 more