Divergence-Minimization for Latent-Structure Models: Monotone Operators, Contraction Guarantees, and Robust Inference

Lei Li, Anand N. Vidyashankar

TL;DR

This work develops a divergence-minimization (DM) framework for latent-structure inference that unifies EM with robust minimum-disparity methods across finite mixtures and HFMMs. It establishes monotone descent of the DM objective, local contraction of the population operator, and $\sqrt{n}$-consistent asymptotics for truncated iterates, while explicitly characterizing robustness via the residual-adjustment function and breakdown analysis. The authors introduce a split-sample GDIC approach to select the number of components with valid post-selection inference, and demonstrate, both theoretically and empirically, that DM variants based on Hellinger and negative-exponential divergences remain robust under contamination and outliers while maintaining competitive accuracy in standard settings. The framework connects to MM and proximal-point paradigms, provides practical defaults, and yields a drop-in robust alternative to EM for latent-structure inference with provable guarantees and scalable model-selection machinery.

Abstract

We develop a divergence-minimization (DM) framework for robust and efficient inference in latent-mixture models. By optimizing a residual-adjusted divergence, the DM approach recovers EM as a special case and yields robust alternatives through different divergence choices. We establish that the sample objective decreases monotonically along the iterates, leading the DM sequence to stationary points under standard conditions, and that at the population level the operator exhibits local contractivity near the minimizer. Additionally, we verify consistency and $\sqrt{n}$-asymptotic normality of minimum-divergence estimators and of finitely many DM iterations, showing that under correct specification their limiting covariance matches the Fisher information. Robustness is analyzed via the residual-adjustment function, yielding bounded influence functions and a strictly positive breakdown bound for bounded-RAF divergences, and we contrast this with the non-robust behaviour of KL/EM. Next, we address the challenge of determining the number of mixture components by proposing a penalized divergence criterion combined with repeated sample splitting, which delivers consistent order selection and valid post-selection inference. Empirically, DM instantiations based on Hellinger and negative exponential divergences deliver accurate inference and remain stable under contamination in mixture and image-segmentation tasks. The results clarify connections to MM and proximal-point methods and offer practical defaults, making DM a drop-in alternative to EM for robust latent-structure inference.
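
To make the role of the residual-adjustment function (RAF) concrete, the following is a minimal sketch that evaluates three RAFs in their standard forms from the minimum-disparity literature, as functions of the Pearson residual $\delta \ge -1$. The function names are hypothetical, and the paper's own normalizations (and its vNED variant, which is not reproduced here) may differ from these textbook forms.

```python
import math

# Textbook RAF forms from the minimum-disparity literature (an assumption:
# the paper may use different normalizations). delta is the Pearson
# residual, delta >= -1.

def raf_ld(delta):
    """Likelihood disparity (KL/EM case): A(delta) = delta, linear growth."""
    return delta

def raf_hd(delta):
    """Hellinger distance: A(delta) = 2 * (sqrt(delta + 1) - 1)."""
    return 2.0 * (math.sqrt(delta + 1.0) - 1.0)

def raf_ned(delta):
    """Negative exponential disparity: A(delta) = 2 - (2 + delta) * exp(-delta)."""
    return 2.0 - (2.0 + delta) * math.exp(-delta)

if __name__ == "__main__":
    # Large positive residuals correspond to outlying observations.
    for delta in (-0.5, 0.0, 1.0, 10.0, 100.0):
        print(f"delta={delta:7.1f}  LD={raf_ld(delta):8.3f}  "
              f"HD={raf_hd(delta):8.3f}  NED={raf_ned(delta):8.3f}")
```

In these forms the LD adjustment grows linearly in $\delta$, the HD adjustment grows like $\sqrt{\delta}$, and the NED adjustment stays below 2 for large $\delta$, which gives a rough picture of how strongly each divergence downweights large residuals.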

Paper Structure

This paper contains 22 sections, 46 theorems, 219 equations, 14 figures, 9 tables, and 5 algorithms.

Key Result

Lemma 1

For any $\bm{\theta}'$,

Figures (14)

  • Figure 1: The DM Algorithm Illustration
  • Figure 2: Average parameter estimates versus contamination level $\epsilon$ in a two-component PG mixture with known $K=2$.
  • Figure 3: Lena image reconstruction after adding $30\%$ outliers.
  • Figure 4: Plot of the residual adjustment function $A(\delta)$ for LD, HD, NED, and vNED
  • Figure 5: Average estimates of $\pi_1$ across kernels for sample sizes $n=20,50,100,200$. Solid lines represent EM, HD, and vNED methods; dashed horizontal line indicates the true value $0.4$.
  • ...and 9 more figures

Theorems & Definitions (89)

  • Lemma 1: Variational elimination of the auxiliary conditional
  • proof
  • Lemma 2
  • Definition 1: DM sequences
  • Lemma 3: Monotone descent for DM updates
  • proof
  • Proposition 1
  • proof (majorization-minimization)
  • Remark 1: When “stationary” coincides with “fixed point”
  • Proposition 2
  • ...and 79 more