Inference on Gaussian mixture models with dependent labels

Seunghyun Lee, Rajarshi Mukherjee, Sumit Mukherjee

TL;DR

This work analyzes estimation in Gaussian mixture models whose latent labels may be dependent. It shows that a naive estimator based on the misspecified iid likelihood remains $\sqrt{n}$-consistent under arbitrary label dependence, and it characterizes the information-theoretic limits when the latent labels follow an Ising model, revealing a phase transition at $\beta=1$. Under weak dependence ($\beta\le 1$) the iid-based estimator is optimal, while under strong dependence ($\beta>1$) a mean-field variational estimator $\hat{\boldsymbol{\theta}}^{\text{MF}}_n$ achieves a smaller asymptotic variance $I_{\beta}(\boldsymbol{\theta}_0)^{-1}$, with optimality argued for the Curie-Weiss (CW) model. The paper also discusses unknown dependence strength, partial remedies, and connections to Hidden Markov Random Fields (HMRFs), and it points to future work on more components, high dimensions, and non-mean-field graphs. Overall, it provides sharp asymptotic efficiency results and practical estimators for Gaussian mixtures with dependent labels, clarifying when dependence helps or hinders inference.

Abstract

Gaussian mixture models are widely used to model data generated from multiple latent sources. Despite their popularity, most theoretical research assumes that the labels are either independent and identically distributed, or follow a Markov chain. It remains unclear how the fundamental limits of estimation change under more complex dependence. In this paper, we address this question for the spherical two-component Gaussian mixture model. We first show that for labels with arbitrary dependence, a naive estimator based on the misspecified likelihood is $\sqrt{n}$-consistent. Additionally, for labels that follow an Ising model, we establish the information-theoretic limits of estimation and discover an interesting phase transition as dependence becomes stronger. When the dependence is smaller than a threshold, the optimal estimator and its limiting variance exactly match the independent case, for a wide class of Ising models. On the other hand, under stronger dependence, estimation becomes easier and the naive estimator is no longer optimal. Hence, we propose an alternative estimator based on the variational approximation of the likelihood, and argue its optimality under a specific Ising model.
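As a rough illustration of the naive estimator described in the abstract, the sketch below simulates a two-component spherical mixture with Curie-Weiss labels and fits it with the iid likelihood. Everything specific here is an assumption rather than the paper's own setup: the symmetric parameterization $X_i = Z_i\boldsymbol{\theta}_0 + \varepsilon_i$ with $Z_i \in \{\pm 1\}$ and $\varepsilon_i \sim N(\mathbf{0}, I_d)$, the Curie-Weiss convention $P(\mathbf{z}) \propto \exp\{\beta(\sum_i z_i)^2/(2n)\}$, the use of EM with a spectral starting point to maximize the iid likelihood, and the helper names `sample_curie_weiss` and `iid_em`.

```python
import numpy as np


def sample_curie_weiss(n, beta, rng):
    """Exact draw from P(z) ∝ exp(beta * (sum z)^2 / (2n)), z in {-1,+1}^n.

    The law is exchangeable, so we first draw the number of +1 labels and
    then place them uniformly at random. (Assumed convention, see lead-in.)
    """
    k = np.arange(n + 1)  # candidate counts of +1 labels
    log_fact = np.concatenate([[0.0], np.cumsum(np.log(np.arange(1, n + 1)))])
    log_binom = log_fact[n] - log_fact[k] - log_fact[n - k]
    log_w = log_binom + beta * (2 * k - n) ** 2 / (2 * n)
    p = np.exp(log_w - log_w.max())  # stabilise before normalising
    p /= p.sum()
    n_plus = rng.choice(n + 1, p=p)
    z = -np.ones(n)
    z[rng.choice(n, size=n_plus, replace=False)] = 1.0
    return z


def iid_em(x, n_iter=200):
    """EM for the symmetric mixture 0.5*N(theta, I) + 0.5*N(-theta, I).

    This treats the labels as iid fair signs, i.e. it maximises the
    misspecified iid likelihood. theta is identified only up to sign.
    """
    n, _ = x.shape
    # Spectral start: E[x x^T] = I + theta theta^T whatever the labels are.
    evals, evecs = np.linalg.eigh(x.T @ x / n)
    theta = evecs[:, -1] * np.sqrt(max(evals[-1] - 1.0, 1e-2))
    for _ in range(n_iter):
        # E-step: E[Z_i | x_i] = tanh(<x_i, theta>); M-step: weighted mean.
        theta = (np.tanh(x @ theta)[:, None] * x).mean(axis=0)
    return theta


rng = np.random.default_rng(0)
theta0 = np.array([1.0, 0.5])  # hypothetical true parameter
n = 4000
errors = {}
for beta in (0.5, 1.5):  # one value on each side of beta = 1
    z = sample_curie_weiss(n, beta, rng)
    x = z[:, None] * theta0 + rng.standard_normal((n, theta0.size))
    theta_hat = iid_em(x)
    # Compare up to the sign ambiguity of the symmetric mixture.
    errors[beta] = min(np.linalg.norm(theta_hat - theta0),
                       np.linalg.norm(theta_hat + theta0))
    print(f"beta={beta}: theta_hat={theta_hat}, error={errors[beta]:.3f}")
```

Under this assumed parameterization, the EM update depends on the data only through $x\tanh(\langle x, \boldsymbol{\theta}\rangle)$, which is even in $x$, so its expectation is the same for either label. That gives some intuition for why an iid-based fit can stay consistent when the labels are dependent. The sketch says nothing about efficiency, which is the subject of the paper's results for $\beta > 1$.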

Paper Structure

This paper contains 31 sections, 24 theorems, 184 equations, 3 figures, 1 table.

Key Result

Lemma 2.1

$N_{\infty}:\Theta_1 \to \mathbb R$ is differentiable in $\textnormal{int}(\Theta_1)$ and uniquely minimized at $\boldsymbol{\theta}= \boldsymbol{\theta}_0$. Furthermore, $\boldsymbol{\theta}_0$ is the unique solution of $(\nabla N_{\infty})(\boldsymbol{\theta}) = \mathbf{0}_d$ in $\textnormal{int}(

Figures (3)

  • Figure 1: Plot of the (scaled) optimal limiting variance with respect to the dependence parameter $\beta \in [0,2]$, under Curie-Weiss labels. The hardness of estimation changes at $\beta = 1$, regardless of the true parameter $\boldsymbol{\theta}_0$. Note that the scale of the $y$-axis is different for each panel.
  • Figure 2: Scaled limiting variance of the estimators; "IID" denotes $\hat{\boldsymbol \theta}^{\text{iid}}_{\mathnormal n}$ and "MLE" denotes $\hat{\boldsymbol{\theta}}_n^{\text{MLE}}$. For all $\beta>0$ and $\boldsymbol{\theta}_0$, $\hat{\boldsymbol{\theta}}_n^{\text{MLE}}$ is more efficient than $\hat{\boldsymbol \theta}^{\text{iid}}_{\mathnormal n}$.
  • Figure 3: Scaled limiting variance of the estimators considered in this paper; "IID" denotes $\hat{\boldsymbol \theta}^{\text{iid}}_{\mathnormal n}$, "MF" denotes $\hat{\boldsymbol \theta}^{\text{MF}}_{\mathnormal n}$, and "aMLE" denotes $\hat{\boldsymbol{\theta}}_n^{\text{aMLE}}$. For both $\beta = 1.1$ and $1.5$, we see that $\hat{\boldsymbol \theta}^{\text{MF}}_{\mathnormal n}$ has the smallest variance.

Theorems & Definitions (61)

  • Lemma 2.1
  • Theorem 2.2
  • Remark 2.1: Computing the estimator
  • Definition 2.1: Ising model
  • Example 2.1: Curie-Weiss model
  • Lemma 2.3
  • Remark 2.2
  • Theorem 2.4
  • Corollary 2.5
  • Remark 2.3: Testing against contiguous alternatives
  • ...and 51 more