Two statistical problems for multivariate mixture distributions

Ricardo Fraiman, Leonardo Moreno, Thomas Ransford

TL;DR

This work advances projection-based inference for multivariate mixtures by combining univariate projected estimators with the empirical characteristic function to estimate Gaussian and $t$-mixture parameters, while establishing strong identifiability via strong sm-uniqueness sets. A two-step algorithm projects data along $k$ directions, estimates univariate mixture components, and reconstructs the full $\mu_j$ and $\Sigma_j$ through least squares and semidefinite programming (SDP), with strong consistency guaranteed under differentiability of the characteristic functions and an explicit projection-count requirement $k\ge \frac{1}{2}(2m-1)(d^2+d-2)+1$. The paper also develops a model-based method, driven by random projections (RP), to compare random partitions by projecting data and using Kolmogorov-Smirnov (KS) distances to measure distributional agreement, providing an asymptotic framework and demonstrating performance in simulations and a real-data Uruguay example. Collectively, these contributions offer a robust, scalable toolkit for high-dimensional mixture estimation and partition comparison, with practical guidance on projection count and algorithmic steps. The work has implications for clustering validation, robust mixture modeling, and high-dimensional inference where finite-projection identifiability can be exploited.
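The reconstruction step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single Gaussian component whose projected mean $u^\top\mu$ and projected variance $u^\top\Sigma u$ are already known exactly for each direction $u$, uses plain least squares, and omits the SDP step that the summary mentions, so the recovered matrix is symmetric but not forced to be positive semidefinite. The function names `min_projections` and `reconstruct` are hypothetical, and the meaning of $m$ and $d$ is taken from the bound quoted above.

```python
import numpy as np

def min_projections(m: int, d: int) -> int:
    """Smallest integer k with k >= (1/2)(2m-1)(d^2+d-2) + 1."""
    # d^2 + d - 2 = (d-1)(d+2) is always even, so the integer division is exact.
    return (2 * m - 1) * (d * d + d - 2) // 2 + 1

def reconstruct(directions, proj_means, proj_vars):
    """Least-squares recovery of (mu, Sigma) from u^T mu and u^T Sigma u."""
    U = np.asarray(directions, dtype=float)            # shape (k, d)
    d = U.shape[1]
    # Projected means are linear in mu: U @ mu = proj_means.
    mu, *_ = np.linalg.lstsq(U, np.asarray(proj_means, dtype=float), rcond=None)
    # u^T Sigma u is linear in the upper-triangular entries of Sigma;
    # off-diagonal entries appear twice, hence the factor 2.
    iu, ju = np.triu_indices(d)
    coef = np.where(iu == ju, 1.0, 2.0)
    A = U[:, iu] * U[:, ju] * coef                     # shape (k, d(d+1)/2)
    s, *_ = np.linalg.lstsq(A, np.asarray(proj_vars, dtype=float), rcond=None)
    Sigma = np.zeros((d, d))
    Sigma[iu, ju] = s
    Sigma[ju, iu] = s                                  # symmetrize
    return mu, Sigma

# Illustrative values only (not taken from the paper).
rng = np.random.default_rng(0)
d, m = 2, 2
k = min_projections(m, d)
U = rng.normal(size=(k, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)          # unit directions
mu_true = np.array([1.0, -2.0])
Sigma_true = np.array([[2.0, 0.5], [0.5, 1.0]])
proj_means = U @ mu_true
proj_vars = np.einsum("ki,ij,kj->k", U, Sigma_true, U)
mu_hat, Sigma_hat = reconstruct(U, proj_means, proj_vars)
```

With exact projected moments the recovery is exact up to floating-point error; with estimated moments the same linear systems would be solved in the least-squares sense, which is why more directions than unknowns can help.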

Abstract

We address two important statistical problems: that of estimating the parameters of mixtures of multivariate normal distributions and mixtures of $t$-distributions based on univariate projections, and that of measuring the agreement between two different random partitions. The results are based on an earlier work of the authors, where it was shown that mixtures of multivariate Gaussian or $t$-distributions can be distinguished by projecting them onto a certain predetermined finite set of lines, the number of lines depending only on the total number of distributions involved and on the ambient dimension. We also compare our proposal with robust versions of the expectation-maximization (EM) method. In each case, we present algorithms for effecting the task, and compare them with existing methods by carrying out some simulati

Paper Structure

This paper contains 40 sections, 10 theorems, 57 equations, 14 figures, 4 tables, 2 algorithms.

Key Result

Theorem 1

Let $\aleph_n$ and $\epsilon$ be as in Theorem E:JL, and let $q>(4\log n)/(\epsilon^2/2-\epsilon^3/3)$. Define $f(u):= \sqrt{(d/q)}PUu$, where $U$ is a $d\times d$ unitary matrix whose columns are iid (independent and identically distributed) random vectors uniformly distributed on the unit sphere i
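To get a feel for the dimension condition $q>(4\log n)/(\epsilon^2/2-\epsilon^3/3)$, the following sketch computes the smallest admissible integer $q$. It assumes the logarithm is natural and that $0<\epsilon<1$, neither of which is visible in the truncated statement above; the function name `jl_min_dim` is hypothetical.

```python
import math

def jl_min_dim(n: int, eps: float) -> int:
    """Smallest integer q with q > 4 log(n) / (eps^2/2 - eps^3/3).

    Assumes natural logarithm and 0 < eps < 1 (assumptions, not from the source).
    """
    if n < 1 or not 0.0 < eps < 1.0:
        raise ValueError("need n >= 1 and 0 < eps < 1")
    bound = 4.0 * math.log(n) / (eps**2 / 2.0 - eps**3 / 3.0)
    # The inequality is strict, so take the next integer above the bound.
    return math.floor(bound) + 1

q_example = jl_min_dim(1000, 0.5)   # 4*ln(1000)/(0.125 - 0.041666...) is about 331.57
```

For $n=1000$ and $\epsilon=0.5$ this gives $q=332$. Note that the bound does not depend on the ambient dimension $d$, and it grows quickly as $\epsilon$ shrinks.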

Figures (14)

  • Figure 1: Bivariate $t$-mixture samples for increasing separation $\eta$ (four panels: $\eta \in \{1/2,1,3/2,2\}$).
  • Figure 2: Mixing-weight error for $\lambda_1$: EM-st vs RP across separability scenarios $\eta \in \{1/2,1,3/2,2\}$ (boxplots over 100 replicates).
  • Figure 3: Comparison between the EM-st (EM) and RP methods via boxplots of estimation errors across the four separability scenarios $\eta \in \{1/2,1,3/2,2\}$. Errors are measured using the $L^2$-distance for the mean vectors, and the Frobenius distance for the variance--covariance matrices.
  • Figure 4: Simulated bivariate Student $t$-mixture with uniform contamination. Each panel shows a sample of size $N=500$ from a two-component $t_{\eta}$ mixture with $\eta=4$, mixing weights $(\lambda_1,\lambda_2)=(0.3,0.7)$, locations $\mu_1=(0,0)$ and $\mu_2=(2,0)$, and scatter matrices $\Sigma_1=\mathrm{diag}(1,1/2)$ and $\Sigma_2=\mathrm{diag}(1/2,1)$. A proportion $\gamma\in\{0.05,0.10,0.15\}$ of observations is replaced by outliers drawn uniformly from $[0,4]\times[0,4]$.
  • Figure 5: Comparison of estimation errors for the mixing proportion $\lambda_1$ under uniform contamination. Each panel corresponds to a different outlier proportion $\gamma \in \{0.05,0.10,0.15\}$ (left to right), and boxplots summarize the errors over $100$ Monte Carlo replicates for the RobEM and RP methods.
  • ...and 9 more figures

Theorems & Definitions (15)

  • Theorem 1: Dasgupta--Gupta DG2003
  • Theorem 2: Cuesta--Fraiman--Ransford CFR07
  • Theorem 3: Fraiman--Moreno--Ransford FMR23a
  • Theorem 4
  • Theorem 5
  • Definition 1: sm-uniqueness and strong sm-uniqueness sets FMR23a, FMR2025
  • Theorem 6
  • Proposition 1: Uniqueness of covariance reconstruction
  • Proof: Sketch of proof
  • Proposition 2
  • ...and 5 more