Two statistical problems for multivariate mixture distributions
Ricardo Fraiman, Leonardo Moreno, Thomas Ransford
TL;DR
This work advances projection-based inference for multivariate mixtures by combining univariate projected estimators with the empirical characteristic function to estimate Gaussian and $t$-mixture parameters, while establishing strong identifiability via strong sm-uniqueness sets. A two-step algorithm projects data along $k$ directions, estimates univariate mixture components, and reconstructs full $\mu_j$ and $\Sigma_j$ through least-squares and SDP, with strong consistency guaranteed under differentiability of the characteristic functions and an explicit projection-count requirement $k\ge \frac{1}{2}(2m-1)(d^2+d-2)+1$. The paper also develops a model-based, RP-driven method to compare random partitions by projecting data and using KS distances to measure distributional agreement, providing an asymptotic framework and demonstrating performance in simulations and a real-data Uruguay example. Collectively, these contributions offer a robust, scalable toolkit for high-dimensional mixture estimation and partition comparison, with practical guidance on projection count and algorithmic steps. The work has implications for clustering validation, robust mixture modeling, and high-dimensional inference where finite-projection identifiability can be exploited.
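The projection-count bound and the least-squares reconstruction step can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that $d$ is the ambient dimension and $m$ the number of distributions involved, that the univariate fits have already been matched to a single component $j$, and that the projected means and variances are available exactly. It relies on the standard fact that projecting a distribution with mean $\mu$ and covariance $\Sigma$ onto a direction $u$ gives mean $u^\top\mu$ and variance $u^\top\Sigma u$. The function names `min_projections` and `reconstruct_component` are hypothetical, and a simple eigenvalue clipping stands in for the SDP step.

```python
import numpy as np


def min_projections(m: int, d: int) -> int:
    """Smallest integer k with k >= (1/2)(2m-1)(d^2+d-2) + 1."""
    # d^2 + d - 2 = (d - 1)(d + 2) is always even, so the division is exact.
    return (2 * m - 1) * (d * d + d - 2) // 2 + 1


def reconstruct_component(U, proj_means, proj_vars):
    """Least-squares recovery of (mu, Sigma) for ONE component (illustrative).

    U          : (k, d) array of projection directions u_i
    proj_means : (k,) values approximating u_i^T mu
    proj_vars  : (k,) values approximating u_i^T Sigma u_i
    """
    U = np.asarray(U, dtype=float)
    d = U.shape[1]

    # Mean: solve U mu = proj_means in the least-squares sense.
    mu, *_ = np.linalg.lstsq(U, np.asarray(proj_means, dtype=float), rcond=None)

    # Covariance: u^T Sigma u is linear in the upper-triangular entries of
    # Sigma (off-diagonal entries appear twice, hence the weight 2).
    iu, ju = np.triu_indices(d)
    weights = np.where(iu == ju, 1.0, 2.0)
    A = U[:, iu] * U[:, ju] * weights
    s, *_ = np.linalg.lstsq(A, np.asarray(proj_vars, dtype=float), rcond=None)

    Sigma = np.zeros((d, d))
    Sigma[iu, ju] = s
    Sigma[ju, iu] = s

    # Assumption: clip negative eigenvalues as a crude stand-in for the SDP
    # step, so that the returned matrix is positive semidefinite.
    w, V = np.linalg.eigh(Sigma)
    Sigma = (V * np.clip(w, 0.0, None)) @ V.T
    return mu, Sigma


# Toy check with exact projected moments (no estimation noise).
rng = np.random.default_rng(0)
d, m = 3, 2
k = min_projections(m, d)
U = rng.normal(size=(k, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)  # unit directions

mu_true = np.array([1.0, -2.0, 0.5])
Sigma_true = np.array([[2.0, 0.3, 0.0],
                       [0.3, 1.0, -0.2],
                       [0.0, -0.2, 0.5]])
proj_means = U @ mu_true
proj_vars = np.einsum("ij,jk,ik->i", U, Sigma_true, U)

mu_hat, Sigma_hat = reconstruct_component(U, proj_means, proj_vars)
```

With exact inputs, `mu_hat` and `Sigma_hat` agree with `mu_true` and `Sigma_true` up to floating-point error. With estimated univariate parameters, the least-squares fit averages the errors over the $k$ directions.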
Abstract
We address two important statistical problems: that of estimating the parameters of mixtures of multivariate normal distributions and mixtures of $t$-distributions based on univariate projections, and that of measuring the agreement between two different random partitions. The results are based on an earlier work of the authors, where it was shown that mixtures of multivariate Gaussian or $t$-distributions can be distinguished by projecting them onto a certain predetermined finite set of lines, the number of lines depending only on the total number of distributions involved and on the ambient dimension. We also compare our proposal with robust versions of the expectation-maximization (EM) method. In each case, we present algorithms for effecting the task, and compare them with existing methods by carrying out some simulati
