Gaussian Mixture Estimation from Weighted Samples
Daniel Frisch, Uwe D. Hanebeck
TL;DR
The paper addresses estimating Gaussian mixture parameters from weighted samples. It reframes the problem as density re-approximation and introduces a fast EM method in which the hidden sample-to-component associations are computed from the sample locations alone, while the weights enter only in the M-step. Because the E-step operates independently of the sample weights and the M-step incorporates them through weighted sums, the estimates are invariant to split-sample representations: replacing one sample by several co-located samples whose weights add up to the original weight does not change the result. The authors show that a prominent prior method (Gebru16) misuses the weights, which leads to inaccurate means and covariances and to results that change under sample splitting. The proposed approach, in contrast, works in any number of dimensions, has the same computational load as standard unweighted Gaussian mixture estimation, and remains compatible with it. This broadens the applicability of Gaussian mixture estimation to weighted data, with practical relevance for clustering, tracking, and Bayesian density representation, and it can serve as a straightforward plug-in improvement for existing implementations.
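The scheme summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `weighted_gmm_em`, the initialization, the fixed iteration count, and the small regularization term `reg` are assumptions made here for illustration. The point it shows is that the responsibilities in the E-step depend only on the sample locations, while the weights appear only as factors in the M-step sums.

```python
import numpy as np

def weighted_gmm_em(x, w, k, n_iter=100, init_means=None, seed=0, reg=1e-6):
    """Fit a k-component Gaussian mixture to samples x (n, d) with weights w (n,).

    Hypothetical helper for illustration; all defaults are assumptions.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                      # normalize weights to sum to one
    n, d = x.shape
    rng = np.random.default_rng(seed)

    # Initialization (assumed): means from samples, shared weighted covariance.
    if init_means is None:
        means = x[rng.choice(n, size=k, replace=False, p=w)]
    else:
        means = np.array(init_means, dtype=float)
    centered = x - w @ x
    cov0 = (w[:, None] * centered).T @ centered + reg * np.eye(d)
    covs = np.repeat(cov0[None], k, axis=0)
    pis = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: responsibilities from sample locations only (no weights).
        log_r = np.empty((n, k))
        for j in range(k):
            chol = np.linalg.cholesky(covs[j])
            z = np.linalg.solve(chol, (x - means[j]).T)        # shape (d, n)
            maha = np.sum(z ** 2, axis=0)
            log_det = 2.0 * np.sum(np.log(np.diag(chol)))
            log_r[:, j] = np.log(pis[j]) - 0.5 * (maha + log_det + d * np.log(2 * np.pi))
        log_r -= log_r.max(axis=1, keepdims=True)              # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: the sample weights enter as factors in every sum.
        wr = w[:, None] * r                                    # shape (n, k)
        nk = wr.sum(axis=0)
        pis = nk                                               # weights already sum to one
        means = (wr.T @ x) / nk[:, None]
        for j in range(k):
            diff = x - means[j]
            covs[j] = (wr[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)

    return pis, means, covs
```

With `k=1` the sketch reduces to the weighted sample mean and covariance. Passing the same `init_means` for a sample set and for a split version of it (one sample replaced by co-located samples with the same total weight) gives a simple way to check the invariance property described above.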
Abstract
We consider estimating the parameters of a Gaussian mixture density with a given number of components best representing a given set of weighted samples. We adopt a density interpretation of the samples by viewing them as a discrete Dirac mixture density over a continuous domain with weighted components. Hence, Gaussian mixture fitting is viewed as density re-approximation. In order to speed up computation, an expectation-maximization method is proposed that properly considers not only the sample locations, but also the corresponding weights. It is shown that methods from literature do not treat the weights correctly, resulting in wrong estimates. This is demonstrated with simple counterexamples. The proposed method works in any number of dimensions with the same computational load as standard Gaussian mixture estimators for unweighted samples.
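The density interpretation can be written out explicitly. The following is a sketch in notation assumed here (sample locations $x_i$, weights $w_i$, mixture parameters $\pi_k$, $\mu_k$, $\Sigma_k$); the paper's own symbols may differ. The weighted sample set is read as a Dirac mixture $f$, which is then re-approximated by a Gaussian mixture $\tilde{f}$:

```latex
f(x) = \sum_{i=1}^{N} w_i \, \delta(x - x_i),
\qquad w_i > 0, \quad \sum_{i=1}^{N} w_i = 1,
\qquad
\tilde{f}(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x;\, \mu_k, \Sigma_k).
```

Under this reading, two samples at the same location with weights $w_a$ and $w_b$ contribute $(w_a + w_b)\,\delta(x - x_i)$ and are therefore indistinguishable from a single sample carrying the summed weight. An estimator consistent with the density interpretation should return the same parameters for both representations.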
