Gaussian Mixture Estimation from Weighted Samples

Daniel Frisch, Uwe D. Hanebeck

TL;DR

The paper estimates Gaussian mixture (GM) parameters from weighted samples by treating the problem as density re-approximation. It introduces a fast expectation-maximization (EM) method that forms the hidden associations from the sample locations alone and applies the weights only in the M-step. The key point is that the E-step operates independently of the sample weights, while the M-step incorporates them through weighted sums. This yields correct parameter estimates that are invariant to splitting a sample into several co-located samples that share its weight. The authors show that a prominent prior method (Gebru16) misuses the weights, which leads to inaccurate means and covariances and to results that change under sample splitting. In contrast, the proposed approach produces accurate estimates, works in any number of dimensions, and reduces to standard unweighted GM estimation when all weights are equal. The work extends GM estimation to weighted data, with practical relevance for clustering, tracking, and Bayesian density representation, and it can be added to existing EM implementations as a small change to the M-step.

Abstract

We consider estimating the parameters of a Gaussian mixture density with a given number of components best representing a given set of weighted samples. We adopt a density interpretation of the samples by viewing them as a discrete Dirac mixture density over a continuous domain with weighted components. Hence, Gaussian mixture fitting is viewed as density re-approximation. In order to speed up computation, an expectation-maximization method is proposed that properly considers not only the sample locations, but also the corresponding weights. It is shown that methods from literature do not treat the weights correctly, resulting in wrong estimates. This is demonstrated with simple counterexamples. The proposed method works in any number of dimensions with the same computational load as standard Gaussian mixture estimators for unweighted samples.
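The split between a weight-independent E-step and a weighted M-step can be sketched for the scalar case as follows. This is a minimal illustration written from the description above, not the authors' code; the function names `gauss_pdf` and `weighted_em_step` are hypothetical, and the sketch omits safeguards such as variance regularization or protection against numerical underflow.

```python
import math

def gauss_pdf(x, mean, var):
    """Scalar Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def weighted_em_step(xs, ws, pis, means, variances):
    """One EM iteration for a scalar Gaussian mixture fitted to weighted samples.

    xs: sample locations, ws: nonnegative sample weights,
    pis / means / variances: current component weights, means, variances.
    Returns the updated (pis, means, variances).
    """
    K = len(pis)
    # E-step: responsibilities use the sample locations only, not ws.
    resp = []
    for x in xs:
        joint = [pis[k] * gauss_pdf(x, means[k], variances[k]) for k in range(K)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: the sample weights enter through weighted sums.
    w_total = sum(ws)
    new_pis, new_means, new_vars = [], [], []
    for k in range(K):
        nk = sum(w * r[k] for w, r in zip(ws, resp))
        mu = sum(w * r[k] * x for x, w, r in zip(xs, ws, resp)) / nk
        var = sum(w * r[k] * (x - mu) ** 2 for x, w, r in zip(xs, ws, resp)) / nk
        new_pis.append(nk / w_total)
        new_means.append(mu)
        new_vars.append(var)
    return new_pis, new_means, new_vars
```

With this structure, replacing a sample of weight 2 by two samples of weight 1 at the same location leaves every weighted sum unchanged, so the estimates agree up to floating-point rounding. Setting all weights to the same value recovers the usual unweighted EM updates.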

Paper Structure

This paper contains 16 sections, 13 equations, 2 figures.

Figures (2)

  • Figure 1: Two-dimensional GM parameter estimation using EM from Gebru16 (blue line), and EM according to our proposed method (red line). Compare the ground truth (black line). Equidistant samples (grey dots) were weighted with the GM density function and given to the EM algorithms.
  • Figure 2: A simple scalar example with two GM components. Equidistant samples were weighted with the ground truth probability density function, and the GM parameters (component weights, means, and variances) were estimated with our proposed method (red lines) and the method from Gebru16 (blue lines). Ideally, the estimations should converge to the ground truth (black dashed lines) after some iterations.
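The sample construction described in the captions (equidistant locations weighted with the ground-truth density) can be reproduced in a few lines. The sketch below uses made-up ground-truth parameters, since the values behind the figures are not given here, and checks that the weighted sample set, read as a Dirac mixture, has nearly the same mean and variance as the mixture it was drawn from. All names and numbers are assumptions for illustration.

```python
import math

def gm_pdf(x, pis, means, variances):
    """Density of a scalar Gaussian mixture at x."""
    return sum(
        p * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for p, m, v in zip(pis, means, variances)
    )

# Hypothetical ground truth; these values are NOT taken from the paper.
PIS, MEANS, VARIANCES = [0.3, 0.7], [-2.0, 3.0], [1.0, 0.5]

# Equidistant sample locations on [-10, 10], each weighted with the density.
xs = [-10.0 + 0.05 * i for i in range(401)]
ws = [gm_pdf(x, PIS, MEANS, VARIANCES) for x in xs]

# Moments of the weighted sample set (a Dirac mixture with weights ws).
w_total = sum(ws)
sample_mean = sum(w * x for w, x in zip(ws, xs)) / w_total
sample_var = sum(w * (x - sample_mean) ** 2 for w, x in zip(ws, xs)) / w_total

# Analytic moments of the Gaussian mixture, for comparison.
true_mean = sum(p * m for p, m in zip(PIS, MEANS))
true_var = sum(p * (v + m * m) for p, m, v in zip(PIS, MEANS, VARIANCES)) - true_mean ** 2
```

Because the weights carry the density information, an estimator that ignores or mishandles them cannot recover these moments from the equidistant locations alone.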