Ensemble-localized Kernel Density Estimation with Applications to the Ensemble Gaussian Mixture Filter

Andrey A. Popov, Enrico M. Zucchelli, Renato Zanetti

TL;DR

This work addresses non-Gaussian state estimation by introducing E-localization for kernel density estimation (ELKDE) and applying it to the ensemble Gaussian mixture filter to form the ELEnGMF. ELKDE computes local covariance structures around each particle, recovering the canonical KDE behavior in the Gaussian case while better capturing local density in non-Gaussian settings, thereby improving both prior and posterior estimates. The authors demonstrate strong improvements on a non-Gaussian bivariate spiral and show reduced RMSE and less conservative uncertainty in Lorenz '63 sequential filtering, indicating practical gains for online data assimilation. The approach offers a scalable density-estimation enhancement for nonlinear, non-Gaussian systems with limited samples, with future work on adaptive localization and robust projection strategies.

Abstract

The ensemble Gaussian mixture filter (EnGMF) is a non-linear filter suited to data assimilation of highly non-Gaussian and non-linear models that has practical utility in the case of a small number of samples, and theoretical convergence to full Bayesian inference in the ensemble limit. We aim to increase the utility of the EnGMF by introducing an ensemble-local notion of covariance into the kernel density estimation (KDE) step for the prior distribution. We prove that in the Gaussian case, our new ensemble-localized KDE technique is exactly the same as more traditional KDE techniques. We also show an example of a non-Gaussian distribution that can fail to be approximated by canonical KDE methods, but can be approximated well by our new KDE technique. We showcase our new KDE technique on a simple bivariate problem, showing that it has nice qualitative and quantitative properties, and significantly improves the estimate of the prior and posterior distributions for all ensemble sizes tested. We additionally show the utility of the proposed methodology for sequential filtering for the Lorenz '63 equations, achieving a significant reduction in error, and less conservative behavior in the uncertainty estimate with respect to traditional techniques.

Paper Structure (10 sections, 3 theorems, 60 equations, 5 figures, 1 algorithm)

This paper contains 10 sections, 3 theorems, 60 equations, 5 figures, 1 algorithm.

Key Result

Lemma 1.1

Stated without proof: if the distribution of interest, $p_X$, is Gaussian with mean $\mathfrak{m}$ and covariance $\mathfrak{S}$, and the kernel is Gaussian (eq:Gaussian-kernel), then the bandwidth factor (expression not reproduced here) minimizes the mean integrated squared error (MISE) between the true distribution and its KDE given by eq:full-KDE-estimate.
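As an illustration of the kind of estimate the lemma concerns, here is a minimal sketch of a canonical Gaussian KDE. It assumes the bandwidth factor takes the standard form of Silverman's rule of thumb, $\beta^2 = \left(\frac{4}{N(n+2)}\right)^{\frac{2}{n+4}}$, scaling the empirical ensemble covariance; that form is an assumption, since the lemma's own expression does not appear in the text above. The function names `silverman_bandwidth_sq` and `canonical_kde_pdf` are hypothetical and do not come from the paper.

```python
import numpy as np


def silverman_bandwidth_sq(N, n):
    """Squared bandwidth factor beta^2 for N samples in n dimensions.

    Assumed form: Silverman's rule of thumb for a Gaussian kernel.
    """
    return (4.0 / (N * (n + 2.0))) ** (2.0 / (n + 4.0))


def canonical_kde_pdf(x, ensemble):
    """Evaluate a Gaussian KDE at query points.

    x        : array of shape (m, n), query points
    ensemble : array of shape (N, n), samples (particles)
    returns  : array of shape (m,), estimated density values
    """
    N, n = ensemble.shape
    # Global empirical covariance, shared by every kernel.
    P = np.atleast_2d(np.cov(ensemble, rowvar=False))
    B = silverman_bandwidth_sq(N, n) * P
    B_inv = np.linalg.inv(B)
    _, logdet = np.linalg.slogdet(B)
    # Pairwise differences between query points and particles: (m, N, n).
    diff = x[:, None, :] - ensemble[None, :, :]
    maha = np.einsum("mni,ij,mnj->mn", diff, B_inv, diff)
    log_norm = -0.5 * (n * np.log(2.0 * np.pi) + logdet)
    # Equal-weight mixture of N Gaussian kernels.
    return np.exp(log_norm - 0.5 * maha).mean(axis=1)
```

Every kernel here shares one global covariance. Judging from the abstract, the ensemble-localized variant would replace `P` with a covariance computed locally around each particle; that step is not sketched here because its definition is not given in this summary.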

Figures (5)

  • Figure 1: A demonstration of the local covariance technique (\ref{def:local-covariance}) on the bimodal example (\ref{sec:bimodal-example}). The blue ellipses with solid lines represent the 1, 2, and 3-$\sigma$ bounds on the true bimodal distribution. The large red vertical ellipses with dotted outlines represent the global covariance. The yellow ellipses with dashed outlines represent the local covariances centered at hand-sampled points from the true distribution.
  • Figure 2: A qualitative look at the accuracy of the different KDE methods explored in this work for the continuous Gaussian mixture model described in \ref{eq:continuous-GMM-example}. The true distribution is shown on the left, the canonical KDE method is labeled 'CKDE', the adaptive KDE method is labeled 'AKDE', and the E-localized KDE method is labeled 'ELKDE'.
  • Figure 3: A quantitative look at the accuracy of the different KDE methods explored in this work for the continuous Gaussian mixture model described in \ref{eq:continuous-GMM-example}, plotting error in terms of MISE against ensemble size. The canonical KDE method is labeled 'CKDE', the adaptive KDE method is labeled 'AKDE', and the E-localized KDE method is labeled 'ELKDE'. Additionally, the approximation by the empirical Gaussian distribution is provided as a baseline.
  • Figure 4: A quantitative look at the performance of the EnGMF with CKDE prior estimation, the AEnGMF with AKDE prior estimation, and the ELEnGMF with ELKDE prior estimation for the Lorenz '63 equations. The $x$-axis represents the ensemble size $N$, and the $y$-axis represents the mean spatio-temporal RMSE (\ref{eq:spatio-temporal-RMSE}) over $12$ separate Monte Carlo runs. The solid lines represent the mean of the error, while the dashed lines represent three standard deviations of the error over the Monte Carlo simulations. The gray dotted line represents the theoretical minimum RMSE computed using a sequential importance resampling (SIR) filter with $N=25000$ ensemble members and optimal rejuvenation.
  • Figure 5: A quantitative look at the performance of the EnGMF with CKDE prior estimation, the AEnGMF with AKDE prior estimation, and the ELEnGMF with ELKDE prior estimation for the Lorenz '63 equations. The $x$-axis represents the ensemble size $N$, and the $y$-axis represents the mean SNEES (\ref{eq:SNEES}) over $12$ separate Monte Carlo runs. The solid lines represent the mean of the SNEES, while the dashed lines represent three standard deviations over the Monte Carlo simulations. The gray dotted line represents the ideal SNEES of one.
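
The two metrics named in the captions for Figures 4 and 5 can be sketched as follows. The paper's own equations are not reproduced in this summary, so the definitions below are assumptions based on common usage: the spatio-temporal RMSE is taken as the root of the squared error averaged over all time steps and state components, and the SNEES as the normalized estimation error squared, averaged over time and scaled by the state dimension so that a consistent filter gives a value near one. The function names are hypothetical.

```python
import numpy as np


def spatio_temporal_rmse(truth, means):
    """RMSE over all time steps and state components.

    truth, means : arrays of shape (T, n)
    """
    err = truth - means
    return np.sqrt(np.mean(err ** 2))


def snees(truth, means, covs):
    """Scaled normalized estimation error squared (assumed definition).

    truth, means : arrays of shape (T, n)
    covs         : array of shape (T, n, n), filter covariance at each step
    """
    err = truth - means
    n = err.shape[1]
    # Solve covs[t] @ z = err[t] for every t without forming inverses.
    solved = np.linalg.solve(covs, err[..., None])[..., 0]
    nees = np.einsum("ti,ti->t", err, solved)
    # Dividing by n makes the ideal value one.
    return nees.mean() / n
```

Under this definition, a SNEES above one indicates an overconfident covariance estimate and a value below one indicates a conservative one, which matches the caption's reference to an ideal value of one.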

Theorems & Definitions (13)

  • Lemma 1.1
  • Remark 1.1
  • Remark 1.2
  • Remark 1.3
  • Theorem 2.1: Approximate Gaussian covariance with E-localization
  • proof
  • Corollary 2.1.1: Local covariance becomes global
  • Remark 2.1: Generalized variance
  • Definition 2.1: Local covariance
  • Remark 2.2: Using $\mathfrak{C}(x_i, S_i)$
  • ...and 3 more