Learning Enhanced Ensemble Filters

Eviatar Bach, Ricardo Baptista, Edoardo Calvello, Bohan Chen, Andrew Stuart

TL;DR

The paper addresses the data-assimilation challenge of accurately inferring high-dimensional states from partial, noisy observations. It introduces measure neural mappings (MNM) and a transformer-based MNMEF framework that learn analysis maps acting on probability measures, enabling parameter sharing across ensemble sizes via a mean-field perspective. The approach extends the EnKF by incorporating trainable corrections (gain, inflation, localization) learned through a set-transformer architecture, achieving superior performance on Lorenz '96, Kuramoto–Sivashinsky, and Lorenz '63 across ensemble sizes and demonstrating efficient fine-tuning. The work offers a principled neural-operator strategy for amortized data assimilation, with potential broad impact for scalable, robust filtering in nonlinear, high-dimensional systems.

Abstract

The filtering distribution in hidden Markov models evolves according to the law of a mean-field model in state-observation space. The ensemble Kalman filter (EnKF) approximates this mean-field model with an ensemble of interacting particles, employing a Gaussian ansatz for the joint distribution of the state and observation at each observation time. These methods are robust, but the Gaussian ansatz limits accuracy. Here this shortcoming is addressed by using machine learning to map the joint predicted state and observation to the updated state estimate. The derivation of methods from a mean-field formulation of the true filtering distribution suggests a single parametrization of the algorithm that can be deployed at different ensemble sizes. We also use a mean-field formulation of the ensemble Kalman filter as an inductive bias for our architecture. To develop this perspective, in which the mean-field limit of the algorithm and finite interacting ensemble particle approximations share a common set of parameters, a novel form of neural operator is introduced, taking probability distributions as input: a measure neural mapping (MNM). An MNM is used to design a novel approach to filtering, the MNM-enhanced ensemble filter (MNMEF), which is defined both in the mean-field limit and for interacting ensemble particle approximations. The ensemble approach uses empirical measures as input to the MNM and is implemented using the set transformer, which is invariant to ensemble permutation and allows for different ensemble sizes. In practice, fine-tuning a small number of parameters for specific ensemble sizes further enhances the accuracy of the scheme. The promise of the approach is demonstrated by its superior root-mean-square-error performance relative to leading methods in filtering the Lorenz '96 and Kuramoto–Sivashinsky models.
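
To make the Gaussian ansatz concrete, the following is a minimal sketch of one EnKF analysis step in its perturbed-observation form. The abstract does not say which EnKF variant the paper uses, so the variant, the function name `enkf_analysis`, the linear observation matrix `H`, and the noise covariance `Gamma` are all assumptions made for illustration. The gain is built only from first and second moments of the joint state-observation ensemble, which is the restriction the learned map is meant to relax.

```python
import numpy as np

def enkf_analysis(ensemble, y_obs, H, Gamma, rng):
    """One perturbed-observation EnKF analysis step (illustrative sketch).

    ensemble : (N, d_u) array of predicted states, one particle per row
    y_obs    : (d_y,) observation
    H        : (d_y, d_u) observation matrix (assumed linear here)
    Gamma    : (d_y, d_y) observation-noise covariance (assumed known)
    rng      : numpy.random.Generator used for the observation perturbations
    """
    N = ensemble.shape[0]

    # Simulated observation for each particle: predicted observation plus noise.
    noise = rng.multivariate_normal(np.zeros(len(y_obs)), Gamma, size=N)
    sim_obs = ensemble @ H.T + noise

    # Empirical covariances of the joint (state, observation) ensemble.
    du = ensemble - ensemble.mean(axis=0)
    dy = sim_obs - sim_obs.mean(axis=0)
    C_uy = du.T @ dy / (N - 1)
    C_yy = dy.T @ dy / (N - 1)

    # Gain K = C_uy C_yy^{-1}, computed with a linear solve instead of an inverse.
    K = np.linalg.solve(C_yy.T, C_uy.T).T

    # Shift every particle toward the data by its own innovation.
    return ensemble + (y_obs - sim_obs) @ K.T
```

Because the update depends on the ensemble only through empirical means and covariances, the same function runs unchanged for any ensemble size `N`, which is the property the mean-field viewpoint exploits.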

Paper Structure

This paper contains 43 sections, 4 theorems, 118 equations, 13 figures, 8 tables, 3 algorithms.

Key Result

Lemma 1

Let $Q, K, V$ be fixed matrices of appropriate dimensions. The measure $p(du;s)$ is a well-defined probability measure for $\nu\in\mathcal{P}_{\delta}(\mathbb{R}^{d_u})$. The measure $p(dw;u,\nu)$ is a well-defined probability measure for $\nu\in\mathcal{P}_{\delta}(\mathbb{R}^{d_w})$.
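
For intuition only, here is a sketch of softmax attention applied to an empirical measure, that is, to an ensemble of particles. The paper's precise definitions of $p(du;s)$ and $\mathcal{P}_{\delta}$ are not reproduced in this excerpt, so the standard softmax form below, the function names, and the matrix shapes are assumptions rather than the authors' construction. In this finite setting the attention weights are nonnegative and sum to one, so they define a probability measure supported on the particles.

```python
import numpy as np

def attention_weights(s, ensemble, Q, K):
    """Softmax attention weights of query s over the particles (assumed form).

    s        : (d_s,) query vector
    ensemble : (N, d_u) particles defining the empirical measure
    Q        : (d_k, d_s) query matrix
    K        : (d_k, d_u) key matrix
    """
    scores = (ensemble @ K.T) @ (Q @ s)   # <Q s, K u_j> for each particle j
    scores = scores - scores.max()        # stabilise the exponentials
    w = np.exp(scores)
    return w / w.sum()                    # nonnegative, sums to one

def attention_on_ensemble(s, ensemble, Q, K, V):
    """Expectation of V u under the attention-weighted measure; V is (d_v, d_u)."""
    w = attention_weights(s, ensemble, Q, K)
    return w @ (ensemble @ V.T)
```

The output is a weighted sum over particles, so reordering the ensemble leaves it unchanged, and the same `Q`, `K`, `V` apply to ensembles of any size.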

Figures (13)

  • Figure 1: Comparison results on the Lorenz '96 system. The upper row of plots shows direct R-RMSE comparisons between different methods, while the lower row illustrates the relative improvement of our fine-tuning method compared to benchmarks. These comparisons are presented for observation noise levels $\sigma_y=0.7$ (left column) and $\sigma_y=1.0$ (right column). Our proposed MNMEF method is highlighted in the legends with bold font and an asterisk (e.g., Pretrain$^*$). In summary, our fine-tuned MNMEF model consistently outperforms benchmarks, showing substantial improvements for small ensembles and a 15-20% advantage over LETKF at larger ensemble sizes (e.g., 60, 100).
  • Figure 2: Visualization of one test trajectory (time steps 1401-1500, $\Delta t=0.15$) for Lorenz '96 states (vertical axis) over time (horizontal axis) with the observation noise $\sigma_y=1.0$. Panel (a): Ground-truth states (unknown). Panel (b): Observations (known, every 4th dimension observed). Rows 2-3 show our method MNMEF pretrained on $N=10$, and the benchmark LETKF with the ensemble size $N=10$. Panel (c): state estimation (ensemble mean), Panel (d): absolute error of mean with respect to the ground truth, and Panel (e): ensemble spread (standard deviation).
  • Figure 3: Visualization of two dimensions (index 1 and 2) in one test trajectory (time steps 1401-1500, $\Delta t=0.15$) for Lorenz '96 state values (vertical axis) over time (horizontal axis) with the ensemble size $N=10$ and the observation noise $\sigma_y=1.0$. Panel (a): Dimension 1, observed. Panel (b): Dimension 2, not observed. The first row is our method MNMEF, pretrained on $N=10$, and the second row is the benchmark LETKF. The 95% confidence intervals shown in the figures are calculated as the ensemble mean $\pm$ 1.96 $\times$ the ensemble standard deviation. In summary, MNMEF performs significantly better on the unobserved dimension and comparably to LETKF on the observed dimension; meanwhile, MNMEF maintains an appropriate spread without suffering from filter degeneracy.
  • Figure 4: Comparison results on the Kuramoto–Sivashinsky (KS) system. The upper row of plots shows direct R-RMSE comparisons between different methods, while the lower row illustrates the relative improvement of our fine-tuning method compared to benchmarks. These comparisons are presented for observation noise levels $\sigma_y=0.7$ (left column) and $\sigma_y=1.0$ (right column). Our proposed MNMEF method is highlighted in the legends with bold font and an asterisk (e.g., Pretrain$^*$). In summary, our fine-tuned MNMEF model consistently outperforms benchmarks, showing substantial improvements for small ensembles and an advantage of around 20% over LETKF at larger ensemble sizes (e.g., 60, 100).
  • Figure 5: Visualization of one test trajectory (time steps 1901-2000, $\Delta t=1$) for Kuramoto–Sivashinsky (KS) states (vertical axis) over time (horizontal axis) with the observation noise $\sigma_y=1.0$. Panel (a): Ground-truth states (unknown). Panel (b): Observations (known, every 8th dimension observed). Rows 2-3 show our method MNMEF pretrained on $N=10$, and the benchmark LETKF with the ensemble size $N=10$. Panel (c): state estimation (ensemble mean), Panel (d): absolute error with respect to the ground truth, and Panel (e): ensemble spread (standard deviation).
  • ...and 8 more figures
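
The ensemble mean, spread, and 95% band described in the captions of Figures 2, 3, and 5 can be computed per state dimension as in the sketch below. The function name `ensemble_summary` is hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the captions do not state which normalization was used.

```python
import numpy as np

def ensemble_summary(ensemble, z=1.96):
    """Per-dimension mean, spread, and mean +/- z * spread band.

    ensemble : (N, d_u) array, one ensemble member per row
    z        : band half-width in standard deviations (1.96 as in Figure 3)
    """
    mean = ensemble.mean(axis=0)
    # Sample standard deviation; ddof=1 is an assumption, not taken from the paper.
    spread = ensemble.std(axis=0, ddof=1)
    return mean, spread, mean - z * spread, mean + z * spread
```

A spread that collapses toward zero while the error of the mean stays large is the signature of the filter degeneracy mentioned in the Figure 3 caption.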

Theorems & Definitions (37)

  • Remark 1
  • Remark 2: Possible extensions beyond the baseline model
  • Definition 1: The Filtering Problem
  • Definition 2: State Estimation
  • Remark 3
  • Remark 4: Factoring Out $\Gamma$
  • Remark 5: Beyond Gaussian Assumptions
  • Remark 6: Training a Transformer MNM
  • Remark 7: Set Transformer Architecture
  • Definition 3: Attention on Measures: Bounded State Space
  • ...and 27 more