An Ensemble Score Filter for Tracking High-Dimensional Nonlinear Dynamical Systems

Feng Bao, Zezhong Zhang, Guannan Zhang

TL;DR

The paper tackles the challenge of nonlinear filtering in extremely high-dimensional systems. It introduces EnSF, a training-free, score-based diffusion approach that evolves the filtering density in pseudo-time and updates it analytically with observational data via a backward SDE. Key contributions include a training-free Monte Carlo score estimator, two hyperparameters for diffusion regularization, and demonstrated robustness and scalability to Lorenz-96 problems with up to $10^6$ dimensions, outperforming LETKF in several aspects while delivering substantial speedups on GPUs. The results indicate EnSF’s potential for scalable, accurate data assimilation in complex, high-dimensional settings, including nonlinear observations and incomplete model knowledge.

Abstract

We propose an ensemble score filter (EnSF) for solving high-dimensional nonlinear filtering problems with superior accuracy. A major drawback of existing filtering methods, e.g., particle filters or ensemble Kalman filters, is their low accuracy in handling high-dimensional and highly nonlinear problems. EnSF attacks this challenge by exploiting the score-based diffusion model, defined in a pseudo-temporal domain, to characterize the evolution of the filtering density. EnSF stores the information of the recursively updated filtering density function in the score function, instead of storing it in a finite set of Monte Carlo samples (as in particle filters and ensemble Kalman filters). Unlike existing diffusion models that train neural networks to approximate the score function, we develop a training-free score estimation that uses a mini-batch-based Monte Carlo estimator to directly approximate the score function at any pseudo-spatial-temporal location, which provides sufficient accuracy in solving high-dimensional nonlinear problems and saves a tremendous amount of time otherwise spent on training neural networks. High-dimensional Lorenz-96 systems are used to demonstrate the performance of our method. EnSF provides surprising performance, compared with the state-of-the-art Local Ensemble Transform Kalman Filter method, in reliably and efficiently tracking extremely high-dimensional Lorenz systems (up to 1,000,000 dimensions) with highly nonlinear observation processes.
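
The training-free, mini-batch Monte Carlo score estimation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a Gaussian forward kernel, so that the diffused density at pseudo-time $t$ is the mixture $\frac{1}{J}\sum_j \mathcal{N}(z;\,\alpha_t x_j,\,\beta_t^2 I)$ over ensemble members $x_j$, whose score has a closed form as a weighted average. The function name `mc_score`, the parameters `alpha_t`, `beta_t`, and `batch_size`, and the uniform mini-batch sampling are all hypothetical choices made for illustration.

```python
import numpy as np

def mc_score(z, ensemble, alpha_t, beta_t, batch_size=None, rng=None):
    """Score (gradient of the log-density) at z of the Gaussian mixture
    (1/J) * sum_j N(z; alpha_t * x_j, beta_t**2 * I).

    z        : (d,) query point in the diffused state space
    ensemble : (J, d) samples x_j representing the filtering density
    alpha_t, beta_t : assumed (hypothetical) schedule values at pseudo-time t
    batch_size      : optional mini-batch size drawn uniformly from the ensemble
    """
    ensemble = np.asarray(ensemble, dtype=float)
    z = np.asarray(z, dtype=float)
    if batch_size is not None and batch_size < len(ensemble):
        rng = np.random.default_rng() if rng is None else rng
        idx = rng.choice(len(ensemble), size=batch_size, replace=False)
        ensemble = ensemble[idx]

    diff = z[None, :] - alpha_t * ensemble            # (J, d) residuals
    log_w = -0.5 * np.sum(diff**2, axis=1) / beta_t**2
    log_w -= log_w.max()                              # stabilise the exponentials
    w = np.exp(log_w)
    w /= w.sum()                                      # normalised mixture weights
    # Weighted average of the per-component scores -(z - alpha_t x_j) / beta_t^2
    return -(w[:, None] * diff).sum(axis=0) / beta_t**2
```

No neural network is trained in this sketch: each score evaluation costs one pass over the (mini-batched) ensemble, which is why such an estimator lends itself to GPU vectorisation. With a single ensemble member, the estimate reduces to the score of one Gaussian.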

Paper Structure (19 sections, 28 equations, 17 figures)

This paper contains 19 sections, 28 equations, 17 figures.

Figures (17)

  • Figure 1: Illustration of the nonlinearity of the observation process, comparing the true state $X_t$ and the observation $Y_t$ along four randomly selected directions. Due to the nonlinearity of $\arctan(\cdot)$, the observation $Y_t$ does not provide sufficient information about the state when $X_t$ is outside the domain $[-\pi/2,\pi/2]$.
  • Figure 2: Comparison between the true state trajectories and the estimated trajectories obtained by EnSF. Each sub-figure shows the trajectories along three randomly selected directions in the $1{,}000{,}000$-dimensional state space. Even though the initial guess for EnSF is far from the true initial state, EnSF gradually captures the true state by assimilating the observational data over several filtering steps, providing sufficient accuracy in capturing such a high-dimensional chaotic system.
  • Figure 3: LETKF's fine-tuning chart, where the RMSE is averaged over all data assimilation times with 10 repetitions. The highlighted cells are the best three parameter combinations selected for LETKF.
  • Figure 4: LETKF's fine-tuning chart, where the RMSE is averaged over the last 50 data assimilation times with 10 repetitions. The highlighted cells are the best three parameter combinations selected for LETKF.
  • Figure 5: EnSF's fine-tuning chart, where the RMSE is averaged over all data assimilation times with 10 repetitions. The highlighted cells are the best three parameter combinations selected for EnSF. Compared to LETKF, EnSF's performance is much more stable with respect to small changes in the hyperparameters.
  • ...and 12 more figures
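
The loss of information noted in the Figure 1 caption can be made concrete with a short sketch. It assumes an elementwise observation $Y_t = \arctan(X_t) + \text{noise}$; the function names `observe` and `sensitivity` and the `noise_std` parameter are hypothetical and are not taken from the paper. The derivative $1/(1+x^2)$ measures how strongly a change in the state shows up in the observation.

```python
import numpy as np

def observe(x, noise_std=0.0, rng=None):
    """Elementwise arctan observation with optional Gaussian noise.
    noise_std is a hypothetical parameter, not a value from the paper."""
    y = np.arctan(np.asarray(x, dtype=float))
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        y = y + noise_std * rng.standard_normal(y.shape)
    return y

def sensitivity(x):
    """d/dx arctan(x) = 1 / (1 + x^2): how much a state change moves the observation."""
    return 1.0 / (1.0 + np.asarray(x, dtype=float) ** 2)
```

Near zero the sensitivity is close to 1, while for large $|x|$ it decays like $1/x^2$, so widely separated states produce nearly identical observations and even modest observation noise can mask the difference.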

Theorems & Definitions (2)

  • Remark: Avoiding the curse of dimensionality
  • Remark: Reproducibility