
Nonlinear Filtering with Brenier Optimal Transport Maps

Mohammad Al-Jarrah, Niyizhen Jin, Bamdad Hosseini, Amirhossein Taghvaei

TL;DR

The paper tackles nonlinear filtering for highly non-Gaussian, multi-modal posteriors by learning Brenier OT maps that push the prior distribution to the posterior without requiring an analytical likelihood. It formulates a likelihood-free max-min OT objective, implemented with neural networks that model the transport map $T_t$ and an accompanying potential $f$, which enables scalable conditioning in high dimensions. Theoretical results establish consistency and finite-sample error bounds tied to the optimality gap. Extensive experiments (bimodal static and dynamic examples, Lorenz-63, MNIST in-painting) show that the OT approach captures multimodal posteriors more accurately than SIR and EnKF and scales better in multimodal settings. The method offers a principled uncertainty-quantification framework for nonlinear filtering, at a potentially higher computational cost; the authors suggest warm starts from offline training and architecture tuning as practical improvements.
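To make "pushing the prior to the posterior" concrete, it may help to look at a case where the Brenier map is available in closed form. The sketch below assumes a hypothetical scalar linear-Gaussian model: prior $X \sim \mathcal N(m, s^2)$ and observation $Y = X + W$ with $W \sim \mathcal N(0, r^2)$. The posterior is then Gaussian, and the Brenier map between two 1-D Gaussians is the increasing affine map that matches their means and standard deviations. The function name `gaussian_ot_update` and all parameter values are illustrative assumptions, and unlike the learned map in the paper, this one uses the likelihood parameters explicitly.

```python
import numpy as np

def gaussian_ot_update(x, y, m, s2, r2):
    """Brenier map T(x, y) from the prior N(m, s2) to the posterior of X given
    Y = y, for the hypothetical scalar model Y = X + W, W ~ N(0, r2)."""
    k = s2 / (s2 + r2)           # Kalman gain
    post_mean = m + k * (y - m)  # posterior mean
    scale = np.sqrt(1.0 - k)     # posterior std divided by prior std
    # An increasing affine map is the gradient of a convex quadratic,
    # so it is the Brenier (quadratic-cost optimal) map between the Gaussians.
    return post_mean + scale * (x - m)

rng = np.random.default_rng(0)
m, s2, r2, y = 1.0, 4.0, 1.0, 3.0
x_prior = rng.normal(m, np.sqrt(s2), size=200_000)  # samples from the prior
x_post = gaussian_ot_update(x_prior, y, m, s2, r2)  # transported samples
print(x_post.mean(), x_post.var())  # close to 2.6 and 0.8 for these values
```

The transported samples follow the Kalman-filter posterior $\mathcal N(2.6, 0.8)$ here; when the posterior is non-Gaussian, no such closed form exists, which is where a learned map such as $T_t$ comes in.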

Abstract

This paper is concerned with the problem of nonlinear filtering, i.e., computing the conditional distribution of the state of a stochastic dynamical system given a history of noisy partial observations. Conventional sequential importance resampling (SIR) particle filters suffer from fundamental limitations in scenarios involving degenerate likelihoods or high-dimensional states, due to the weight degeneracy issue. In this paper, we explore an alternative method based on estimating the Brenier optimal transport (OT) map from the current prior distribution of the state to the posterior distribution at the next time step. Unlike SIR particle filters, the OT formulation does not require the analytical form of the likelihood. Moreover, it allows us to harness the approximation power of neural networks to model complex and multi-modal distributions and employ stochastic optimization algorithms to enhance scalability. Extensive numerical experiments are presented that compare the OT method to the SIR particle filter and the ensemble Kalman filter, evaluating the performance in terms of sample efficiency, high-dimensional scalability, and the ability to capture complex and multi-modal distributions.
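The weight degeneracy issue can be seen in a few lines. The sketch below assumes a hypothetical scalar model with prior $X \sim \mathcal N(0,1)$ and observation $Y = X + \sigma W$, $W \sim \mathcal N(0,1)$. It weights $N$ prior particles by the likelihood of $y = 1$ and reports the standard effective sample size (ESS) $1/\sum_i w_i^2$ of the normalized weights, before any resampling. The names and values are illustrative, not taken from the paper.

```python
import numpy as np

def effective_sample_size(log_w):
    """Effective sample size 1 / sum(w_i^2) of normalized importance weights."""
    w = np.exp(log_w - log_w.max())  # subtract the max to avoid underflow
    w /= w.sum()                     # normalize so the weights sum to 1
    return 1.0 / np.sum(w ** 2)

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(0.0, 1.0, size=n)  # N prior particles from N(0, 1)
y = 1.0                           # observed value (hypothetical)

ess = []
for sigma in [1.0, 0.1, 0.01]:
    # Gaussian log-likelihood of y given each particle, up to an additive constant
    log_w = -0.5 * ((y - x) / sigma) ** 2
    ess.append(effective_sample_size(log_w))
    print(f"sigma={sigma}: ESS = {ess[-1]:.1f} out of {n}")
```

As $\sigma$ shrinks and the likelihood becomes nearly degenerate, the ESS falls to a small fraction of $N$: almost all of the weight sits on a handful of particles, so resampling duplicates those few and the rest of the ensemble is lost.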

Paper Structure

This paper contains 34 sections, 5 theorems, 84 equations, 17 figures, 1 table, and 3 algorithms.

Key Result

Proposition 2.3

Assume $\pi$ is absolutely continuous with respect to the Lebesgue measure with a convex support set $\mathcal{X}$, $\mathcal{B}_y(\pi)$ admits a density with respect to the Lebesgue measure for all $y$, and $\mathbb E[\|X\|^2]<\infty$. Then, there exists a unique pair $(\overline f,\overline T)$, …
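Proposition 2.3 is an existence-and-uniqueness result under Brenier-type assumptions (absolute continuity, finite second moment). In one dimension the Brenier map between two absolutely continuous distributions is the nondecreasing rearrangement, and for two equal-size samples the quadratic-cost optimal assignment simply matches order statistics. The sketch below is a minimal illustration of that fact, not the paper's neural-network estimator: it maps samples from a standard Gaussian prior onto a hypothetical bimodal target, an equal mixture of $\mathcal N(-1, 0.1^2)$ and $\mathcal N(1, 0.1^2)$, using an illustrative helper `empirical_monotone_map`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
prior = rng.normal(0.0, 1.0, size=n)  # samples from the prior N(0, 1)
# Hypothetical bimodal target: equal mixture of N(-1, 0.1^2) and N(1, 0.1^2)
signs = rng.choice([-1.0, 1.0], size=n)
target = signs + 0.1 * rng.normal(size=n)

def empirical_monotone_map(source, target):
    """Optimal assignment for quadratic cost in 1-D: the k-th smallest
    source point is sent to the k-th smallest target point."""
    order = np.argsort(source)        # indices that sort the source samples
    mapped = np.empty_like(source)
    mapped[order] = np.sort(target)   # pair order statistics
    return mapped

mapped = empirical_monotone_map(prior, target)
```

The resulting map is nondecreasing and jumps from the left mode to the right mode near the prior median, which is how a monotone transport can split a unimodal prior into two well-separated modes.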

Figures (17)

  • Figure 1: A comparison between the OT method and SIR for the static example. The OT approach captures the bimodal posterior by pushing the prior through an OT map computed by solving the empirical optimization problem. In contrast, SIR captures only one mode: the degenerate likelihood leaves only a few weights of order $1$.
  • Figure 2: Neural net architectures for the function classes $\mathcal{F}$ and $\mathcal{T}$ within our proposed algorithm.
  • Figure 3: Numerical results for the static example. (a) top-left: Samples $\{X^i\}_{i=1}^N$ from the prior $P_X$; bottom-left: samples $\{(X^i,Y^i)\}_{i=1}^N$ from the joint distribution $P_{XY}$ in comparison with the transported samples $\{(T(X^{\sigma_i},Y^i),Y^i)\}_{i=1}^N$; rest of the panels: transported samples for $Y=1$ for different values of $N$ and three different algorithms. (b) Similar results to panel (a) but for a smaller $\lambda_w$.
  • Figure 4: Numerical results for the dynamic example. The left panel shows the trajectory of the particles $\{X^1_t,\ldots,X^N_t\}$ along with the trajectory of the true state $X_t$ for EnKF, OT, and SIR algorithms, respectively. The second panel shows the MMD distance with respect to the exact conditional distribution. The last two panels show MMD variation with dimension and the number of particles.
  • Figure 5: Numerical results for the bimodal static example. The left panel shows the function $\frac{1}{2}x^2- \widehat{f}(x,1)$ and the conditional distribution $P_{X|Y=1}$. The right panel shows the map $\widehat{T}(x,1)$ and the prior distribution $P_X$.
  • ...and 12 more figures
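Figure 4 evaluates the filters by the maximum mean discrepancy (MMD) between the particle ensemble and the exact conditional distribution. A minimal sketch of a standard MMD estimator is below. The Gaussian kernel, the bandwidth of 1, the biased (V-statistic) form, and the function name `mmd2_gaussian` are all assumptions, since the captions do not specify the paper's choices.

```python
import numpy as np

def mmd2_gaussian(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD with the Gaussian kernel
    k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)); x: (n, d), y: (m, d)."""
    def gram(a, b):
        # Pairwise squared distances; clip tiny negatives from rounding
        sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
        return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(500, 2))
b = rng.normal(0.0, 1.0, size=(500, 2))  # same distribution as a
c = rng.normal(1.0, 1.0, size=(500, 2))  # mean shifted by (1, 1)
mmd_same, mmd_shift = mmd2_gaussian(a, b), mmd2_gaussian(a, c)
print(mmd_same, mmd_shift)
```

The biased estimate is a squared distance between empirical kernel mean embeddings, so it is never negative and only approaches zero as both sample sets grow; this is why MMD curves such as those in Figure 4 are read relative to one another rather than against an exact zero.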

Theorems & Definitions (11)

  • Example 2.1: Noiseless observation
  • Example 2.2: Gaussian
  • Proposition 2.3
  • Proposition 2.4
  • Remark 2.5
  • Remark 2.6
  • Proposition 2.7
  • Remark 1.1
  • Theorem 1.2 (Brenier, 1991)
  • Proposition 1.3
  • ...and 1 more