Two Localization Strategies for Sequential MCMC Data Assimilation with Applications to Nonlinear Non-Gaussian Geophysical Models

Hamza Ruzayqat; Hristo G. Chipilski; Omar Knio

Abstract

We present a localized data assimilation (DA) scheme, LSMCMC, based on the sequential Markov chain Monte Carlo (SMCMC) technique [Ruzayqat et al., 2024], a provably convergent method for filtering high-dimensional, nonlinear, and potentially non-Gaussian state-space models. Particle filters are also exact methods for nonlinear non-Gaussian models, but unlike them SMCMC does not assign weights to samples and therefore avoids weight degeneracy in small-ensemble regimes. We design two localization approaches within the SMCMC framework that exploit the spatial sparsity of observations to reduce the effective degrees of freedom and improve efficiency. The first variant collects the observed blocks into a single reduced domain and runs parallel MCMC chains over this combined region. The second variant further reduces the per-chain state dimension by decomposing the observed region into independent blocks, each augmented with a compact halo, and applying Gaspari--Cohn observation-noise tapering to smoothly down-weight distant observations. When the observation model is linear and Gaussian, we show that our approximate filtering density reduces to a Gaussian mixture from which independent samples can be drawn exactly. For nonlinear or non-Gaussian observation models, we employ an MCMC kernel. We test both variants on high-dimensional ($d \sim 10^4$--$10^5$) state-space models, including a linear Gaussian model and a nonlinear multilayer shallow water equation (MLSWE) model with both linear and nonlinear observation operators. We consider Gaussian and non-Gaussian (Student-$t$) observation noise, showing that LSMCMC naturally handles heavy-tailed errors that cause ensemble Kalman methods to diverge. Observations include synthetic data and real data from the Surface Water and Ocean Topography (SWOT) mission (NASA) and from ocean drifters (NOAA). We compare the two variants against each other and against the local ensemble transform Kalman filter (LETKF).
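To make the tapering idea concrete, the following is a minimal sketch of the standard fifth-order Gaspari--Cohn function and one common way to use it for observation-noise tapering, namely dividing the observation-error variance by the taper weight so that distant observations carry less information. The function names, the half-width parameter `c`, and the choice to inflate the variance in exactly this way are assumptions made for illustration; the paper's own implementation may differ.

```python
import math


def gaspari_cohn(dist, c):
    """Fifth-order Gaspari-Cohn taper with half-width c (zero beyond 2c)."""
    r = abs(dist) / c
    if r <= 1.0:
        val = (-0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3
               - (5.0 / 3.0) * r**2 + 1.0)
    elif r < 2.0:
        val = (r**5 / 12.0 - 0.5 * r**4 + 0.625 * r**3
               + (5.0 / 3.0) * r**2 - 5.0 * r + 4.0 - 2.0 / (3.0 * r))
    else:
        val = 0.0
    # Guard against tiny negative values from round-off near r = 2.
    return max(0.0, val)


def tapered_obs_variance(sigma2, dist, c):
    """Assumed tapering rule: inflate the observation variance by 1 / weight.

    Observations outside the taper support get infinite variance,
    i.e. they are effectively ignored.
    """
    w = gaspari_cohn(dist, c)
    return sigma2 / w if w > 0.0 else math.inf
```

With this rule an observation at zero distance keeps its nominal variance, while one at distance `c` has its variance inflated by a factor of 24/5, since the taper equals 5/24 there.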

Paper Structure (29 sections, 18 equations, 17 figures, 9 tables, 3 algorithms)

Introduction
Preliminaries
SMCMC Filter
Localized Sequential MCMC Filter
Variant 1: Joint Observed-Block Localization
Variant 2: Halo-Based Per-Block Localization
Numerical Simulations
Linear Gaussian Model
Problem setup
Parameter choices
Results
MLSWE with Linear Observation Model
Problem setup
Parameter choices
LETKF sensitivity analysis
...and 14 more sections

Figures (17)

Figure 1: Illustration of the two localization strategies on a $100\times 100$ grid with 80 blocks and randomly placed observations (LGOM = Linear Gaussian Observation Model; NLGOM = Non-Linear Gaussian Observation Model; NGOM = Non-Gaussian Observation Model). (a) V1 joint localization: observed blocks (green) are updated jointly; unobserved blocks (gray) retain their forecast values. For an LGOM the posterior is sampled directly; for an NLGOM or NGOM $P$ parallel MCMC chains of length $N_{\text{burn}}+\lceil N_a/P \rceil$ are run on the combined shaded domain. (b) V2 per-block halo localization: each observed block is updated independently using observations that fall within a halo of radius $r_{\mathrm{loc}}$. For an LGOM the blocks are sampled directly in parallel; for an NLGOM or NGOM each block runs an MCMC chain of length $N_{\text{burn}}+N_a$ in parallel.
Figure 2: (a) SWOT-like swath observation pattern at selected cycles showing the diagonal sweep across the $120\times120$ grid. (b) LETKF sensitivity to localization radius $h_{\text{loc}}$ and RTPP inflation factor $\alpha$. The star marks the best configuration ($h_{\text{loc}}{=}1.0$, $\alpha{=}1.02$, RMSE${=}0.0072$).
Figure 3: Linear Gaussian experiment ($d{=}14{,}400$): RMSE against the exact KF solution over 100 assimilation cycles for LSMCMC V1 and V2 with $M{=}1$ (solid) and $M{=}4$ (dashed), and LETKF with $K{=}50$.
Figure 4: Linear Gaussian: time series at the most-observed grid point comparing LSMCMC V1 and V2 ($M{=}4$) and LETKF ($K{=}50$) against the KF.
Figure 5: Linear Gaussian: analysis fields at cycle 50 (rows 1--2) and cycle 100 (rows 3--4). Each block shows the KF reference, V1 ($M{=}1$), and V2 ($M{=}1$) in the first row, and LETKF ($K{=}50$), V1 ($M{=}4$), and V2 ($M{=}4$) in the second row.
...and 12 more figures

Theorems & Definitions (4)

Example 3.1: Linear-Gaussian observation model -- direct sampling
Example 3.2: Non-Gaussian likelihood -- MCMC sampling
Remark 4.1
Remark 4.2
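
The direct-sampling case named in Example 3.1 can be illustrated with a scalar sketch. It assumes additive Gaussian model noise with variance `q`, a linear observation `y = h * x + v` with Gaussian noise variance `r`, and an equally weighted set of forecast means (one per previous sample). Under those assumptions the approximate filtering density is a Gaussian mixture whose component weights are proportional to the marginal likelihood of `y`. All names here are hypothetical and the paper's multivariate, localized construction is not reproduced.

```python
import math
import random


def mixture_posterior(forecast_means, q, h, r, y):
    """Return (weights, means, variance) of the scalar posterior mixture."""
    s = h * h * q + r                        # innovation variance
    post_var = 1.0 / (1.0 / q + h * h / r)   # shared by all components
    post_means = [post_var * (m / q + h * y / r) for m in forecast_means]
    # Unnormalized log-weights: Gaussian marginal likelihood of y per component.
    log_w = [-0.5 * (y - h * m) ** 2 / s for m in forecast_means]
    top = max(log_w)                         # subtract max for stability
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w], post_means, post_var


def sample_mixture(weights, post_means, post_var, n, rng):
    """Draw n independent samples: pick a component, then sample from it."""
    idx = rng.choices(range(len(weights)), weights=weights, k=n)
    sd = math.sqrt(post_var)
    return [rng.gauss(post_means[i], sd) for i in idx]


rng = random.Random(0)
weights, means, var = mixture_posterior([0.0, 2.0], q=1.0, h=1.0, r=1.0, y=2.0)
samples = sample_mixture(weights, means, var, n=5, rng=rng)
```

Because each draw is independent, no burn-in or chain mixing is involved in this case, in contrast to the MCMC route used for nonlinear or non-Gaussian observation models.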
