Real-time Bayesian inference at extreme scale: A digital twin for tsunami early warning applied to the Cascadia subduction zone

Stefan Henneking, Sreeram Venkat, Veselin Dobrev, John Camier, Tzanio Kolev, Milinda Fernando, Alice-Agnes Gabriel, Omar Ghattas

TL;DR

This work develops a real-time Bayesian digital twin for tsunami early warning in the Cascadia subduction zone, combining seafloor acoustic pressure data with a 3D coupled acoustic–gravity wave model. An offline-online decomposition and FFT-based Hessian matrix-vector products turn an otherwise intractable billion-parameter inverse problem into real-time inference and forecasting. The offline phase precomputes adjoint PDE solutions and data-space operators and scales efficiently on up to 43,520 GPUs; the online phase then computes the maximum a posteriori (MAP) estimate and quantity-of-interest (QoI) predictions in 0.2 seconds, avoiding the prohibitive number of PDE solves a conventional approach would need. The resulting tsunami forecasts carry quantified uncertainty, and the approach holds promise for real-world deployment and for extension to broader geophysical and inverse-scattering problems.

Abstract

We present a Bayesian inversion-based digital twin that employs acoustic pressure data from seafloor sensors, along with 3D coupled acoustic-gravity wave equations, to infer earthquake-induced spatiotemporal seafloor motion in real time and forecast tsunami propagation toward coastlines for early warning with quantified uncertainties. Our target is the Cascadia subduction zone, with one billion parameters. Computing the posterior mean alone would require 50 years on a 512-GPU machine. Instead, exploiting the shift invariance of the parameter-to-observable map and devising novel parallel algorithms, we induce a fast offline-online decomposition. The offline component requires just one adjoint wave propagation per sensor; using MFEM, we scale this part of the computation to the full El Capitan system (43,520 GPUs) with 92% weak parallel efficiency. Moreover, given real-time data, the online component exactly solves the Bayesian inverse and forecasting problems in 0.2 seconds on a modest GPU system, a ten-billion-fold speedup.
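
To illustrate why shift invariance matters, the following is a minimal sketch, not the authors' implementation. It assumes a scalar toy problem (one sensor, one parameter location) in which time-shift invariance and causality make the parameter-to-observable map a lower-triangular Toeplitz matrix. That matrix is fully described by a single impulse response, so it can be applied with FFTs in O(n log n) instead of O(n^2). The names `fft`, `causal_toeplitz_matvec`, `dense_matvec`, `kernel`, and `m` are hypothetical, and the small recursive FFT stands in for an optimized library routine.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def causal_toeplitz_matvec(kernel, m):
    """Compute d[i] = sum_{j<=i} kernel[i-j] * m[j] using FFTs."""
    n = len(m)
    size = 1
    while size < 2 * n:
        size *= 2
    # Zero-pad so that circular convolution equals linear convolution.
    k_hat = fft(list(kernel) + [0.0] * (size - n))
    m_hat = fft(list(m) + [0.0] * (size - n))
    prod = [a * b for a, b in zip(k_hat, m_hat)]
    d = fft(prod, inverse=True)
    # Normalize the inverse transform and keep the first n (causal) entries.
    return [(v / size).real for v in d[:n]]


def dense_matvec(kernel, m):
    """Reference O(n^2) product with the explicit lower-triangular Toeplitz matrix."""
    n = len(m)
    return [sum(kernel[i - j] * m[j] for j in range(i + 1)) for i in range(n)]


# Hypothetical impulse response (one "adjoint solve") and parameter time series.
kernel = [1.0, 0.5, 0.25, 0.125]
m = [1.0, 2.0, 0.0, -1.0]
d_fast = causal_toeplitz_matvec(kernel, m)
d_dense = dense_matvec(kernel, m)
```

In this toy setting, `kernel` plays the role of the single precomputed response per sensor: once it is stored, every later application of the map needs only FFTs and no further wave propagation.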

Paper Structure

This paper contains 21 sections, 12 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: (a) Topobathymetric map of the Cascadia subduction zone (CSZ) based on GEBCO's (General Bathymetric Chart of the Oceans) 15-arc-second gridded bathymetric data set. The shaded area illustrates the potentially "locked" portion of the megathrust interface. The red line marks the trench where the subducting Explorer, Juan de Fuca, and Gorda plates begin their descent beneath the North American Plate. (b)–(c) Snapshots of vertical seafloor uplift and seafloor velocity from a physics-based 3D dynamic rupture and seismic wave propagation computation of a magnitude 8.7 earthquake scenario spanning the full margin of the CSZ. (d) Representative block of a 3D multi-block hexahedral mesh of the CSZ, depicting bathymetry-adapted meshing.
  • Figure 2: Real-time Bayesian inference framework. The inverse solution is decomposed into precomputation (offline) Phases 1–3, which are executed just once, and a real-time (online) Phase 4 of parameter inference and QoI prediction, which is executed when an earthquake occurs and data are acquired. Phase 1 computes adjoint PDE solutions of the acoustic–gravity model (one PDE solution per sensor and QoI forecast location) to precompute the parameter-to-observable (p2o) and parameter-to-QoI (p2q) block Toeplitz matrices. Phases 2–4 rely on fast FFT-based Hessian actions using this block Toeplitz structure and a transformation of the inverse operator from the high-dimensional parameter space to the much lower-dimensional data space. Compute times for each phase are given in the time-to-solution table in the results section.
  • Figure 3: Physics-based magnitude 8.7 dynamic rupture earthquake scenario for a margin-wide rupture in the CSZ, from left to right: (a) true seafloor displacement; (b) snapshot of true seafloor acoustic pressure field with 600 hypothesized sensor locations; (c) snapshot of true sea surface wave height; (d) inferred mean of seafloor displacement; (e) uncertainties plotted as pointwise standard deviations in meters of seafloor normal displacement; and (f) snapshot of reconstructed sea surface wave height with 21 locations for QoI predictions. Hyperlinks to animations of: (i) https://youtu.be/eQlHehX_u6k (seafloor vertical uplift and normal velocity); (ii) https://youtu.be/-CPxuK6bebk (seafloor pressure and surface wave height); and (iii) https://youtu.be/9OAPWumAd1g (true and inferred seafloor normal displacement).
  • Figure 4: Real-time QoI predictions with uncertainties illustrated as 95% credible intervals (CIs) inferred from noisy, synthetic data of 600 hypothesized seafloor acoustic pressure sensors for a margin-wide rupture in the CSZ. The QoI numbers (#1–#8) refer to (a subset of) the 21 QoI forecast locations marked in the inferred (reconstructed) sea surface wave height plot in Fig. 3.
  • Figure 5: Weak scalability (left) and strong scalability (right) results on El Capitan, from 85 nodes (340 AMD MI300A GPUs) to 10,880 nodes (43,520 GPUs), on Alps, from 36 nodes (144 NVIDIA GH200 GPUs) to 2,304 nodes (9,216 GPUs), and on Perlmutter, from 47 nodes (188 NVIDIA A100 GPUs) to 1,504 nodes (6,016 GPUs). Numbers along the graph lines indicate parallel efficiency.
  • ...and 2 more figures
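
The Figure 2 caption mentions transforming the inverse operator from parameter space to data space. As a hedged illustration (a sketch under simplifying assumptions, not the paper's algorithm), consider a linear map F with zero prior mean, prior covariance `prior_var * I`, and noise covariance `noise_var * I`. The MAP point can then be computed either by solving a system of parameter dimension or, equivalently, by solving a system of data dimension, using the standard identity (F^T F / noise_var + I / prior_var)^{-1} F^T / noise_var = prior_var F^T (prior_var F F^T + noise_var I)^{-1}. All names and values below are hypothetical.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def map_parameter_space(F, d, prior_var, noise_var):
    """Solve (F^T F / noise_var + I / prior_var) m = F^T d / noise_var (n_param unknowns)."""
    Ft = transpose(F)
    H = matmul(Ft, F)
    for i in range(len(H)):
        H[i] = [h / noise_var for h in H[i]]
        H[i][i] += 1.0 / prior_var
    rhs = [v / noise_var for v in matvec(Ft, d)]
    return solve(H, rhs)


def map_data_space(F, d, prior_var, noise_var):
    """Solve (prior_var F F^T + noise_var I) z = d (n_data unknowns), then m = prior_var F^T z."""
    Ft = transpose(F)
    K = matmul(F, Ft)
    for i in range(len(K)):
        K[i] = [prior_var * k for k in K[i]]
        K[i][i] += noise_var
    z = solve(K, d)
    return [prior_var * v for v in matvec(Ft, z)]


# Toy problem: 6 parameters, 2 observations (values are made up).
F = [[1.0, 0.5, 0.0, 0.2, 0.0, 0.1],
     [0.0, 1.0, 0.5, 0.0, 0.3, 0.0]]
d = [1.0, -2.0]
m_param = map_parameter_space(F, d, prior_var=4.0, noise_var=0.01)
m_data = map_data_space(F, d, prior_var=4.0, noise_var=0.01)
```

Both routines return the same estimate, but the data-space version solves only a 2-by-2 system here. When the parameter count is far larger than the number of observations, that difference in system size is what makes a data-space formulation attractive.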