Enhanced parallelization of the incremental 4D-Var data assimilation algorithm using the Randomized Incremental Optimal Technique (RIOT)

Nicolas Bousserez, Jonathan J. Guerrette, Daven K. Henze

TL;DR

RIOT introduces a parallelized alternative to the inner-loop CG in incremental 4D-Var by using RSVD to approximate the prior-preconditioned Hessian, enabling extensive model-level parallelism. Across Lorenz-96 and black-carbon source inversions, RIOT achieves substantial wall-time reductions when large RSVD ensembles are run in parallel and dramatically speeds up posterior covariance computations, albeit with higher energy cost and dependence on Hessian spectra. The study demonstrates that Hessian-spectrum-aware preconditioning, notably a rotation-based sampling strategy, stabilizes RIOT in strongly nonlinear regimes, while pure spectral preconditioning can hinder convergence in some cases. Overall, RIOT shows strong potential for operational NWP systems given adequate parallel resources and careful tuning of RSVD rank and preconditioning; future work includes hybrid iterative-randomized schemes to mitigate sensitivity to Hessian spectra and to enable efficient outer-loop preconditioning.

Abstract

Incremental 4D-Var is a data assimilation algorithm used routinely at operational numerical weather prediction centers worldwide. This paper implements a new method for parallelizing incremental 4D-Var, the Randomized Incremental Optimal Technique (RIOT), which replaces the traditional sequential conjugate gradient (CG) iterations in the inner loop of the minimization with a fully parallel randomized singular value decomposition (RSVD) of the preconditioned Hessian of the cost function. RIOT is tested using the standard Lorenz-96 model (L-96) as well as two realistic high-dimensional atmospheric source inversion problems based on aircraft observations of black carbon concentrations. A new outer-loop preconditioning technique tailored to RSVD is introduced to improve convergence stability and performance. Results obtained with the L-96 system show that the performance improvement of RIOT over standard CG algorithms increases significantly with the strength of the non-linearities. Overall, in the realistic black carbon source inversion experiments, RIOT reduces the wall-time of the 4D-Var minimization by a factor of 2-3, at the cost of a factor of 4-10 increase in energy cost due to the large number of parallel cores used. Furthermore, RIOT reduces the wall-time of computing the analysis error covariance matrix by a factor of 40 compared to a standard iterative Lanczos approach. Finally, as evidenced in this study, implementation of RIOT in an operational numerical weather prediction system will require a better understanding of its convergence properties as a function of the Hessian characteristics and, in particular, the degrees of freedom for signal (DOFs) of the inverse problem.
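
To make the central idea concrete, the sketch below shows how a randomized low-rank approximation of a prior-preconditioned Hessian can stand in for sequential CG iterations in a single quadratic inner loop. It is an illustration under stated assumptions, not the paper's algorithm: it uses the generic two-pass randomized scheme of Halko, Martinsson and Tropp (2011), the function names `rsvd_sym` and `increment` are hypothetical, and the toy operator `G` (standing for the linearized model and observation operator in preconditioned space) with identity observation error covariance is invented for the demonstration. The point to notice is that every Hessian-vector product inside each batch is independent of the others, which is where model-level parallelism would come from.

```python
import numpy as np

def rsvd_sym(matvec, n, k, p=5, seed=0):
    """Rank-k eigen-approximation of a symmetric PSD operator (generic two-pass RSVD).

    matvec: function applying the operator to one vector (hypothetical stand-in
            for one tangent-linear plus one adjoint model run).
    k: target rank, p: oversampling parameter.
    """
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((n, k + p))
    # Batch 1: k + p independent Hessian-vector products (parallelizable).
    y = np.column_stack([matvec(omega[:, j]) for j in range(k + p)])
    q, _ = np.linalg.qr(y)  # orthonormal basis for the sampled range
    # Batch 2: project the operator onto that basis (also parallelizable).
    hq = np.column_stack([matvec(q[:, j]) for j in range(k + p)])
    t = q.T @ hq
    t = 0.5 * (t + t.T)  # symmetrize against round-off
    lam, s = np.linalg.eigh(t)
    idx = np.argsort(lam)[::-1][:k]  # keep the k largest eigenpairs
    return np.clip(lam[idx], 0.0, None), q @ s[:, idx]

def increment(lam, v, b):
    """Apply (I + V diag(lam) V^T)^{-1} to b using the low-rank factors."""
    return b - v @ ((lam / (1.0 + lam)) * (v.T @ b))

# Toy problem (all values are assumptions made for illustration).
n, n_obs = 40, 8
rng = np.random.default_rng(1)
G = rng.standard_normal((n_obs, n))  # preconditioned linearized operator
d = rng.standard_normal(n_obs)       # innovation vector, with R = I
hess_obs = lambda x: G.T @ (G @ x)   # observation term of the Hessian
b = G.T @ d                          # negative gradient at zero increment

lam, V = rsvd_sym(hess_obs, n, k=8, p=5)
dv_rsvd = increment(lam, V, b)
dv_exact = np.linalg.solve(np.eye(n) + G.T @ G, b)
dofs = float(np.sum(lam / (1.0 + lam)))  # degrees of freedom for signal
```

In this toy case the observation term has rank 8, so a rank-8 approximation recovers the exact increment to round-off. When the retained rank is smaller than the effective rank of the Hessian, the increment is only approximate, which is consistent with the abstract's remark that convergence depends on the Hessian characteristics and the DOFs.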

Paper Structure

This paper contains 21 sections, 10 equations, 12 figures, 4 algorithms.

Figures (12)

  • Figure 1: Performance of the VarCG (yellow curve), VarBL (red curve) and RIOT (blue curve) minimization algorithms for the 6-hour window L-96 problem. The panels show the non-quadratic cost function values (y-axis) at the beginning of each outer iteration (x-axis). Results are presented for different ranks of the Hessian approximation (columns, from left to right) and for different preconditioning approaches: no preconditioning (top row), spectral preconditioning (middle row), and spectral preconditioning with rotation for VarBL and RIOT (bottom row). The parameter $m$ represents the number of samples in RIOT and the number of iterations in VarCG. For VarBL, $m$ is the number of iterations while $l$ is the number of samples (see algorithms \ref{alg:lanczos}, \ref{alg:Block_lanczos}, and \ref{alg:riot_primal_precond}). Results for a Gauss-Newton minimization (black curve) using the exact Hessian are also shown for reference.
  • Figure 2: Same as Fig. 1 but for the 48-hour window L-96 problem.
  • Figure 3: Same as Fig. 1 but for the 72-hour window L-96 problem.
  • Figure 4: Same as Fig. 1 but for the 96-hour window L-96 problem.
  • Figure 5: Increment approximation error (Euclidean norm, y-axis) for the first inner loop of the small BC problem as a function of the rank of the Hessian approximation (x-axis) for different methods: VarCG (blue line), RIOT (red line), TSVD LRA (black solid line), TSVD LRU (black dashed line) and TSVD adaptive (black dotted line) updates. From top to bottom, the RIOT algorithm uses an oversampling parameter in the randomized SVD of $p=0$, $p=5$ and $p=10$, respectively.
  • ...and 7 more figures