
Accurate and Efficient Hybrid-Ensemble Atmospheric Data Assimilation in Latent Space with Uncertainty Quantification

Hang Fan, Juan Nathaniel, Yi Xiao, Ce Bian, Fenghua Ling, Ben Fei, Lei Bai, Pierre Gentine

TL;DR

HLOBA is a three-dimensional hybrid-ensemble DA method that operates in an atmospheric latent space learned via an autoencoder (AE). It provides element-wise uncertainty estimates for its latent analysis and propagates them to model space via the decoder.

Abstract

Data assimilation (DA) combines model forecasts and observations to estimate the optimal state of the atmosphere with its uncertainty, providing initial conditions for weather prediction and reanalyses for climate research. Yet, existing traditional and machine-learning DA methods struggle to achieve accuracy, efficiency and uncertainty quantification simultaneously. Here, we propose HLOBA (Hybrid-Ensemble Latent Observation-Background Assimilation), a three-dimensional hybrid-ensemble DA method that operates in an atmospheric latent space learned via an autoencoder (AE). HLOBA maps model forecasts and observations into a shared latent space via the AE encoder and an end-to-end Observation-to-Latent-space mapping network (O2Lnet), respectively, and fuses them through a Bayesian update with weights inferred from time-lagged ensemble forecasts. Both idealized and real-observation experiments demonstrate that HLOBA matches dynamically constrained four-dimensional DA methods in both analysis and forecast skill, while achieving end-to-end inference-level efficiency and, in principle, the flexibility to be applied to any forecasting model. Moreover, by exploiting the error decorrelation property of latent variables, HLOBA enables element-wise uncertainty estimates for its latent analysis and propagates them to model space via the decoder. Idealized experiments show that this uncertainty highlights large-error regions and captures their seasonal variability.
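
The Bayesian update described in the abstract can be illustrated with a minimal sketch. It assumes that the latent background and latent observation errors are Gaussian and uncorrelated across elements, so that each latent element is fused independently by precision weighting. The function name `fuse_elementwise` and the toy numbers are hypothetical and are not taken from the paper; the actual HLOBA update may differ in detail.

```python
from typing import List, Sequence, Tuple


def fuse_elementwise(
    z_b: Sequence[float],
    var_b: Sequence[float],
    z_o: Sequence[float],
    var_o: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Fuse latent background z_b and latent observation z_o element by element.

    Assumption (not from the paper): errors are Gaussian with diagonal
    covariance, so every latent element has its own scalar Bayesian update.
    """
    z_a: List[float] = []
    var_a: List[float] = []
    for zb, vb, zo, vo in zip(z_b, var_b, z_o, var_o):
        gain = vb / (vb + vo)              # weight given to the observation term
        z_a.append(zb + gain * (zo - zb))  # latent analysis mean
        var_a.append((1.0 - gain) * vb)    # latent analysis variance
    return z_a, var_a


# Toy example with two latent elements (hypothetical values).
z_a, var_a = fuse_elementwise(
    z_b=[0.0, 2.0], var_b=[1.0, 1.0],
    z_o=[1.0, 0.0], var_o=[1.0, 3.0],
)
print(z_a, var_a)
```

Under these assumptions the analysis variance never exceeds the smaller of the two input variances, which is what makes a per-element uncertainty estimate available as a by-product of the update.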

Paper Structure

This paper contains 18 sections, 19 equations, 12 figures.

Figures (12)

  • Figure 1: Overview of the HLOBA method. a, HLOBA pipeline. The background field and observations are mapped into a shared latent space using the Encoder and O2Lnet, respectively, and fused in a Bayesian manner based on their uncertainty estimates derived from time-lagged ensembles. The resulting latent analysis is then decoded to obtain the final analysis in model space. b, Generation of time-lagged ensembles. Analyses produced at different assimilation cycles are propagated forward to form a flow-dependent ensemble that represents background uncertainty without requiring parallel ensemble forecasts. c, Comparison of HLOBA with other hybrid DA methods when assimilating real observations on an NVIDIA A100. The x-axis shows the assimilation time per observation, and the y-axis shows forecast improvement relative to 3DVar. Marker size indicates GPU memory usage. d, Climatological estimates of error correlations for latent-space background $\boldsymbol{z}_b$ and observation $\boldsymbol{z}_o$ (see Methods for details). Inter-variable correlations are computed across latent dimensions at each grid point and averaged in absolute value; spatial correlations are computed across grid points within each latent dimension and averaged in absolute value.
  • Figure 2: Performance comparison of HLOBA and other hybrid DA methods in cycling DA experiments for 2017. a, Analysis errors from idealized cycling DA experiments, evaluated against ERA5. Observations were sampled from ERA5 using radiosonde and surface observation locations from GDAS at 0000 UTC on 1 January. Four observation times were assimilated in each cycle at 6-hour intervals. b, Annual-mean errors for 5-day forecasts initialized from the analyses in a after each assimilation cycle, evaluated against ERA5. c, Analysis errors from cycling experiments with real observations, evaluated against a withheld 10% subset of observations that was not assimilated. Four observation times were assimilated in each cycle at 12-hour intervals. Errors of ERA5 evaluated at the same observation locations are also shown for comparison. For clarity, H3DVar and HL3DVar are not shown. d, Annual-mean 5-day forecast errors initialized from the analyses in c, evaluated using all available observations.
  • Figure 3: Uncertainty estimation of HLOBA analyses in idealized experiments. a, Examples of analysis RMSE and corresponding estimated standard deviation (Std.) for Z500 at a single time step, daily mean, and monthly mean. Shown are results for 0000 UTC on 1 February 2017, the daily mean on 1 February, and the monthly mean for February 2017. b, Pearson correlation coefficients (PCC) between the estimated standard deviation and the true RMSE for single-step, daily-mean, and monthly-mean analyses over 2017. c, Seasonal variations in analysis RMSE are reflected in the estimated uncertainty. Results are shown for Q500 and T2m.
  • Figure 4: Impact of time-lagged ensembles on hybrid DA methods in idealized experiments. a, Impact of introducing time-lagged ensembles on different hybrid DA methods, relative to their non-ensemble counterparts. b, Improvement from using hybrid-ensemble (Hybrid) latent observation error covariance $\mathbf{R}_z$ in HLOBA, relative to a climatological configuration (clim.) without ensemble information. c, Same as b, but using hybrid-ensemble latent background error covariance $\mathbf{B}_z$. d, Additional improvement from introducing a hybrid $\mathbf{B}_z$ in HLOBA when $\mathbf{R}_z$ is fully ensemble-based (Ens.). Values show mean improvements averaged over the data assimilation (DA) and forecast (FC) stages; colors indicate improvements in the forecast stage only.
  • Figure 5: Advantages of introducing O2Lnet, demonstrated under idealized settings. a, Illustration of observation-only analysis (OOA) based on O2Lnet, obtained by decoding the latent observation variable $\boldsymbol{z}_o$ produced by O2Lnet. Observations are sampled from ERA5, with observation locations indicated by black dots. Shown are Z500 and T2m at 0000 UTC on 1 February 2017. b, Accuracy of OOA and its robustness to observation noise, compared with HL3DVar and HL4DVar. Solid lines show results with observation noise drawn from a zero-mean Gaussian distribution with variance equal to 0.03 times the climatological variance, while dashed lines correspond to a variance of 0.1 times the climatological variance. c, Example illustrating the consistency between ensemble-estimated uncertainties of the background and OOA terms and their realized mean squared errors (MSE). Shown is T500 at 0000 UTC on 1 February 2017, with all fields normalized to the range [0, 1]. d, Pearson correlation coefficients between estimated error variances and realized MSEs for the background and observation terms, $\boldsymbol{z}_b$ and $\boldsymbol{z}_o$. Results based on climatological variance estimates are also shown for comparison.
  • ...and 7 more figures
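
The time-lagged ensembles of Figure 1b and the climatological versus hybrid configurations of Figure 4 can be sketched as follows. This is a minimal illustration, assuming that the ensemble members are latent vectors valid at the same time (forecasts started from analyses of different cycles), that the error variance is estimated per latent element from the ensemble spread, and that the hybrid estimate is a linear blend with a climatological variance. The function names and the blending weight `alpha` are hypothetical; the paper's actual estimator is not reproduced here.

```python
from statistics import variance
from typing import List, Sequence


def ensemble_variance(members: Sequence[Sequence[float]]) -> List[float]:
    """Per-element sample variance across ensemble members.

    Each member is one latent vector valid at the same time, for example a
    forecast initialized from an earlier analysis (time-lagged ensemble).
    """
    return [variance(column) for column in zip(*members)]


def hybrid_variance(
    var_clim: Sequence[float],
    var_ens: Sequence[float],
    alpha: float = 0.5,
) -> List[float]:
    """Blend climatological and ensemble variances.

    alpha = 0 gives the purely climatological estimate, alpha = 1 the purely
    ensemble-based one. The linear blend is an assumption for illustration.
    """
    return [(1.0 - alpha) * c + alpha * e for c, e in zip(var_clim, var_ens)]


# Three time-lagged members, two latent elements (hypothetical values).
members = [[1.0, 0.0], [3.0, 0.0], [2.0, 3.0]]
var_ens = ensemble_variance(members)
var_hyb = hybrid_variance([2.0, 1.0], var_ens, alpha=0.5)
print(var_ens, var_hyb)
```

Because the members come from forecasts that were already run in earlier cycles, this kind of estimate adds flow dependence without the cost of parallel ensemble forecasts, which is the motivation stated in the caption of Figure 1b.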