Reconstructing the Aerosol State from Partial Observations with Generative Modeling

E. Saleh, S. Ghaffari, J. H. Curtis, L. Patel, P. A. Bosler, N. Riemer, M. West

TL;DR

This work presents a conditional generative framework to reconstruct the aerosol state from partial observations and to propagate uncertainty to climate-relevant diagnostics. By training conditional variational autoencoders (CVAEs) on synthetic PartMC-MOSAIC data, the method maps partial labels to ensembles of full aerosol states and yields diagnostic estimates with uncertainty intervals. The study shows that CCN activity and volume scattering are already well constrained by low-dimensional labels, whereas high-dimensional labels (full number and mass distributions plus species bulk masses) substantially tighten constraints on dust- and BC-sensitive diagnostics such as absorption and frozen fraction. A Wasserstein-based regularization improves compliance between input labels and generated states. The approach offers a flexible, uncertainty-aware pathway for translating incomplete measurements into aerosol-climate inferences, and it can be extended to real measurements, hybrid training, or fully measurement-based inference to inform instrument design and field campaigns.

Abstract

Key aerosol properties that shape climate -- such as CCN activity, scattering and absorption, and ice nucleation efficiency -- are difficult to infer from measurements that typically capture only a part of the aerosol state. We develop a conditional generative framework that maps a label (a vector of partial observations) to an ensemble of plausible aerosol states and propagates these to diagnostics, yielding mean estimates with confidence intervals. Using synthetic data, we evaluate two label configurations: a low-dimensional setup with limited number distribution and bulk-composition information, and a high-dimensional setup including complete number and total mass distributions plus species bulk masses. Generated samples maintain strong label compliance, and higher-dimensional labels markedly reduce variability. CCN activity and volume scattering are well constrained even under the low-dimensional setup, whereas dust- and BC-sensitive diagnostics (frozen fraction, absorption) benefit substantially from the additional information in the high-dimensional case. This framework clarifies which observational inputs most effectively constrain different diagnostics and demonstrates how generative machine learning can provide uncertainty-aware estimates from incomplete aerosol information.
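
The propagation step described in the abstract (ensemble of generated states, then a diagnostic, then a mean with an interval) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the ensemble is random data rather than model output, `large_fraction` is a hypothetical stand-in for a real diagnostic such as CCN fraction, and the interval is an empirical quantile range, which may differ from how the paper computes its confidence intervals.

```python
import numpy as np

def summarize_diagnostic(ensemble, diagnostic, level=0.95):
    """Apply `diagnostic` to each state in the ensemble; return (mean, lo, hi)."""
    values = np.array([diagnostic(state) for state in ensemble], dtype=float)
    alpha = (1.0 - level) / 2.0
    # Empirical quantile interval (an assumption, not the paper's stated method).
    lo, hi = np.quantile(values, [alpha, 1.0 - alpha])
    return values.mean(), lo, hi

# Hypothetical stand-in for a generated ensemble: 200 states, each a
# 10-bin number distribution. Values are random, not model output.
rng = np.random.default_rng(0)
ensemble = rng.lognormal(mean=0.0, sigma=0.3, size=(200, 10))

def large_fraction(state):
    """Hypothetical diagnostic: fraction of particles in the 5 largest bins."""
    return state[5:].sum() / state.sum()

mean, lo, hi = summarize_diagnostic(ensemble, large_fraction)
print(f"mean={mean:.3f}, 95% interval=({lo:.3f}, {hi:.3f})")
```

A narrower interval for the same diagnostic under a richer label is what "better constrained" means in this setting.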

Paper Structure

This paper contains 17 sections, 9 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: Conceptual overview of the training and inference procedures. True aerosol populations (top left) can be characterized in principle by full observations of the aerosol state, such as size- and composition-resolved distributions measured by instruments like the Aerosol Mass Spectrometer (AMS) together with a unified number distribution measured by a combination of the Scanning Mobility Particle Sizer (SMPS) and an Aerodynamic Particle Sizer (APS). In practice, however, such comprehensive measurements are rarely available. Instead, partial observations (bottom left) are more common, such as a truncated number distribution from the SMPS and limited species bulk masses from the Aerosol Chemical Speciation Monitor (ACSM). These partial inputs are paired with full aerosol states during training to teach a conditional generative model (CVAE) the mapping from limited observations to plausible complete states. During inference, only a single partial observation is needed; from this the trained model generates an ensemble of consistent aerosol states.
  • Figure 2: The model structure. Here $f_\theta$ is the encoder network, $g_\theta$ is the decoder network, $\mathcal{T}_{\text{x}}$ and $\mathcal{T}_{\text{y}}$ are preprocessing transformations, and $h$ is the label computation function. (a) The sample reconstruction pipeline takes a state $x$ and its label $y$ and reconstructs the approximate state $\hat{x}$. (b) The conditional generation pipeline takes a randomly-sampled latent vector $z^{\text{gen}}$ and label $y$ and generates a plausible state $x^{\text{gen}}$.
  • Figure 3: Low-dimensional label setup: An example of estimating the aerosol state.
  • Figure 4: High-dimensional label setup: An example of estimating the aerosol state.
  • Figure 5: Low-dimensional label setup: The collective aerosol diagnostic summary plots on the testing portion of the data. (a) and (b) show the truncated number distribution and limited species bulk mass label compliance errors, respectively. (d) and (g) show the input vs. generated CCN fraction scatter plots at $s=0.1\%$ and $s=0.3\%$ supersaturation levels, respectively. (e) and (h) show the input vs. generated volume scattering and absorption coefficient scatter plots at $\lambda=0.5 \, {\rm \mu m}$ wavelength. (f) and (i) show the input vs. generated frozen fraction plots at $T=-25 \, {\rm ^{\circ} C}$ and $T=-10 \, {\rm ^{\circ} C}$, respectively.
  • ...and 8 more figures
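
The two pipelines in the Figure 2 caption can be sketched in a few lines of NumPy. This is only a structural illustration under stated assumptions: the encoder and decoder are untrained random linear maps rather than neural networks, the log transforms standing in for the preprocessing steps and the label function `h` are hypothetical choices, and all dimensions are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATE, N_LABEL, N_LATENT = 12, 4, 3  # hypothetical sizes

def h(x):
    """Hypothetical label function: keep the first N_LABEL entries of the state."""
    return x[:N_LABEL]

# Hypothetical preprocessing transformations (log scaling) and the inverse for states.
T_x, T_x_inv = np.log, np.exp
T_y = np.log

# Untrained random linear maps standing in for the encoder and decoder networks.
W_enc = rng.normal(scale=0.1, size=(2 * N_LATENT, N_STATE + N_LABEL))
W_dec = rng.normal(scale=0.1, size=(N_STATE, N_LATENT + N_LABEL))

def encoder(x_t, y_t):
    """Map a transformed state and label to a latent mean and log-variance."""
    out = W_enc @ np.concatenate([x_t, y_t])
    return out[:N_LATENT], out[N_LATENT:]

def decoder(z, y_t):
    """Map a latent vector and transformed label to a transformed state."""
    return W_dec @ np.concatenate([z, y_t])

def reconstruct(x, y):
    """Pipeline (a): encode (x, y), sample z, decode to an approximate state."""
    mu, logvar = encoder(T_x(x), T_y(y))
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(N_LATENT)
    return T_x_inv(decoder(z, T_y(y)))

def generate(y, n_samples):
    """Pipeline (b): draw latent vectors from N(0, I) and decode each with label y."""
    y_t = T_y(y)
    z_gen = rng.standard_normal((n_samples, N_LATENT))
    return np.stack([T_x_inv(decoder(z, y_t)) for z in z_gen])

x = rng.lognormal(size=N_STATE)  # synthetic positive state
y = h(x)
x_hat = reconstruct(x, y)
x_gen = generate(y, n_samples=5)
print(x_hat.shape, x_gen.shape)
```

Only `generate` is needed at inference time: a single label yields an ensemble of states, one per latent draw.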