Reconstructing the Aerosol State from Partial Observations with Generative Modeling
E. Saleh, S. Ghaffari, J. H. Curtis, L. Patel, P. A. Bosler, N. Riemer, M. West
TL;DR
This work presents a conditional generative framework that reconstructs the aerosol state from partial observations and propagates the resulting uncertainty to climate-relevant diagnostics. Conditional variational autoencoders (CVAEs) trained on synthetic PartMC-MOSAIC data map partial labels to ensembles of full aerosol states, from which diagnostic estimates with uncertainty intervals are computed. The study shows that high-dimensional labels (full number and mass distributions plus species bulk masses) substantially tighten constraints on cloud condensation nuclei (CCN) activity and volume scattering, and especially on diagnostics sensitive to dust and black carbon (BC), such as frozen fraction and absorption. A Wasserstein-based regularization further improves compliance between the input labels and the generated states. The approach offers a flexible, uncertainty-aware pathway for translating incomplete measurements into actionable aerosol-climate inferences, and it can be extended to real measurements, hybrid training, or fully measurement-based inference to inform instrument design and field campaigns.
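The TL;DR mentions a Wasserstein-based regularization for label compliance but does not state its form. As a minimal sketch, assuming the label and the generated state are both binned histograms on a shared one-dimensional grid (for example, number concentration per size bin), the Wasserstein-1 distance reduces to the L1 distance between their cumulative distributions. The function names, the normalization, and the `weight` hyperparameter below are illustrative assumptions, not the authors' implementation.

```python
from itertools import accumulate
from typing import Sequence


def wasserstein_1d(p: Sequence[float], q: Sequence[float], bin_width: float = 1.0) -> float:
    """Wasserstein-1 distance between two histograms on the same uniform 1D grid.

    p, q: non-negative bin weights, e.g. a number distribution over
    log-diameter bins. Each is normalized to unit total before comparison,
    so only the shape of the distribution matters.
    """
    if len(p) != len(q):
        raise ValueError("p and q must share the same bin grid")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must have positive total weight")
    cdf_p = accumulate(x / total_p for x in p)
    cdf_q = accumulate(x / total_q for x in q)
    # In 1D, W1 equals the L1 distance between the two cumulative distributions.
    return bin_width * sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def compliance_penalty(label_hist, generated_hist, weight=0.1, bin_width=1.0):
    """Hypothetical regularization term: weighted W1 between label and generated state.

    `weight` is an illustrative hyperparameter, not a value from the study.
    """
    return weight * wasserstein_1d(label_hist, generated_hist, bin_width)


# Usage: all mass shifted by two bins gives W1 = 2 bin widths.
print(wasserstein_1d([1, 0, 0], [0, 0, 1]))  # 2.0
```

Unlike a bin-by-bin squared error, this distance grows with how far mass must move along the size axis, so a generated distribution that is slightly shifted is penalized less than one whose mass sits in distant bins.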
Abstract
Key aerosol properties that shape climate, such as cloud condensation nuclei (CCN) activity, scattering and absorption, and ice nucleation efficiency, are difficult to infer from measurements that typically capture only part of the aerosol state. We develop a conditional generative framework that maps a label (a vector of partial observations) to an ensemble of plausible aerosol states and propagates these states to diagnostics, yielding mean estimates with confidence intervals. Using synthetic data, we evaluate two label configurations: a low-dimensional setup with limited number-distribution and bulk-composition information, and a high-dimensional setup that includes complete number and total mass distributions plus species bulk masses. Generated samples maintain strong label compliance, and higher-dimensional labels markedly reduce the variability of the generated ensembles. CCN activity and volume scattering are well constrained even in the low-dimensional setup, whereas diagnostics sensitive to dust and black carbon (BC), namely frozen fraction and absorption, benefit substantially from the additional information in the high-dimensional case. This framework clarifies which observational inputs most effectively constrain different diagnostics and demonstrates how generative machine learning can provide uncertainty-aware estimates from incomplete aerosol information.
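The propagation step described in the abstract (sample an ensemble of states for one label, evaluate a diagnostic on each, and summarize with a mean and an interval) can be sketched in a few lines. This is a hedged illustration only: `toy_sampler` is a hypothetical placeholder that merely perturbs the label with lognormal noise rather than a trained CVAE, the diagnostic is a toy total-number sum instead of a CCN, optical, or ice-nucleation calculation, and the central percentile interval is one assumed way of forming the confidence interval.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def _quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linearly interpolated empirical quantile of an already-sorted sequence."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def ensemble_estimate(
    states: Sequence[Sequence[float]],
    diagnostic: Callable[[Sequence[float]], float],
    level: float = 0.95,
) -> Tuple[float, float, float]:
    """Mean and central `level` interval of a diagnostic over an ensemble of states."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    # Evaluate the diagnostic on every ensemble member, then sort for quantiles.
    values = sorted(diagnostic(s) for s in states)
    if not values:
        raise ValueError("ensemble is empty")
    alpha = (1.0 - level) / 2.0
    return statistics.fmean(values), _quantile(values, alpha), _quantile(values, 1.0 - alpha)


def toy_sampler(label: Sequence[float], n: int, seed: int = 0) -> List[List[float]]:
    """Hypothetical stand-in for a trained CVAE decoder (NOT a generative model).

    Perturbs each label entry by lognormal noise so the example runs end to end.
    """
    rng = random.Random(seed)
    return [[x * rng.lognormvariate(0.0, 0.1) for x in label] for _ in range(n)]


# Usage: propagate a toy 3-bin "number distribution" label to a toy diagnostic
# (total number concentration) and report the mean with a 95% interval.
label = [100.0, 250.0, 80.0]
ensemble = toy_sampler(label, n=500, seed=42)
mean, lo, hi = ensemble_estimate(ensemble, diagnostic=sum, level=0.95)
print(f"mean={mean:.1f}, 95% interval=[{lo:.1f}, {hi:.1f}]")
```

Because the diagnostic is applied sample by sample, any nonlinear diagnostic can be plugged in unchanged; the width of the resulting interval then reflects how tightly the label constrains the aerosol state.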
