Generative Modeling of Aerosol State Representations

Ehsan Saleh, Saba Ghaffari, Jeffrey H. Curtis, Lekha Patel, Peter A. Bosler, Nicole Riemer, Matthew West

TL;DR

This paper tackles the high dimensionality of aerosol state representations by learning compact, physically meaningful latent representations using a variational autoencoder (VAE). It demonstrates that hundreds of input features corresponding to speciated mass and number distributions can be compressed to $10$ latent variables while preserving important diagnostics such as CCN spectra, optical properties, and ice nucleation. A noise-resilient preprocessing strategy and a novel realism metric based on sliced Wasserstein distance are introduced to improve robustness and realism of generated aerosols, enabling surrogate modeling for climate applications. The work provides a path toward efficient, scalable aerosol representations and outlines future directions for time-evolving surrogates and diagnostic-weighted training to further enhance ice nucleation predictions.

Abstract

Aerosol-cloud-radiation interactions remain among the most uncertain components of the Earth's climate system, in part due to the high dimensionality of aerosol state representations and the difficulty of obtaining complete in situ measurements. Addressing these challenges requires methods that distill complex aerosol properties into compact yet physically meaningful forms. Generative autoencoder models provide such a pathway. We present a framework for learning deep variational autoencoder (VAE) models of speciated mass and number concentration distributions, which capture detailed aerosol size-composition characteristics. By compressing hundreds of original dimensions into ten latent variables, the approach enables efficient storage and processing while preserving the fidelity of key diagnostics, including cloud condensation nuclei (CCN) spectra, optical scattering and absorption coefficients, and ice nucleation properties. Results show that CCN spectra are easiest to reconstruct accurately, optical properties are moderately difficult, and ice nucleation properties are the most challenging. To improve performance, we introduce a preprocessing optimization strategy that avoids repeated retraining and yields latent representations resilient to high-magnitude Gaussian noise, boosting accuracy for CCN spectra, optical coefficients, and frozen fraction spectra. Finally, we propose a novel realism metric, based on the sliced Wasserstein distance between generated samples and a held-out test set, for optimizing the KL divergence weight in VAEs. Together, these contributions enable compact, robust, and physically meaningful representations of aerosol states for large-scale climate applications.
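The realism metric mentioned in the abstract compares generated samples with a held-out test set through a sliced Wasserstein distance. The abstract does not give the exact variant used, so the following is only a minimal sketch of a generic Monte Carlo estimator. It assumes equal-size sample sets, the order-1 distance, and uniformly random projection directions. The function name `sliced_wasserstein_distance` and the parameters `n_projections` and `seed` are hypothetical and not taken from the paper.

```python
import numpy as np


def sliced_wasserstein_distance(x, y, n_projections=128, seed=0):
    """Monte Carlo estimate of the sliced 1-Wasserstein distance.

    x, y: arrays of shape (n_samples, n_features) with the same shape
    (assumption: equal sample counts, so sorted values can be paired).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 2 or x.shape != y.shape:
        raise ValueError("x and y must be 2-D arrays of identical shape")

    rng = np.random.default_rng(seed)
    # Draw random directions and normalize them onto the unit sphere.
    directions = rng.normal(size=(n_projections, x.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Project both sample sets onto every direction, then sort each
    # one-dimensional projection independently.
    x_proj = np.sort(x @ directions.T, axis=0)
    y_proj = np.sort(y @ directions.T, axis=0)

    # For equal-size empirical distributions in 1-D, the 1-Wasserstein
    # distance is the mean absolute difference of the sorted values.
    # Averaging over directions gives the sliced estimate.
    return float(np.mean(np.abs(x_proj - y_proj)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    held_out = rng.normal(size=(500, 10))         # stand-in for a test set
    generated = rng.normal(size=(500, 10)) + 0.5  # stand-in for decoder samples
    print(sliced_wasserstein_distance(generated, held_out))
```

Under these assumptions, a smaller value means the generated set is distributionally closer to the held-out set. One way such a score could guide the choice of KL divergence weight is to evaluate it for samples decoded from models trained at several candidate weights and compare the results.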

Paper Structure

This paper contains 19 sections, 16 equations, 20 figures, 2 tables.

Figures (20)

  • Figure 1: Example of a speciated mass distribution sample. (a) Stacked bar plot of the mass distribution sample, $dM_a/d \log_{10} D$. (b) The same mass distribution but as a heat map. (c) Speciated mass distribution, averaged over all 25000 samples in the dataset.
  • Figure 2: The aerosol diagnostics of the same sample visualized in Figure 1. (a) the speciated mass distribution. (c) the total mass distribution. (d) the number distribution. (e) the CCN spectrum (i.e., the cloud condensation nuclei fraction of the particles at each super-saturation level of critical relative humidity). (f) the volume scattering coefficient spectrum. (g) the volume absorption coefficient spectrum. (h) the frozen fraction spectrum.
  • Figure 3: The modeling pipeline. (a) the preprocessing transformation process. (b) the variational autoencoding pipeline. (c) the preprocessing simulation framework.
  • Figure 4: The simulated effect of preprocessing on the aerosol diagnostic metrics. The same process of proportional Gaussian noise injection was applied to both a tuned and a plain preprocessor. (a) the physical aerosol diagnostic metrics. (b) the vector diagnostic metrics.
  • Figure 5: Examples of the effect of tuned vs. plain preprocessing. Each row denotes a single sample. (a$_{1-3}$) the original speciated mass distributions. The (b$_{1-3}$) and (c$_{1-3}$) plots show the noise-injected reconstruction $\hat{x}$ for the tuned and plain preprocessors, respectively. (d$_{1-3}$) the number distribution comparison of the original vs. the tuned and plain samples.
  • ...and 15 more figures