Understanding Ice Crystal Habit Diversity with Self-Supervised Learning

Joseph Ko, Hariprasath Govindarajan, Fredrik Lindsten, Vanessa Przybylo, Kara Sulia, Marcus van Lier-Walqui, Kara Lamb

TL;DR

This work tackles the uncertainty that ice-crystal habit diversity introduces into climate models by applying self-supervised learning to a large collection of Cloud Particle Imager (CPI) images. A vision transformer trained with iBOT-vMF learns latent representations of crystal morphology; the pipeline reduces compute by combining data curation (CPI-H-1M) with pre-trained weights. The learned embeddings achieve strong downstream performance (e.g., Top-1 accuracy of $84.39\%$ on CPI-21K with logistic regression) and align with expert habit labels in latent space, enabling a data-driven quantification of crystal diversity via the von Mises–Fisher concentration parameter $\kappa$. This approach provides a scalable, physics-informed method to characterize ice crystals, potentially reducing microphysical uncertainties in climate models and guiding anomaly detection and property–thermodynamics linkages.

Abstract

Ice-containing clouds strongly impact climate, but they are hard to model due to ice crystal habit (i.e., shape) diversity. We use self-supervised learning (SSL) to learn latent representations of crystals from ice crystal imagery. By pre-training a vision transformer with many cloud particle images, we learn robust representations of crystal morphology, which can be used for various science-driven tasks. Our key contributions include (1) validating that our SSL approach can be used to learn meaningful representations, and (2) presenting a relevant application where we quantify ice crystal diversity with these latent representations. Our results demonstrate the power of SSL-driven representations to improve the characterization of ice crystals and subsequently constrain their role in Earth's climate system.

Paper Structure

This paper contains 18 sections, 3 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Examples of CPI images grouped by habit (i.e., shape).
  • Figure 2: 2D projections of the 384-dimensional latent embeddings. A subset of 3000 samples is shown here. (a) Non-linear dimensionality reduction with UMAP. (b) Linear projection with PCA.
  • Figure 3: Crystal diversity ($\kappa$) using CPI-ENV-500K. (a) $\kappa$ as a function of air temperature and stratified by campaign. (b) $\kappa$ as a function of particle size (width) and stratified by campaign.
  • Figure 4: Cosine similarity heatmap showing intra-class (diagonal) and inter-class (row-wise) similarity. Representative CPI images from each class are shown at the top.
  • Figure 5: 2D projections of the 384-dimensional latent embeddings. 7000 samples (1000 samples per class) are shown here. (a) Non-linear dimensionality reduction with UMAP. (b) Linear projection with PCA.
  • ...and 2 more figures
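
Figure 3 reports crystal diversity through the von Mises–Fisher concentration parameter $\kappa$. As a rough illustration of what such a number measures, the sketch below estimates $\kappa$ from a set of embeddings using the common moment-based approximation $\hat{\kappa} \approx \bar{R}(d - \bar{R}^2)/(1 - \bar{R}^2)$, where $\bar{R}$ is the length of the mean of the unit-normalized vectors and $d$ is the embedding dimension. This is an assumption made for illustration only: the paper may obtain $\kappa$ differently (for example, directly from the iBOT-vMF model), and the function names and synthetic data here are hypothetical. Under this estimator, a higher $\kappa$ means the embeddings point in more similar directions, which would correspond to lower diversity.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length so it lies on the hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def estimate_kappa(embeddings):
    """Moment-based approximation of the vMF concentration parameter.

    Illustrative only; not necessarily how the paper computes kappa.
    """
    unit = [normalize(v) for v in embeddings]
    n, d = len(unit), len(unit[0])
    # Mean direction vector; its length r_bar lies in [0, 1].
    mean = [sum(u[j] for u in unit) / n for j in range(d)]
    r_bar = math.sqrt(sum(m * m for m in mean))
    # Diverges as r_bar -> 1 (all vectors identical).
    return r_bar * (d - r_bar ** 2) / (1.0 - r_bar ** 2)


def synthetic_cluster(n, d, spread, rng):
    """Hypothetical stand-in data: Gaussian scatter around the first axis."""
    center = [1.0] + [0.0] * (d - 1)
    return [[c + rng.gauss(0.0, spread) for c in center] for _ in range(n)]


rng = random.Random(0)
tight = synthetic_cluster(500, 16, 0.05, rng)  # low-diversity group
loose = synthetic_cluster(500, 16, 0.5, rng)   # high-diversity group

kappa_tight = estimate_kappa(tight)
kappa_loose = estimate_kappa(loose)
print(f"tight cluster kappa: {kappa_tight:.1f}")
print(f"loose cluster kappa: {kappa_loose:.1f}")
```

Applied to real data, one would replace the synthetic clusters with the 384-dimensional embeddings of the particles in a given temperature or size bin and compare the resulting $\hat{\kappa}$ values across bins.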