Understanding Ice Crystal Habit Diversity with Self-Supervised Learning
Joseph Ko, Hariprasath Govindarajan, Fredrik Lindsten, Vanessa Przybylo, Kara Sulia, Marcus van Lier-Walqui, Kara Lamb
TL;DR
This work addresses the climate-model uncertainty that arises from ice-crystal habit diversity by applying self-supervised learning to a large collection of Cloud Particle Imager (CPI) images. A vision transformer trained with iBOT-vMF learns latent representations of crystal morphology; the pipeline reduces compute through data curation (CPI-H-1M) and initialization from pre-trained weights. The learned embeddings perform well on downstream tasks (e.g., Top-1 accuracy of $84.39\%$ on CPI-21K with logistic regression) and align with expert habit labels in latent space, which enables a data-driven quantification of crystal diversity via the von Mises–Fisher concentration parameter $\kappa$. The approach offers a scalable, physics-informed way to characterize ice crystals, with the potential to reduce microphysical uncertainties in climate models and to support anomaly detection and the study of property–thermodynamics linkages.
Abstract
Ice-containing clouds strongly impact climate, but they are hard to model because of the diversity of ice crystal habits (i.e., shapes). We use self-supervised learning (SSL) to learn latent representations of crystals from ice crystal imagery. By pre-training a vision transformer on a large set of cloud particle images, we learn robust representations of crystal morphology, which can be used for various science-driven tasks. Our key contributions include (1) validating that our SSL approach learns meaningful representations, and (2) presenting a relevant application in which we quantify ice crystal diversity with these latent representations. Our results demonstrate the power of SSL-driven representations to improve the characterization of ice crystals and subsequently constrain their role in Earth's climate system.
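
To make the idea of quantifying diversity through the von Mises–Fisher concentration parameter $\kappa$ more concrete, the following is a minimal sketch, not the authors' implementation. It assumes embeddings are projected onto the unit hypersphere and uses the closed-form approximation of Banerjee et al. (2005), $\hat{\kappa} \approx \bar{R}(d - \bar{R}^2)/(1 - \bar{R}^2)$, where $\bar{R}$ is the mean resultant length and $d$ the embedding dimension. The function name `estimate_vmf_kappa` and the synthetic "tight" and "diffuse" point clouds are hypothetical stand-ins for real CPI embeddings; the source does not state which estimator is actually used.

```python
import numpy as np


def estimate_vmf_kappa(embeddings: np.ndarray) -> float:
    """Approximate the vMF concentration of a set of embedding vectors.

    Uses the Banerjee et al. (2005) closed-form approximation; this is an
    assumption for illustration, not necessarily the paper's estimator.
    """
    # Project every embedding onto the unit hypersphere.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    dim = unit.shape[1]
    # Mean resultant length: close to 1 for tightly clustered directions,
    # close to 0 for directions spread over the sphere.
    r_bar = np.linalg.norm(unit.mean(axis=0))
    return float(r_bar * (dim - r_bar**2) / (1.0 - r_bar**2))


# Synthetic stand-ins for embeddings (hypothetical data, not CPI results).
rng = np.random.default_rng(0)
center = rng.normal(size=16)
tight = center + 0.05 * rng.normal(size=(500, 16))    # low morphological spread
diffuse = center + 2.0 * rng.normal(size=(500, 16))   # high morphological spread

kappa_tight = estimate_vmf_kappa(tight)
kappa_diffuse = estimate_vmf_kappa(diffuse)
print(f"kappa (tight):   {kappa_tight:.1f}")
print(f"kappa (diffuse): {kappa_diffuse:.1f}")
```

Under this reading, a larger $\kappa$ indicates embeddings concentrated around a common direction (lower diversity), while a smaller $\kappa$ indicates a more spread-out, and therefore more diverse, population of crystals.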
