Self-supervised learning for crystal property prediction via denoising
Alexander New, Nam Q. Le, Michael J. Pekala, Christopher D. Stiles
TL;DR
CDSSL addresses the scarcity of labeled crystal-property data by pretraining graph-based crystal models on a denoising pretext task: atomic positions are perturbed, and the model predicts the original edge embeddings, which encourages a generalizable representation of structure space. The method combines a multigraph crystal representation with a MEGNet backbone and Set2Set aggregation, and the pretrained model transfers to diverse property-prediction tasks with improved accuracy over non-SSL baselines. Empirical results show consistent gains across material classes, properties, and data regimes, including low-data settings; linear probing and the density-volume structure of the learned representation indicate that it captures meaningful material variation. The approach makes it possible to use large unlabeled structural databases to improve targeted crystal-property predictions, with potential for richer physical insight from the learned, potential-energy-informed space.
Abstract
Accurate prediction of the properties of crystalline materials is crucial for targeted discovery, and this prediction is increasingly done with data-driven models. However, for many properties of interest, the number of materials for which a specific property has been determined is much smaller than the number of known materials. To overcome this disparity, we propose a novel self-supervised learning (SSL) strategy for material property prediction. Our approach, crystal denoising self-supervised learning (CDSSL), pretrains predictive models (e.g., graph networks) with a pretext task based on recovering valid material structures when given perturbed versions of these structures. We demonstrate that CDSSL models outperform models trained without SSL across material types, properties, and dataset sizes.
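
As a rough illustration of the denoising pretext task described above, the following minimal sketch perturbs the atomic positions of a toy crystal and scores how far the perturbed structure's interatomic distances are from the clean ones. It is not the CDSSL implementation: the Gaussian noise model, the use of raw distances as a stand-in for edge features, the orthogonal-cell assumption, and all function names (`perturb_positions`, `pairwise_distances`, `denoising_loss`) are assumptions made for illustration only.

```python
import numpy as np


def perturb_positions(frac_coords, sigma, rng):
    """Add Gaussian noise to fractional coordinates and wrap into the unit cell.

    The Gaussian noise model and its scale are illustrative assumptions.
    """
    noise = rng.normal(scale=sigma, size=frac_coords.shape)
    return (frac_coords + noise) % 1.0


def pairwise_distances(frac_coords, lattice):
    """Minimum-image interatomic distances (assumes an orthogonal cell)."""
    delta = frac_coords[:, None, :] - frac_coords[None, :, :]
    delta -= np.round(delta)  # wrap each component into [-0.5, 0.5]
    cart = delta @ lattice    # fractional -> Cartesian displacement
    return np.linalg.norm(cart, axis=-1)


def denoising_loss(predicted, clean):
    """Mean squared error between predicted and clean edge targets."""
    return float(np.mean((predicted - clean) ** 2))


rng = np.random.default_rng(0)

# Toy body-centered cubic cell with a 4 Angstrom edge (hypothetical example).
lattice = 4.0 * np.eye(3)
clean_frac = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]])

noisy_frac = perturb_positions(clean_frac, sigma=0.02, rng=rng)

# Clean distances play the role of the pretext target; a model would see
# only the noisy structure and be trained to recover the clean target.
clean_dist = pairwise_distances(clean_frac, lattice)
noisy_dist = pairwise_distances(noisy_frac, lattice)

# Loss of a trivial "model" that returns the noisy distances unchanged.
baseline_loss = denoising_loss(noisy_dist, clean_dist)
```

In an actual pretraining loop, a graph network would replace the trivial baseline: it would take the perturbed structure as input, and its predictions would be compared against the clean targets with a loss of this kind before fine-tuning on a labeled property.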
