Contrastive Learning of English Language and Crystal Graphs for Multimodal Representation of Materials Knowledge
Yang Jeong Park, Mayank Kumaran, Chia-Wei Hsu, Elsa Olivetti, Ju Li
TL;DR
This work tackles data scarcity and biased sampling in crystal science by introducing CLaC, a multimodal contrastive model that jointly embeds crystal graphs and language. Trained on 126k GPT-synthesized crystal-text pairs (plus supplementary literature-derived text), CLaC learns a shared latent space through inter-modal and intra-modal alignment, achieving state-of-the-art zero-shot retrieval and strong performance on named entity recognition (NER) and PAC tasks. The approach pairs graph encoders (CGCNN, PaiNN) with text encoders (SciBERT, MatSciBERT) and relies on synthetic data to compensate for the limited supply of crystal data, demonstrating robust cross-modal generalization and a meaningfully organized latent space. The work highlights the potential of synthetic data and multimodal supervision to advance materials discovery, while noting that the model is currently limited to crystals and still needs to be extended to polycrystals and MOFs. Overall, CLaC represents a scalable, data-efficient pathway toward language-guided crystal design and retrieval.
Abstract
Artificial intelligence (AI) is increasingly used for the inverse design of materials, such as crystals and molecules. Existing AI research on molecules has integrated the chemical structures of molecules with textual knowledge to adapt to complex instructions. However, this approach has been unattainable for crystals due to data scarcity arising from the biased distribution of investigated crystals and the lack of semantic supervision in peer-reviewed literature. In this work, we introduce a contrastive language-crystals model (CLaC) pre-trained on a newly synthesized dataset of 126k crystal structure-text pairs. To demonstrate the advantage of using synthetic data to overcome data scarcity, we constructed a comparable dataset extracted from academic papers. We evaluate CLaC's generalization ability through various zero-shot cross-modal tasks and downstream applications. In experiments, CLaC achieves state-of-the-art zero-shot generalization performance in understanding crystal structures, surpassing the latest large language models.
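To make the idea of contrastive pre-training on crystal-text pairs concrete, the following is a minimal sketch of a symmetric, CLIP-style inter-modal contrastive loss and a zero-shot retrieval step. It is an illustration under stated assumptions, not the authors' implementation: the function names, the plain-list embeddings, and the temperature of 0.07 are all hypothetical, the exact loss used by CLaC is not given here, and the intra-modal alignment term is not shown.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(crystal_embs, text_embs, temperature=0.07):
    """Cosine similarity of every crystal with every text, divided by temperature.

    The temperature value is an assumed hyperparameter, not taken from the paper.
    """
    c = [normalize(v) for v in crystal_embs]
    t = [normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(ci, tj)) / temperature for tj in t]
            for ci in c]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(crystal_embs, text_embs, temperature=0.07):
    """Average of the crystal-to-text and text-to-crystal losses for paired inputs."""
    logits = similarity_matrix(crystal_embs, text_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(transposed))

def retrieve_text(crystal_emb, text_embs):
    """Zero-shot retrieval: index of the text embedding closest to the crystal."""
    scores = similarity_matrix([crystal_emb], text_embs, temperature=1.0)[0]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy 2-D embeddings standing in for graph-encoder and text-encoder outputs.
crystals = [[1.0, 0.0], [0.0, 1.0]]
texts_aligned = [[0.9, 0.1], [0.1, 0.9]]   # pair i matches crystal i
texts_swapped = [[0.1, 0.9], [0.9, 0.1]]   # pairs deliberately mismatched

loss_aligned = symmetric_contrastive_loss(crystals, texts_aligned)
loss_swapped = symmetric_contrastive_loss(crystals, texts_swapped)
best_match = retrieve_text(crystals[0], texts_aligned)
```

In this toy setup the aligned pairs give a lower loss than the swapped ones, which is the signal that pulls matching crystal and text embeddings together during training. Retrieval then reduces to a nearest-neighbour lookup in the shared space, so no task-specific fine-tuning is needed at query time.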
