Enhancing composition-based materials property prediction by cross-modal knowledge transfer
Ivan Rubtsov, Ivan Dudakov, Yuri Kuratov, Vadim Korolev
TL;DR
The paper tackles composition-based materials property prediction, and the challenge of relating composition to structure, by introducing cross-modal knowledge transfer. It presents two formulations: implicit transfer (imKT), which pretrains chemical language models on multimodal embeddings and aligns them to a multimodal foundation model, and explicit transfer (exKT), which generates crystal structures with CrystaLLM and then applies structure-aware predictors. Across benchmarks such as LLM4Mat-Bench and MatBench, imKT delivers substantial gains: state-of-the-art results in 25 of 32 tasks, with an average MAE reduction of 15.7%. exKT shows more limited gains, partly because of metastable compounds and the limitations of crystal structure prediction (CSP). A SHAP-IQ explainability analysis shows informative high-order token interactions between elements and motifs. Overall, the modular cross-modal framework provides a scalable path to better composition-based predictions, with potential for further gains from improved multimodal representations and CSP methods.
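The idea behind implicit transfer can be illustrated with a toy alignment loop: a composition-only "student" is trained so that its embedding matches the embedding of a frozen "teacher". The sketch below is not the authors' implementation. It assumes a mean squared error objective, a linear student, and a hand-written `frozen_teacher_embedding` function standing in for a multimodal foundation model; all names and values are hypothetical.

```python
# Toy sketch of embedding alignment (assumed MSE objective, linear student).
# Every name and number here is hypothetical and chosen for illustration only.

def frozen_teacher_embedding(fractions):
    # Stand-in for a frozen multimodal model: maps three element
    # fractions to a fixed 2-dimensional target embedding.
    a, b, c = fractions
    return [a - c, 0.5 * b + c]


def student_embedding(weights, fractions):
    # Linear student: out[j] = sum_i fractions[i] * weights[i][j].
    dim_out = len(weights[0])
    return [sum(f * row[j] for f, row in zip(fractions, weights))
            for j in range(dim_out)]


def alignment_loss(weights, inputs, targets):
    # Mean squared error between student and teacher embeddings.
    total = 0.0
    for x, t in zip(inputs, targets):
        pred = student_embedding(weights, x)
        total += sum((p - q) ** 2 for p, q in zip(pred, t)) / len(t)
    return total / len(inputs)


def train_student(inputs, targets, lr=1.0, epochs=200):
    # Full-batch gradient descent on the alignment loss, starting from zeros.
    dim_in, dim_out = len(inputs[0]), len(targets[0])
    weights = [[0.0] * dim_out for _ in range(dim_in)]
    scale = 2.0 / (len(inputs) * dim_out)
    for _ in range(epochs):
        grad = [[0.0] * dim_out for _ in range(dim_in)]
        for x, t in zip(inputs, targets):
            pred = student_embedding(weights, x)
            for i in range(dim_in):
                for j in range(dim_out):
                    grad[i][j] += scale * (pred[j] - t[j]) * x[i]
        for i in range(dim_in):
            for j in range(dim_out):
                weights[i][j] -= lr * grad[i][j]
    return weights


# Hypothetical compositions expressed as fractions of three elements.
COMPOSITIONS = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
    [0.25, 0.25, 0.5],
]
TARGETS = [frozen_teacher_embedding(x) for x in COMPOSITIONS]

if __name__ == "__main__":
    start = [[0.0, 0.0] for _ in range(3)]
    trained = train_student(COMPOSITIONS, TARGETS)
    print("loss before:", alignment_loss(start, COMPOSITIONS, TARGETS))
    print("loss after: ", alignment_loss(trained, COMPOSITIONS, TARGETS))
```

In this toy setting the teacher is itself linear, so the student can reproduce it exactly. A real chemical language model and a real multimodal embedding space would not be expected to align this cleanly.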
Abstract
Crystal graph neural networks are widely applicable in modeling experimentally synthesized compounds and hypothetical materials with unknown synthesizability. In contrast, structure-agnostic predictive algorithms allow exploring previously inaccessible domains of chemical space. Here we present a universal approach for enhancing composition-based materials property prediction by means of cross-modal knowledge transfer. Two formulations are proposed: implicit transfer involves pretraining chemical language models on multimodal embeddings, whereas explicit transfer involves generating crystal structures and applying structure-aware predictors. The proposed approaches were benchmarked on LLM4Mat-Bench and MatBench tasks, achieving state-of-the-art performance in 25 out of 32 cases. In addition, we demonstrated how another modeling aspect of chemical language models, interpretability, benefits from applying a game-theoretic approach that can incorporate high-order feature interactions.
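To make "high-order feature interactions" concrete, the sketch below computes the pairwise Shapley interaction index exactly by enumerating all coalitions of a four-token toy game. It is an illustration under stated assumptions, not the SHAP-IQ estimator or the authors' analysis: the tokens, the additive contributions, and the Fe-O bonus in `toy_model_output` are hypothetical, and brute-force enumeration only scales to a handful of tokens.

```python
# Exact pairwise Shapley interaction index on a toy game.
# Tokens, contributions, and the pair bonus are hypothetical values.
from itertools import combinations
from math import factorial

TOKENS = ["Li", "Fe", "P", "O"]
ADDITIVE = {"Li": 0.1, "Fe": 0.4, "P": 0.2, "O": 0.3}
PAIR_BONUS = 0.5  # extra output only when Fe and O are both present


def toy_model_output(present):
    # Stand-in for a model prediction given the subset of visible tokens.
    present = set(present)
    value = sum(ADDITIVE[t] for t in present)
    if {"Fe", "O"} <= present:
        value += PAIR_BONUS
    return value


def shapley_interaction(players, value_fn, i, j):
    # Weighted average, over coalitions S that exclude i and j, of
    # v(S + {i, j}) - v(S + {i}) - v(S + {j}) + v(S).
    others = [p for p in players if p not in (i, j)]
    n = len(players)
    total = 0.0
    for size in range(len(others) + 1):
        weight = factorial(size) * factorial(n - size - 2) / factorial(n - 1)
        for subset in combinations(others, size):
            s = set(subset)
            delta = (value_fn(s | {i, j}) - value_fn(s | {i})
                     - value_fn(s | {j}) + value_fn(s))
            total += weight * delta
    return total


if __name__ == "__main__":
    for a, b in combinations(TOKENS, 2):
        score = shapley_interaction(TOKENS, toy_model_output, a, b)
        print(f"{a}-{b}: {score:.3f}")
```

Because the toy output is additive apart from the Fe-O bonus, the index recovers that bonus for the Fe-O pair and is zero for every other pair, which is the kind of signal that first-order attributions alone cannot separate.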
