Enhancing composition-based materials property prediction by cross-modal knowledge transfer

Ivan Rubtsov, Ivan Dudakov, Yuri Kuratov, Vadim Korolev

TL;DR

The paper tackles composition-based materials property prediction, and the difficulty of relating composition to structure, by introducing cross-modal knowledge transfer. It presents two formulations: implicit transfer (imKT), which pretrains chemical language models on multimodal embeddings and aligns them to a multimodal foundation model, and explicit transfer (exKT), which generates crystal structures with CrystaLLM and then applies structure-aware predictors. Across benchmarks such as LLM4Mat-Bench and MatBench, imKT delivers substantial gains: state-of-the-art results in 25 of 32 tasks, with an average MAE reduction of 15.7%. exKT shows more limited gains, partly because of metastable compounds and the limitations of crystal structure prediction (CSP). A SHAP-IQ explainability analysis reveals informative high-order token interactions between elements and motifs. Overall, the modular cross-modal framework offers a scalable path to better composition-based predictions, with room for further gains from improved multimodal representations and CSP methods.

Abstract

Crystal graph neural networks are widely applicable in modeling experimentally synthesized compounds and hypothetical materials with unknown synthesizability. In contrast, structure-agnostic predictive algorithms allow exploring previously inaccessible domains of chemical space. Here we present a universal approach for enhancing composition-based materials property prediction by means of cross-modal knowledge transfer. Two formulations are proposed: implicit transfer involves pretraining chemical language models on multimodal embeddings, whereas explicit transfer involves generating crystal structures and applying structure-aware predictors. The proposed approaches were benchmarked on LLM4Mat-Bench and MatBench tasks, achieving state-of-the-art performance in 25 out of 32 cases. In addition, we demonstrated how another modeling aspect of chemical language models, interpretability, benefits from applying a game-theoretic approach that is able to incorporate high-order feature interactions.
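
To make the idea of a chemical language model more concrete, the following is a minimal sketch of how a composition might be turned into tokens and masked for the masked language modeling phase mentioned in the Figure 1 caption. It is an illustration under stated assumptions, not the authors' tokenizer: the function names, the regular expression, the `[MASK]` token, and the 15% masking rate (borrowed from common BERT-style practice) are all hypothetical, and the sketch handles only flat formulas without parentheses.

```python
import random
import re

# Hypothetical token pattern (an assumption, not taken from the paper):
# an element symbol followed by an optional integer or decimal coefficient.
_TOKEN_RE = re.compile(r"([A-Z][a-z]?)(\d*(?:\.\d+)?)")


def tokenize_formula(formula):
    """Split a flat formula such as 'Fe2O3' into symbol and coefficient tokens."""
    tokens = []
    for symbol, coeff in _TOKEN_RE.findall(formula):
        tokens.append(symbol)
        # A missing coefficient is written out explicitly as "1".
        tokens.append(coeff if coeff else "1")
    return tokens


def mask_tokens(tokens, mask_prob=0.15, seed=0, mask_token="[MASK]"):
    """Randomly hide tokens; return the masked sequence and the labels to recover.

    Labels are None at positions that were left unmasked.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


if __name__ == "__main__":
    toks = tokenize_formula("Li0.5CoO2")
    print(toks)
    print(mask_tokens(toks, mask_prob=0.5, seed=1))
```

In this sketch both chemical symbols and stoichiometric coefficients are ordinary tokens, so either kind can be masked and predicted from the remaining context.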

Paper Structure

This paper contains 5 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: A conceptual scheme of cross-modal knowledge transfer formulations. (A) The implicit knowledge transfer pipeline includes two pretraining phases: masked language modeling (on chemical symbols and stoichiometric coefficients) and multitask regression on multimodal embeddings produced within the foundation model (MultiMat). (B) The explicit knowledge transfer pipeline involves high-throughput crystal structure prediction and materials property prediction using a structure-aware model.
  • Figure 2: Explainability analysis of a chemical language model predicting shear modulus. (A) Element-wise importance scores computed as averaged SHAPley Interaction Quantification (SHAP-IQ) values. The most influential (B) two-token and (C) three-token combinations, according to the averaged SHAP-IQ values. Two crystal structures from the JARVIS-DFT dataset are depicted to illustrate the most common structural prototypes; the corresponding DFT-computed shear moduli are also provided.
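
The high-order token interactions in Figure 2 can be illustrated with a small worked example. The sketch below computes the Shapley interaction index exactly by brute force for a toy value function. This is an assumption-laden illustration: it is not the SHAP-IQ estimator itself (which approximates such indices by sampling), it does not use any attribution library, and the paper may rely on a different interaction index. The tokens and the numbers in `SINGLE` and `PAIR_BONUS` are arbitrary toy values, not results from the paper.

```python
from itertools import combinations
from math import factorial


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def discrete_derivative(value, S, T):
    """Joint marginal contribution of the tokens in S when added to coalition T."""
    return sum((-1) ** (len(S) - len(L)) * value(T | L) for L in subsets(S))


def shapley_interaction(value, players, S):
    """Exact Shapley interaction index of S; for |S| == 1 this is the Shapley value."""
    S = frozenset(S)
    n, s = len(players), len(S)
    rest = frozenset(players) - S
    total = 0.0
    for T in subsets(rest):
        t = len(T)
        weight = factorial(t) * factorial(n - t - s) / factorial(n - s + 1)
        total += weight * discrete_derivative(value, S, T)
    return total


# Toy "model" (hypothetical numbers): additive token effects plus one pairwise bonus.
SINGLE = {"Ti": 1.0, "C": 2.0, "O": 0.5}
PAIR_BONUS = {frozenset({"Ti", "C"}): 3.0}


def toy_value(coalition):
    """Prediction of the toy model when only the tokens in `coalition` are present."""
    v = sum(SINGLE[p] for p in coalition)
    v += sum(bonus for pair, bonus in PAIR_BONUS.items() if pair <= coalition)
    return v


if __name__ == "__main__":
    players = list(SINGLE)
    for pair in combinations(players, 2):
        print(pair, shapley_interaction(toy_value, players, pair))
```

In this toy setting the pair (Ti, C) receives the full bonus as its interaction score while the other pairs score zero, which is the kind of signal a two-token ranking like Figure 2(B) summarizes. Exact enumeration scales exponentially with the number of tokens, which is why sampling-based estimators are used in practice.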