Bridging the Semantic-Numerical Gap: A Numerical Reasoning Method of Cross-modal Knowledge Graph for Material Property Prediction

Guangxuan Song, Dongmei Fu, Zhongwei Qiu, Zijiang Yang, Jiaxin Dai, Lingwei Ma, Dawei Zhang

TL;DR

This work tackles the semantic-numerical gap in material property prediction by introducing NR-KG, an end-to-end method that builds a cross-modal knowledge graph (KG) combining semantic and numerical data. It projects this KG into a canonical space via CMP, supervised by a Projection Prediction Loss and a Comparative Learning Loss, and then uses GCN-based regression to predict material properties. The approach achieves state-of-the-art performance on two High-Entropy Alloy datasets and two public molecular-property datasets, demonstrating strong generalization and data-efficient learning in small-sample regimes. NR-KG also offers interpretability through semantic-node visualizations and crystal-structure link completions, and it scales with modest resource requirements, paving the way for broader application in KG-enabled materials informatics.

Abstract

Using machine learning (ML) techniques to predict material properties is a crucial research topic. These properties depend on both numerical data and semantic factors. Because material datasets are typically small, existing methods either apply ML algorithms to regress numerical properties or transfer pre-trained knowledge graphs (KGs) from other domains to materials. However, these methods cannot handle semantic and numerical information simultaneously. In this paper, we propose a numerical reasoning method for material KGs (NR-KG), which constructs a cross-modal KG using semantic nodes and numerical proxy nodes. It captures both types of information by projecting the KG into a canonical KG and uses a graph neural network to predict material properties. In this process, a novel projection prediction loss is proposed to extract semantic features from numerical information. NR-KG enables end-to-end processing of cross-modal data, mines relationships and cross-modal information in small-sample datasets, and makes full use of valuable experimental data to improve material property prediction. We further introduce two new High-Entropy Alloy (HEA) property datasets with semantic descriptions. NR-KG outperforms state-of-the-art (SOTA) methods, achieving relative improvements of 25.9% and 16.1% on the two material datasets. In addition, NR-KG surpasses SOTA methods on two public physical chemistry molecular datasets, with improvements of 22.2% and 54.3%, highlighting its potential for broader application and its generalizability. We hope the proposed datasets, algorithms, and pre-trained models will benefit the KG and AI-for-materials communities.
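
To make the idea of a cross-modal KG with numerical proxy nodes and semantic nodes more concrete, here is a minimal NumPy sketch under stated assumptions. The toy features, the `has_structure` relation, and the function names `build_cross_modal_graph` and `gcn_layer` are all hypothetical and not taken from the paper. The sketch uses a plain, untrained graph-convolution step and omits the paper's CMP projection and its losses entirely, so it illustrates only the general graph layout, not NR-KG itself.

```python
import numpy as np

# Hypothetical toy data: 3 materials, each with 2 numerical features.
numeric_features = np.array([[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]])
# Hypothetical semantic facts: (material index, relation, semantic node name).
semantic_triples = [
    (0, "has_structure", "FCC"),
    (1, "has_structure", "BCC"),
    (2, "has_structure", "FCC"),
]

def build_cross_modal_graph(features, triples):
    """Return (adjacency, node_features, semantic_names) for proxy + semantic nodes."""
    n_mat, n_feat = features.shape
    semantic_names = sorted({name for _, _, name in triples})
    n_sem = len(semantic_names)
    n = n_mat + n_sem
    adj = np.zeros((n, n))
    for m, _relation, name in triples:
        # The relation label is ignored here: a plain GCN does not use edge types.
        s = n_mat + semantic_names.index(name)
        adj[m, s] = adj[s, m] = 1.0
    # Proxy nodes carry numerical features; semantic nodes carry a one-hot identity.
    x = np.zeros((n, n_feat + n_sem))
    x[:n_mat, :n_feat] = features
    x[n_mat:, n_feat:] = np.eye(n_sem)
    return adj, x, semantic_names

def gcn_layer(adj, x, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ weight, 0.0)

adj, x, names = build_cross_modal_graph(numeric_features, semantic_triples)
rng = np.random.default_rng(0)
hidden = gcn_layer(adj, x, rng.normal(size=(x.shape[1], 4)))
# Untrained linear head: one predicted property value per material (proxy) node.
predictions = hidden[:3] @ rng.normal(size=(4,))
```

Because materials 0 and 2 share the `FCC` semantic node, repeated propagation steps let information flow between them through that node, which is the kind of inter-sample relationship the abstract refers to.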

Paper Structure

This paper contains 46 sections, 12 equations, 9 figures, 8 tables, 1 algorithm.

Figures (9)

  • Figure 1: Comparison of property prediction methods. The methods in (a) ignore material semantic information. The methods in (b) lack integration of numerical and semantic aspects. Our method in (c) utilizes numerical and semantic information while accounting for inter-sample relationships in small-sample data.
  • Figure 2: The NR-KG framework includes (a) KG construction and (b) the numerical reasoning process. First, the cross-modal KG $\mathcal{G}_{m}$ is constructed from numerical and semantic data. Then, proxy nodes represented by material numerical features $\mathbf{v}_{fm}|e_m$, semantic nodes $e_i$, and edges $r$ in $\mathcal{G}_{m}$ are projected into the canonical KG $\bm{\mathcal{G}}_{c}$ using CMP. $\bm{\mathcal{G}}_{c}$ is supervised by CLL and PPL. GNN Regression Prediction facilitates information exchange and predicts material properties $\mathbf{v}'$. $\bigoplus$, Num-Proj. Layer, Sem-Dict, Margin Func, and $fermat(\cdot)$ refer to the join operation, Numerical Projection Layer, Semantic Dictionary, Margin function, and high-dimensional generalized F-point calculation, respectively.
  • Figure 3: Cross-modal KG construction and schema layer for HEA and molecular data. (a) and (b) show the construction process for HEA and molecular data, respectively. In the tables of numerical and semantic data and in the schema layer of the cross-modal KG, the same colors indicate identical categories of information.
  • Figure 4: Visualization of cross-modal KGs. (a), (b), (c), and (d) show the cross-modal KGs constructed from the HEA-HD, HEA-CRD, FreeSolv, and ESOL datasets, respectively.
  • Figure 5: Semantic edges masking experimental results. (a) and (b) represent experimental results on HEA-HD and HEA-CRD datasets, respectively. As the masking ratio of semantic edges increases, the predictive MSE of NR-KG shows an upward trend, highlighting the significance of semantic information.
  • ...and 4 more figures
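
The semantic-edge masking experiment described in the Figure 5 caption can be illustrated with a small sketch. The exact masking protocol is not given in this listing, so the version below simply assumes that a random fraction of semantic edges is dropped before training; the helper `mask_semantic_edges`, its `seed` parameter, and the toy edge list are hypothetical.

```python
import random

def mask_semantic_edges(edges, ratio, seed=0):
    """Return a copy of `edges` with a random `ratio` fraction removed."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed so repeated runs mask the same edges
    n_drop = round(len(edges) * ratio)
    dropped = set(rng.sample(range(len(edges)), n_drop))
    return [edge for i, edge in enumerate(edges) if i not in dropped]

# Hypothetical semantic edges: (material, relation, semantic node).
edges = [
    (f"alloy_{i}", "has_structure", "FCC" if i % 2 else "BCC")
    for i in range(10)
]

for ratio in (0.0, 0.2, 0.5, 1.0):
    kept = mask_semantic_edges(edges, ratio)
    print(f"mask ratio {ratio:.1f}: {len(kept)} of {len(edges)} edges kept")
```

In an experiment of this kind, the model would be retrained and evaluated on the graph at each ratio, and the resulting MSE values compared across ratios.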