Construction and Application of Materials Knowledge Graph in Multidisciplinary Materials Science via Large Language Model

Yanpeng Ye, Jie Ren, Shaozhou Wang, Yuwei Wan, Imran Razzak, Bram Hoex, Haofen Wang, Tong Xie, Wenjie Zhang

TL;DR

The paper presents an LLM-driven NLP pipeline that constructs a materials knowledge graph (MKG) from unstructured literature. The pipeline performs named entity recognition (NER), relation extraction (RE), and entity resolution (ER), and stores the results as structured triples whose provenance is tracked via DOIs. It introduces a domain ontology centered on core material labels, builds the graph in Neo4j, and demonstrates a graph-completion framework that uses Enhanced Jaccard similarity and TransE for link prediction. The resulting KG has 162,605 nodes and 731,772 edges. As a test of predictive utility, 48.5% of the predicted material–application links were validated within nine years (from 2014). Models fine-tuned from Darwin outperform other LLMs on the NER and RE tasks, while expert-dictionary-assisted ER (ER-ED) provides substantial gains over purely LLM-based ER; an ablation study confirms the value of ER-ED and normalization. The MKG supports efficient querying, provenance tracking, and future integration with other KGs, offering a scalable foundation for accelerated materials discovery and interdisciplinary knowledge integration. Future work includes time-aware dynamics, clustering analyses, and broader domain applicability.

Abstract

Knowledge in materials science is widely dispersed across extensive scientific literature, posing significant challenges to the efficient discovery and integration of new materials. Traditional methods, often reliant on costly and time-consuming experimental approaches, further complicate rapid innovation. Addressing these challenges, the integration of artificial intelligence with materials science has opened avenues for accelerating the discovery process, though it also demands precise annotation, data extraction, and traceability of information. To tackle these issues, this article introduces the Materials Knowledge Graph (MKG), which utilizes advanced natural language processing techniques integrated with large language models to extract and systematically organize a decade's worth of high-quality research into structured triples. The resulting graph contains 162,605 nodes and 731,772 edges. MKG categorizes information into comprehensive labels such as Name, Formula, and Application, structured around a meticulously designed ontology, thus enhancing data usability and integration. By implementing network-based algorithms, MKG not only facilitates efficient link prediction but also significantly reduces reliance on traditional experimental methods. This structured approach not only streamlines materials research but also lays the groundwork for more sophisticated science knowledge graphs.
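
To make the idea of similarity-based link prediction over triples more concrete, here is a minimal sketch in Python. It is not the paper's method: it uses plain Jaccard similarity rather than the Enhanced Jaccard variant, and the relation names (`has_property`, `has_descriptor`, `has_application`), the toy triples, and the placeholder DOIs are all assumptions made for illustration. Each triple carries a DOI field only to mirror the provenance tracking described above.

```python
from collections import defaultdict

# Toy triples of the form (head, relation, tail, doi).
# Relation names and DOIs are hypothetical placeholders, not MKG data.
TRIPLES = [
    ("CIGS", "has_property", "direct band gap", "10.0000/example-1"),
    ("CIGS", "has_descriptor", "chalcogenide", "10.0000/example-1"),
    ("CIGS", "has_application", "thin-film solar cells", "10.0000/example-1"),
    ("CZTS", "has_property", "direct band gap", "10.0000/example-2"),
    ("CZTS", "has_descriptor", "chalcogenide", "10.0000/example-2"),
    ("CZTS", "has_descriptor", "earth-abundant", "10.0000/example-2"),
    ("LiFePO4", "has_property", "lithium ion conductivity", "10.0000/example-3"),
    ("LiFePO4", "has_application", "battery cathodes", "10.0000/example-3"),
]


def jaccard(a, b):
    """Plain Jaccard similarity |a & b| / |a | b|; 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def predict_applications(material, triples):
    """Rank applications not yet linked to `material`.

    Each candidate application is scored by the summed Jaccard similarity
    between `material` and every other material that already has it.
    """
    features = defaultdict(set)      # material -> {(relation, tail)}
    applications = defaultdict(set)  # material -> {application}
    for head, relation, tail, _doi in triples:
        if relation == "has_application":
            applications[head].add(tail)
        else:
            features[head].add((relation, tail))

    own_features = features.get(material, set())
    own_apps = applications.get(material, set())
    scores = defaultdict(float)
    for other, other_features in list(features.items()):
        if other == material:
            continue
        sim = jaccard(own_features, other_features)
        if sim == 0.0:
            continue  # no shared attributes, so no evidence for a link
        for app in applications.get(other, set()) - own_apps:
            scores[app] += sim
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    print(predict_applications("CZTS", TRIPLES))
```

In this toy graph, CZTS shares two of its three attributes with CIGS and none with LiFePO4, so the only suggested link is to the application already attached to CIGS. An embedding model such as TransE approaches the same task differently, scoring a candidate triple (h, r, t) by how closely the vector h + r lands to t.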

Paper Structure

This paper contains 15 sections, 1 equation, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: Pipeline of the fine-tuned LLM for knowledge graph tasks.
  • Figure 2: Schematic of (a) the MKG schema and (b) an example path in MKG between the "Name" node "Copper Indium Gallium Selenide" and the "Application" node "Thin Films".
  • Figure 3: (a) The process of MKG graph completion and (b) a schematic diagram of node comparison.
  • Figure 4: Schematic comparison of MKG and MatKG2.
  • Figure 5: (a) Global schematic diagram of MKG; (b) Local schematic diagram of MKG.
  • ...and 2 more figures