Knowledge Graph Question Answering for Materials Science (KGQA4MAT): Developing Natural Language Interface for Metal-Organic Frameworks Knowledge Graph (MOF-KG) Using LLM

Yuan An, Jane Greenberg, Alex Kalinowski, Xintong Zhao, Xiaohua Hu, Fernando J. Uribe-Romo, Kyle Langlois, Jacob Furst, Diego A. Gómez-Gualdrón

TL;DR

KGQA4MAT addresses knowledge graph question answering (KGQA) in materials science. The authors build a knowledge graph for metal-organic frameworks (MOF-KG) and release a benchmark of 161 complex questions, each phrased four ways (644 natural-language questions in total), paired with 161 Cypher queries over Neo4j. They evaluate ChatGPT-based translation of natural-language questions into queries using zero-shot, few-shot, and chain-of-thought prompts, reporting an $F1$-score of up to $0.891$ on KGQA4MAT and up to $0.66$ on QALD-9, which suggests the approach carries over to other platforms and query languages. MOF-KG integrates structural data and literature-derived synthesis procedures into a graph with over 1.5 million nodes and 3.7 million edges, linked through publication DOIs, so that queries can span MOFs, their synthesis, and their properties. The benchmark is publicly available, and the methodology is intended to scale to natural language interfaces for other domain-specific knowledge graphs, potentially accelerating materials discovery. The authors also note limitations with complex query constructs such as UNION and with path variability.

Abstract

We present a comprehensive benchmark dataset for Knowledge Graph Question Answering in Materials Science (KGQA4MAT), with a focus on metal-organic frameworks (MOFs). A knowledge graph for metal-organic frameworks (MOF-KG) has been constructed by integrating structured databases and knowledge extracted from the literature. To enhance MOF-KG accessibility for domain experts, we aim to develop a natural language interface for querying the knowledge graph. We have developed a benchmark comprised of 161 complex questions involving comparison, aggregation, and complicated graph structures. Each question is rephrased in three additional variations, resulting in 644 questions and 161 KG queries. To evaluate the benchmark, we have developed a systematic approach for utilizing the LLM, ChatGPT, to translate natural language questions into formal KG queries. We also apply the approach to the well-known QALD-9 dataset, demonstrating ChatGPT's potential in addressing KGQA issues for different platforms and query languages. The benchmark and the proposed approach aim to stimulate further research and development of user-friendly and efficient interfaces for querying domain-specific materials science knowledge graphs, thereby accelerating the discovery of novel materials.
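
As a rough illustration of the translation-and-evaluation loop described above, the following Python sketch assembles a zero-shot or few-shot prompt for question-to-Cypher translation and scores a predicted answer set against a gold answer set with F1. Everything concrete in it is an assumption made for illustration: the schema string, the node labels and relationship types, the example question, and the function names `build_prompt` and `answer_f1` are hypothetical and are not taken from the paper. Set-based F1 is one common KGQA scoring convention; the text here does not specify how the authors compute their score. No LLM or Neo4j call is made.

```python
from typing import Iterable, List, Tuple

# Hypothetical schema summary; the real MOF-KG ontology is not given here.
SCHEMA = "(:MOF)-[:HAS_METAL]->(:Metal), (:MOF)-[:PUBLISHED_IN]->(:Publication)"

# Hypothetical few-shot demonstration (question, Cypher) pairs.
FEW_SHOT: List[Tuple[str, str]] = [
    (
        "Which MOFs contain zinc?",
        "MATCH (m:MOF)-[:HAS_METAL]->(x:Metal {name: 'Zn'}) RETURN m.name",
    ),
]


def build_prompt(
    question: str,
    schema: str = SCHEMA,
    examples: Iterable[Tuple[str, str]] = FEW_SHOT,
) -> str:
    """Assemble a prompt; pass examples=[] for the zero-shot variant."""
    parts = [
        "Translate the question into a Cypher query.",
        f"Graph schema: {schema}",
        "",
    ]
    for example_question, example_cypher in examples:
        parts += [f"Question: {example_question}", f"Cypher: {example_cypher}", ""]
    # The prompt ends with an open "Cypher:" slot for the model to complete.
    parts += [f"Question: {question}", "Cypher:"]
    return "\n".join(parts)


def answer_f1(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """Set-based F1 between predicted and gold answers for one question."""
    pred, ref = set(predicted), set(gold)
    if not pred and not ref:
        return 1.0  # both empty: treated here as a perfect match
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)
```

In such a setup, the generated query would be executed against the graph and its result set compared with the result of the gold query; averaging `answer_f1` over all questions gives a benchmark-level score.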

Paper Structure

This paper contains 13 sections, 3 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: The Underlying Net and Framework of MOF-3
  • Figure 2: The concepts and relationships in the ontology for the MOF knowledge graph (MOF-KG)
  • Figure 3: The process of generating the KGQA4MAT benchmark