Enhancing Large Language Models with Domain-Specific Knowledge: The Case in Topological Materials

HuangChao Xu, Baohua Zhang, Zhong Jin, Tiannian Zhu, Quansheng Wu, Hongming Weng

TL;DR

The paper addresses the domain-gap challenge for large language models in condensed-matter/topological materials by building a multi-source data fusion workflow, constructing a materials knowledge graph (MaterialsKG), and deploying a topology-focused dialogue system (TopoChat). It demonstrates retrieval-augmented generation that grounds responses in both structured graph data and relevant literature, yielding improved accuracy in structural queries, property retrieval, and material recommendations. Through KG analysis and user-focused evaluations, the approach reduces hallucinations and enhances domain reliability. The work highlights the potential of knowledge-augmented dialogue systems to accelerate materials science research and outlines concrete paths for expanding data sources and multimodal capabilities.

Abstract

Large language models (LLMs), such as ChatGPT, have demonstrated impressive performance in text generation, showing the ability to understand and respond to complex instructions. However, the performance of naive LLMs in specific domains is limited by the scarcity of domain-specific corpora and specialized training. Moreover, training a specialized large-scale model requires significant hardware resources, which restricts researchers from leveraging such models to drive advances. Hence, it is crucial to further improve and optimize LLMs to meet specific domain demands and enhance their scalability. Based on the condensed matter data center, we establish a materials knowledge graph (MaterialsKG) and integrate it with literature. Using large language models and prompt learning, we develop a specialized dialogue system for topological materials called TopoChat. Compared to naive LLMs, TopoChat exhibits superior performance in structural and property querying, material recommendation, and complex relational reasoning. This system enables efficient and precise retrieval of information and facilitates knowledge interaction, thereby encouraging advancement in the field of condensed matter materials.
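
The idea of grounding an LLM's answer in both knowledge-graph facts and literature can be illustrated with a minimal sketch. This is not the TopoChat implementation: the `Triple` type, the relation names, the keyword-overlap retrieval, the toy data, and the prompt wording are all assumptions made for illustration, and the call to an actual LLM is omitted.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One knowledge-graph fact: subject, relation, object (hypothetical schema)."""
    subject: str
    relation: str
    obj: str


def tokens(text):
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve_triples(graph, question):
    """Keep triples whose subject appears in the question (naive substring match)."""
    q = question.lower()
    return [t for t in graph if t.subject.lower() in q]


def retrieve_passages(passages, question, top_k=1):
    """Rank passages by how many word tokens they share with the question."""
    q_words = tokens(question)
    ranked = sorted(passages, key=lambda p: len(q_words & tokens(p)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, triples, passages):
    """Assemble a prompt that restricts the model to the retrieved context."""
    facts = "\n".join(f"- {t.subject} {t.relation} {t.obj}" for t in triples)
    refs = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the facts and passages below.\n"
        f"Facts:\n{facts}\n"
        f"Passages:\n{refs}\n"
        f"Question: {question}"
    )


# Toy data, assumed for illustration only (not taken from MaterialsKG).
graph = [
    Triple("Bi2Se3", "has_space_group", "R-3m"),
    Triple("Bi2Se3", "has_topological_class", "topological insulator"),
    Triple("Graphene", "has_dimensionality", "2D"),
]
passages = [
    "Bi2Se3 is a layered material studied as a topological insulator.",
    "Graphene is a two-dimensional carbon sheet.",
]

question = "What is the space group of Bi2Se3?"
prompt = build_prompt(
    question,
    retrieve_triples(graph, question),
    retrieve_passages(passages, question),
)
print(prompt)
```

The resulting `prompt` string would then be passed to whichever LLM is in use. A production system would likely replace the substring and token-overlap matching with graph queries and embedding-based retrieval.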

Paper Structure

This paper contains 12 sections, 1 equation, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Multi-Source Data Management Flow
  • Figure 2: The Components of Materials Knowledge Graph
  • Figure 3: The fission strategy of extracting QA pairs from the literature
  • Figure 4: Two-Phase Prompt Learning Algorithm Framework
  • Figure 5: Element distribution of the five topological classes in the 3D periodic table, with height calculated by Equation 1; a higher value indicates that the element is more commonly used in topological materials
  • ...and 1 more figures