TXL Fusion: A Hybrid Machine Learning Framework Integrating Chemical Heuristics and Large Language Models for Topological Materials Discovery

Arif Ullah, Rajibul Islam, Ghulam Hussain, Zahir Muhammad, Xiaoguang Li, Ming Yang

TL;DR

The paper addresses the challenge of discovering topological materials, for which traditional approaches based on symmetry indicators and first-principles calculations are computationally intensive. It introduces TXL Fusion, a hybrid framework that combines a composition-based heuristic $g(M)$, engineered numerical descriptors, and embeddings from a fine-tuned LLM (SciBERT) to classify materials as trivial, topological semimetals (TSMs), or topological insulators (TIs). TXL Fusion outperforms both a purely heuristic and a purely descriptor-based baseline and achieves high-accuracy predictions, and density functional theory (DFT) validation of selected candidates indicates robust generalization to unexplored chemical spaces. The approach offers a scalable, interpretable pathway for data-driven discovery of topological and related quantum materials, and the authors plan a public release via the Aitomistic Hub to facilitate broader adoption.
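The fusion idea above amounts to placing the three feature groups side by side in one vector per material before classification. What follows is a minimal sketch under stated assumptions, not the authors' implementation. It treats $g(M)$ as a precomputed number because its formula is not reproduced here. All data are synthetic, and `fuse_features` and `NearestCentroid` are hypothetical names. Figure 2 refers to an XGB (gradient-boosted tree) model; a simple nearest-centroid classifier is swapped in only so that the example depends on NumPy alone.

```python
import numpy as np

# Label names follow the paper; integer class indices 0, 1, 2 map onto them.
CLASS_NAMES = ("trivial", "TSM", "TI")


def fuse_features(g_scores, descriptors, embedding_pcs):
    """Concatenate the three feature groups column-wise, one row per material.

    g_scores      : shape (n,)   heuristic values g(M), assumed precomputed
    descriptors   : shape (n, d) engineered numerical descriptors
    embedding_pcs : shape (n, k) reduced LLM-embedding components
    """
    g = np.asarray(g_scores, dtype=float).reshape(-1, 1)
    desc = np.asarray(descriptors, dtype=float)
    pcs = np.asarray(embedding_pcs, dtype=float)
    if not (g.shape[0] == desc.shape[0] == pcs.shape[0]):
        raise ValueError("every feature group needs one row per material")
    return np.hstack([g, desc, pcs])


class NearestCentroid:
    """Stand-in classifier (NOT the paper's model): predicts the class whose
    mean fused-feature vector is closest in Euclidean distance."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.labels_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.labels_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Squared distance from every sample to every centroid: shape (n, n_classes).
        d2 = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.labels_[np.argmin(d2, axis=1)]


# Usage on synthetic, well-separated toy data (not real materials).
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 10)                  # 0 = trivial, 1 = TSM, 2 = TI
g = y + rng.normal(0.0, 0.1, y.size)             # fake g(M) values
desc = rng.normal(y[:, None], 0.1, (y.size, 4))  # fake descriptors
pcs = rng.normal(y[:, None], 0.1, (y.size, 5))   # fake embedding components
X = fuse_features(g, desc, pcs)
model = NearestCentroid().fit(X, y)
pred = model.predict(X)
print(X.shape, [CLASS_NAMES[i] for i in pred[:3]])
```

The concatenation step is independent of the classifier. A tree ensemble such as the XGB model named in Figure 2 could consume the same fused matrix `X`.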

Abstract

Topological materials, including topological insulators (TIs) and topological semimetals (TSMs), hold immense promise for quantum technologies, yet their discovery remains constrained by the high computational cost of first-principles calculations and the slow, resource-intensive nature of experimental synthesis. Here, we introduce TXL Fusion, a hybrid machine learning framework that integrates chemical heuristics, engineered physical descriptors, and large language model (LLM) embeddings to accelerate the discovery of topological materials. By incorporating features such as space group symmetry, valence electron configurations, and composition-derived metrics, TXL Fusion classifies materials into trivial, TSM, and TI categories with improved accuracy and generalization compared to conventional approaches. The framework identified new candidates, and density functional theory (DFT) calculations on representative cases confirmed its predictive robustness. By uniting data-driven learning with chemical intuition, TXL Fusion enables rapid and interpretable exploration of complex materials spaces, establishing a scalable paradigm for the intelligent discovery of next-generation topological and quantum materials.
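The abstract lists space group symmetry, valence electron configurations, and composition-derived metrics among the input features. The sketch below shows one hypothetical way to turn a chemical formula and a space-group number into numbers of that kind. This section does not give TXL Fusion's exact descriptor set, so `parse_formula`, `composition_descriptors`, and the small valence table are illustrative assumptions. The example compounds and their space groups come from the Figure 3 caption.

```python
import re

# Hypothetical lookup table for illustration only: valence-electron counts
# using the common group-based convention (s + p electrons for main-group
# elements, s + d for Sc and Ti). A real pipeline would cover every element.
VALENCE_ELECTRONS = {"Cs": 1, "Sc": 3, "Ti": 4, "C": 4, "P": 5, "Sb": 5, "O": 6}

# A whole formula is one or more (element symbol, optional amount) tokens.
_FORMULA = re.compile(r"(?:[A-Z][a-z]?(?:\d+(?:\.\d+)?)?)+")
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula):
    """Parse a flat formula such as 'P3Sc7' into {'P': 3.0, 'Sc': 7.0}.

    Parentheses, hydrates, and charges are not supported in this sketch."""
    if not _FORMULA.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = {}
    for symbol, amount in _TOKEN.findall(formula):
        # A missing amount (empty string) means one atom of that element.
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts


def composition_descriptors(formula, space_group):
    """Return a small, hypothetical descriptor dict for one material."""
    if not 1 <= space_group <= 230:
        raise ValueError("space-group number must be in 1..230")
    counts = parse_formula(formula)
    missing = set(counts) - set(VALENCE_ELECTRONS)
    if missing:
        raise KeyError(f"no valence data for {sorted(missing)}")
    n_atoms = sum(counts.values())
    total_ve = sum(VALENCE_ELECTRONS[el] * n for el, n in counts.items())
    return {
        "space_group": space_group,
        "n_elements": len(counts),
        "n_atoms": n_atoms,
        "total_valence_electrons": total_ve,
        "mean_valence_electrons": total_ve / n_atoms,
    }


# Compounds and space-group numbers taken from the Figure 3 caption.
for formula, sg in [("CsC8", 191), ("OTi6", 159), ("SbO2", 33), ("P3Sc7", 186)]:
    print(formula, composition_descriptors(formula, sg))
```

For P$_3$Sc$_7$, for example, the table gives $3 \times 5 + 7 \times 3 = 36$ valence electrons over 10 atoms, or 3.6 per atom.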

Paper Structure

This paper contains 5 sections, 1 equation, 4 figures, 1 table.

Figures (4)

  • Figure 1: Schematic flowchart of the TXL Fusion model, outlining the main stages of the workflow.
  • Figure 2: Feature importance in material classification for (A) the numerical descriptor-based XGB model and (B) the TXL Fusion model. Here Bert_n ($n=1, \dots, 5$) denotes the five principal components derived from principal component analysis (PCA) of the 768-dimensional embeddings obtained from the fine-tuned LLM.
  • Figure 3: Electronic band structures of (A) CsC$_8$ (space group 191), (B) OTi$_6$ (space group 159), (C) SbO$_2$ (space group 33), and (D) P$_3$Sc$_7$ (space group 186).
  • Figure :
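The Figure 2 caption describes compressing each 768-dimensional embedding from the fine-tuned LLM into five principal components, Bert_1 to Bert_5. A minimal NumPy sketch of that reduction via a thin SVD follows. The random matrix stands in for real SciBERT outputs, and `pca_reduce` is a hypothetical name. The captions do not say how token-level outputs are pooled into one vector per material, so the sketch starts from ready-made 768-d vectors.

```python
import numpy as np


def pca_reduce(embeddings, n_components=5):
    """Project row-wise embeddings onto their top principal components.

    Returns (scores, components, mean, explained_variance_ratio)."""
    X = np.asarray(embeddings, dtype=float)
    if X.ndim != 2 or X.shape[0] < 2:
        raise ValueError("expected a 2-D array with at least two rows")
    if not 1 <= n_components <= min(X.shape):
        raise ValueError("n_components must be between 1 and min(n_materials, dim)")
    mean = X.mean(axis=0)
    Xc = X - mean                                 # PCA operates on centred data
    # Thin SVD: Xc = U @ diag(S) @ Vt; the rows of Vt are the principal axes,
    # and the singular values S come back sorted in descending order.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T                    # coordinates along each axis
    variance = S**2 / (X.shape[0] - 1)
    ratio = variance[:n_components] / variance.sum()
    return scores, components, mean, ratio


# Stand-in for fine-tuned SciBERT outputs: random 768-d vectors for 200 materials.
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(200, 768))
scores, components, mean, ratio = pca_reduce(embeddings, n_components=5)
bert_features = {f"Bert_{i + 1}": scores[:, i] for i in range(scores.shape[1])}
print(scores.shape, list(bert_features))
```

New materials should be projected with the stored `mean` and `components` rather than refitting, i.e. `(X_new - mean) @ components.T`. The sign of each component is arbitrary, so feature-importance comparisons should not depend on it.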