TXL Fusion: A Hybrid Machine Learning Framework Integrating Chemical Heuristics and Large Language Models for Topological Materials Discovery
Arif Ullah, Rajibul Islam, Ghulam Hussain, Zahir Muhammad, Xiaoguang Li, Ming Yang
TL;DR
Discovering topological materials is difficult because traditional symmetry-based indicators and first-principles calculations are computationally intensive. The paper introduces TXL Fusion, a hybrid framework that combines a composition-based heuristic $g(M)$, engineered numerical descriptors, and embeddings from a fine-tuned LLM (SciBERT) to classify materials as trivial, topological semimetals (TSMs), or topological insulators (TIs). TXL Fusion outperforms purely heuristic and purely descriptor-based baselines, and DFT validation of selected candidates supports its predictions, indicating that it generalizes to unexplored chemical spaces. The approach offers a scalable, interpretable pathway for data-driven discovery of topological and related quantum materials, and the authors plan a public release via the Aitomistic Hub to facilitate broader adoption.
Abstract
Topological materials, including insulators (TIs) and semimetals (TSMs), hold immense promise for quantum technologies, yet their discovery remains constrained by the high computational cost of first-principles calculations and the slow, resource-intensive nature of experimental synthesis. Here, we introduce TXL Fusion, a hybrid machine learning framework that integrates chemical heuristics, engineered physical descriptors, and large language model (LLM) embeddings to accelerate the discovery of topological materials. By incorporating features such as space group symmetry, valence electron configurations, and composition-derived metrics, TXL Fusion classifies materials into trivial, TSM, and TI categories with improved accuracy and generalization compared to conventional approaches. The framework identified new candidates, and representative cases were further validated through density functional theory (DFT), confirming its predictive robustness. By uniting data-driven learning with chemical intuition, TXL Fusion enables rapid and interpretable exploration of complex materials spaces, establishing a scalable paradigm for the intelligent discovery of next-generation topological and quantum materials.
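
The fusion step described above, in which a heuristic score, numerical descriptors, and LLM embeddings feed a single three-class classifier, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fuse_features`, `fit_centroids`, `predict`), the nearest-centroid classifier, and all numeric values are assumptions chosen for illustration. It assumes the heuristic score, descriptors, and embedding have already been computed and scaled to comparable ranges.

```python
from math import sqrt

# Class labels taken from the abstract; everything else here is hypothetical.
CLASSES = ("trivial", "TSM", "TI")


def fuse_features(heuristic_score, descriptors, embedding):
    """Concatenate the three feature families into one flat vector."""
    fused = [float(heuristic_score)]
    fused.extend(float(x) for x in descriptors)
    fused.extend(float(x) for x in embedding)
    return fused


def fit_centroids(vectors, labels):
    """Average the fused vectors per class (a stand-in for a real classifier)."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    def dist(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(centroids, key=lambda lab: dist(centroids[lab], vec))


# Toy, made-up training data: (heuristic score, 2 descriptors, 2-d embedding).
train = [
    (fuse_features(0.1, [0.2, 0.1], [0.0, 0.1]), "trivial"),
    (fuse_features(0.2, [0.1, 0.2], [0.1, 0.0]), "trivial"),
    (fuse_features(0.5, [0.9, 0.4], [0.6, 0.5]), "TSM"),
    (fuse_features(0.6, [0.8, 0.5], [0.5, 0.6]), "TSM"),
    (fuse_features(0.9, [0.3, 0.9], [0.9, 0.1]), "TI"),
    (fuse_features(0.8, [0.4, 0.8], [1.0, 0.2]), "TI"),
]
centroids = fit_centroids([v for v, _ in train], [y for _, y in train])

query = fuse_features(0.85, [0.35, 0.9], [0.9, 0.1])
print(predict(centroids, query))
```

In practice the embedding would have hundreds of dimensions and the classifier would be a trained model rather than a centroid lookup; the sketch only shows how heterogeneous features can share one input vector.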
