Synergistic Fusion of Multi-Source Knowledge via Evidence Theory for High-Entropy Alloy Discovery
Minh-Quyet Ha, Dinh-Khiet Le, Duc-Anh Dao, Tien-Sinh Vu, Duong-Nguyen Nguyen, Viet-Cuong Nguyen, Hiori Kino, Van-Nam Huynh, Hieu-Chi Dam
TL;DR
This work addresses high-entropy alloy (HEA) discovery in an expansive compositional space by fusing data-driven material datasets with domain knowledge from large language models through Dempster-Shafer evidential reasoning on elemental substitutability. The hybrid framework incorporates reliability-aware discounting and analogy-based inference to manage epistemic and aleatoric uncertainty, achieving strong extrapolation performance across four quaternary HEA datasets (AUC 0.92–0.95) and revealing actionable insights into the HEA formation mechanism. Key contributions include a multi-source evidential fusion workflow, interpretability via substitutability clustering and t-SNE visualization, and identification of a core set of 14 transition metals that underpin HEA stability. Collectively, the approach accelerates HEA discovery by enabling robust generalization and explainable design guidance in data-scarce regions.
Abstract
Discovering novel high-entropy alloys (HEAs) with desirable properties is challenging due to the vast compositional space and complex phase formation mechanisms. Efficient exploration of this space requires a strategic approach that integrates heterogeneous knowledge sources. Here, we propose a framework that systematically combines knowledge extracted from computational material datasets with domain knowledge distilled from scientific literature using large language models (LLMs). A central feature of this approach is the explicit consideration of element substitutability, identifying chemically similar elements that can be interchanged to potentially stabilize desired HEAs. Dempster-Shafer theory, a mathematical framework for reasoning under uncertainty, is employed to model and combine substitutabilities based on aggregated evidence from multiple sources. The framework predicts the phase stability of candidate HEA compositions and is systematically evaluated on quaternary alloy systems, demonstrating superior performance compared to baseline machine learning models and methods reliant on single-source evidence in cross-validation experiments. By leveraging multi-source knowledge, the framework retains robust predictive power even when key elements are absent from the training data, underscoring its potential for knowledge transfer and extrapolation. Furthermore, the enhanced interpretability of the methodology offers insights into the fundamental factors governing HEA formation. Overall, this work provides a promising strategy for accelerating HEA discovery by integrating computational and textual knowledge sources, enabling efficient exploration of vast compositional spaces with improved generalization and interpretability.
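To make the evidence-combination step concrete, the following is a minimal sketch of Shafer discounting followed by Dempster's rule of combination on a two-hypothesis frame (an element pair is either substitutable or not). It is an illustration of the general technique only, not the authors' implementation: the frame labels, the function names `discount` and `combine`, the mass values, and the reliability of 0.8 assigned to the LLM source are all assumptions chosen for the example.

```python
from itertools import product

# Frame of discernment for one element pair (labels are illustrative assumptions).
SUB = frozenset({"substitutable"})
NOT_SUB = frozenset({"not_substitutable"})
OMEGA = SUB | NOT_SUB  # total ignorance: "either could be true"


def discount(mass, reliability):
    """Shafer discounting: scale committed mass by the source reliability
    and move the remainder onto OMEGA (ignorance)."""
    out = {focal: reliability * value
           for focal, value in mass.items() if focal != OMEGA}
    out[OMEGA] = 1.0 - reliability + reliability * mass.get(OMEGA, 0.0)
    return out


def combine(m1, m2):
    """Dempster's rule: multiply masses of intersecting focal sets,
    then renormalize by the non-conflicting mass."""
    joint = {}
    conflict = 0.0
    for (a, va), (b, vb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            joint[inter] = joint.get(inter, 0.0) + va * vb
        else:
            conflict += va * vb  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("Sources are in total conflict; rule is undefined.")
    return {focal: value / (1.0 - conflict) for focal, value in joint.items()}


# Hypothetical evidence about one element pair from two sources.
m_data = {SUB: 0.6, OMEGA: 0.4}               # from a materials dataset
m_llm = {SUB: 0.5, NOT_SUB: 0.2, OMEGA: 0.3}  # from LLM-distilled knowledge

# Assume the LLM source is judged 80% reliable before fusion.
fused = combine(m_data, discount(m_llm, reliability=0.8))

for focal, value in fused.items():
    print(sorted(focal), round(value, 4))
```

In this toy case both sources lean toward substitutability, so the fused belief in that hypothesis ends up higher than either source gives alone, while the mass left on the full frame records the remaining ignorance. Lowering the reliability shifts more of the LLM evidence onto the full frame, which weakens its influence on the fused result.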
