Synergistic Fusion of Multi-Source Knowledge via Evidence Theory for High-Entropy Alloy Discovery

Minh-Quyet Ha, Dinh-Khiet Le, Duc-Anh Dao, Tien-Sinh Vu, Duong-Nguyen Nguyen, Viet-Cuong Nguyen, Hiori Kino, Van-Nam Huynh, Hieu-Chi Dam

TL;DR

This work addresses HEA discovery in an expansive compositional space by fusing data-driven material datasets with domain knowledge from large language models through Dempster-Shafer evidential reasoning on elemental substitutability. The hybrid framework incorporates reliability-aware discounting and analogy-based inference to manage epistemic and aleatoric uncertainty, achieving strong extrapolation performance across four quaternary HEA datasets (AUC 0.92–0.95) and revealing actionable insights into the HEA formation mechanism. Key contributions include a multi-source evidential fusion workflow, interpretability via substitutability clustering and t-SNE visualization, and identification of a core set of 14 transition metals that underpin HEA stability. Collectively, the approach accelerates HEA discovery by enabling robust generalization and explainable design guidance in data-scarce regions.

Abstract

Discovering novel high-entropy alloys (HEAs) with desirable properties is challenging due to the vast compositional space and complex phase formation mechanisms. Efficient exploration of this space requires a strategic approach that integrates heterogeneous knowledge sources. Here, we propose a framework that systematically combines knowledge extracted from computational material datasets with domain knowledge distilled from scientific literature using large language models (LLMs). A central feature of this approach is the explicit consideration of element substitutability, identifying chemically similar elements that can be interchanged to potentially stabilize desired HEAs. Dempster-Shafer theory, a mathematical framework for reasoning under uncertainty, is employed to model and combine substitutabilities based on aggregated evidence from multiple sources. The framework predicts the phase stability of candidate HEA compositions and is systematically evaluated on quaternary alloy systems, where it outperforms baseline machine learning models and single-source evidence methods in cross-validation experiments. By leveraging multi-source knowledge, the framework retains robust predictive power even when key elements are absent from the training data, underscoring its potential for knowledge transfer and extrapolation. Furthermore, the enhanced interpretability of the methodology offers insights into the fundamental factors governing HEA formation. Overall, this work provides a promising strategy for accelerating HEA discovery by integrating computational and textual knowledge sources, enabling efficient exploration of vast compositional spaces with improved generalization and interpretability.
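
To make the evidence-combination step concrete, the following is a minimal sketch of Shafer discounting followed by Dempster's rule of combination for one element pair. It is an illustration under stated assumptions, not the authors' implementation: the two-hypothesis frame {similar, dissimilar}, the function names `discount` and `combine`, and all mass and reliability values are hypothetical.

```python
from itertools import product

# Assumed frame of discernment for one element pair (hypothetical labels).
SIM = frozenset({"similar"})
DIS = frozenset({"dissimilar"})
FRAME = SIM | DIS  # mass on the whole frame expresses ignorance


def discount(mass, reliability):
    """Shafer discounting: scale committed masses by the source reliability
    and move the remainder onto the whole frame (ignorance)."""
    out = {focal: reliability * m for focal, m in mass.items() if focal != FRAME}
    out[FRAME] = 1.0 - sum(out.values())
    return out


def combine(m1, m2):
    """Dempster's rule: multiply masses, pool them on set intersections,
    and renormalize by the non-conflicting mass."""
    fused, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            fused[inter] = fused.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # contradictory evidence
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {focal: m / (1.0 - conflict) for focal, m in fused.items()}


# Hypothetical evidence about one element pair from the two sources.
m_dataset = {SIM: 0.6, DIS: 0.1, FRAME: 0.3}
m_llm = {SIM: 0.7, DIS: 0.0, FRAME: 0.3}

fused = combine(discount(m_dataset, 0.9), discount(m_llm, 0.8))
for focal, m in fused.items():
    print(sorted(focal), round(m, 4))
```

In this toy case both sources lean toward "similar", so the fused mass on that hypothesis is higher than either discounted source assigns alone, while the mass left on the whole frame shrinks.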

Paper Structure

This paper contains 12 sections, 11 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: The material space illustrates decision-making scenarios for exploitation and exploration criteria in High-Entropy Alloy (HEA) discovery. Colored regions represent familiar regions of the HEA space where sufficient data is available, while white regions represent novel, unexplored areas where data is sparse or absent.
  • Figure 2: Workflow illustration of the proposed method for evaluating hypothetical candidates forming high-entropy alloy (HEA) phases. (a–b) Schematic outlining the collection of substitutability evidence from a single material dataset (MD) and large language models (LLMs). (c) Schematic for assessing the properties of hypothetical candidates using aggregated evidence derived from substitution-based methods.
  • Figure 3: Evaluation of predictive capability at varying training-set sizes. (a–d) Classification accuracy of the multi-source, single-source, and LR-based models on four quaternary-alloy datasets $\mathcal{D}_{0.9 T_m}$, $\mathcal{D}_{1350K}$, $\mathcal{D}_{\mathrm{Mag}}$, and $\mathcal{D}_{T_C}$. (e–h) Receiver Operating Characteristic (ROC) curves for the same models at a 30% training-set size on these datasets. (i–l) The area under the ROC curves (AUC) for each model across a range of training-set sizes, providing an overall measure of discriminative performance. In all subplots, the red lines indicate the multi-source model (using both MD and LLM sources), the green and blue lines denote single-source models (using either MD or LLM sources), and the gray lines represent the LR-based model.
  • Figure 4: Predictive capability evaluation through extrapolation on four quaternary-alloy datasets. For each dataset, alloys containing a specific element $e$ are excluded from training and used as the test set. (a–d) The area under the ROC curves (AUC) is shown for each model on the respective test sets in the extrapolation experiments. In all subplots, the red lines indicate the multi-source model (integrating both MD and LLM sources), the green and blue lines denote single-source models (using either MD or LLM sources), and the gray lines represent the LR-based model.
  • Figure 5: Circular hierarchical agglomerative clustering (HAC) of elements based on their mutual substitutability. The circular dendrogram displays the hierarchical clustering of all constituent elements, constructed using HAC with the "complete" linkage criterion. The substitutability information is derived from both alloy datasets and LLM-based knowledge. Blue labels represent early transition metals, orange labels indicate late transition metals, and red labels denote coinage metals, including copper (Cu), silver (Ag), and gold (Au).
  • ...and 2 more figures
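
The clustering named in the Figure 5 caption can be illustrated with a small pure-Python sketch of agglomerative clustering under the complete-linkage criterion. It is a sketch under stated assumptions rather than the authors' code: the four elements, the substitutability scores, the conversion of substitutability to distance as 1 − score, and the function name `complete_linkage` are all hypothetical.

```python
def complete_linkage(names, dist, n_clusters):
    """Naive agglomerative clustering with complete linkage.

    Repeatedly merges the two clusters whose *largest* pairwise
    distance is smallest, until n_clusters remain.
    """
    clusters = [[i] for i in range(len(names))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: distance between clusters is the
                # maximum distance over all cross-cluster pairs.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(names[i] for i in c) for c in clusters]


# Hypothetical symmetric substitutability scores in [0, 1].
elements = ["Ti", "Zr", "Ni", "Cu"]
substitutability = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
]
# Assumed conversion: highly substitutable elements are close together.
distance = [[1.0 - s for s in row] for row in substitutability]

groups = complete_linkage(elements, distance, n_clusters=2)
print(groups)
```

With these toy scores the two early transition metals end up in one group and Ni and Cu in the other. For real element sets, a library routine that also returns the full merge tree would be the more practical choice for drawing a dendrogram.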