Active Learning for Conditional Inverse Design with Crystal Generation and Foundation Atomic Models

Zhuoyuan Li, Siyu Liu, Beilin Ye, David J. Srolovitz, Tongqi Wen

TL;DR

This work presents an active learning framework that combines conditional crystal generation with foundation atomic models (FAMs) for efficient inverse crystal design. Using Con-CDVAE for structure generation and the MACE-MP-0 FAM for high-throughput screening, the authors build a three-stage screening pipeline (GNN prediction, MD with the FAM, and DFT) that iteratively steers generated candidates toward a targeted bulk modulus, exemplified by $K_{\text{vrh}}=350\ \mathrm{GPa}$. Across iterations, generation accuracy improves (MAPE drops to $\approx 0.14$), and the framework discovers DFT-validated high-modulus alloys, showing that it can compensate for sparse training data in the high-stiffness region and extend exploration there. The framework is model-agnostic and scalable, and by tightly coupling generation, screening, and feedback it offers a path toward autonomous, AI-driven materials discovery in complex chemical spaces.
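The MAPE figure quoted above is easiest to read with the metric written out. The sketch below assumes MAPE is computed between each conditioning target $K_{\text{vrh}}$ and the predicted bulk modulus of the structures generated for it, and is reported as a fraction (so 0.14 means 14%). The function name `mape` and the numbers in the usage line are illustrative only, not data from the paper.

```python
import numpy as np


def mape(targets, predictions):
    """Mean absolute percentage error, returned as a fraction (0.14 == 14%)."""
    targets = np.asarray(targets, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    if np.any(targets == 0):
        raise ValueError("MAPE is undefined when a target value is zero")
    return float(np.mean(np.abs(predictions - targets) / np.abs(targets)))


# Illustrative numbers only: four structures generated for a 350 GPa target
# and their (hypothetical) predicted bulk moduli in GPa.
target = np.full(4, 350.0)
predicted = np.array([300.0, 380.0, 350.0, 420.0])
print(round(mape(target, predicted), 4))  # prints 0.1071
```

Because the error is divided by the target, the same absolute miss counts for less at high targets, which makes MAPE a natural way to compare accuracy across conditions such as 50 GPa and 350 GPa.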

Abstract

Artificial intelligence (AI) is transforming materials science, enabling both theoretical advancements and accelerated materials discovery. Recent progress in crystal generation models, which design crystal structures for targeted properties, and foundation atomic models (FAMs), which capture interatomic interactions across the periodic table, has significantly improved inverse materials design. However, an efficient integration of these two approaches remains an open challenge. Here, we present an active learning framework that combines crystal generation models and foundation atomic models to enhance the accuracy and efficiency of inverse design. As a case study, we employ Con-CDVAE to generate candidate crystal structures and MACE-MP-0 FAM as one of the high-throughput screeners for bulk modulus evaluation. Through iterative active learning, we demonstrate that Con-CDVAE progressively improves its accuracy in generating crystals with target properties, highlighting the effectiveness of a property-driven fine-tuning process. Our framework is general enough to accommodate different crystal generation and foundation atomic models, and establishes a scalable approach for AI-driven materials discovery. By bridging generative modeling with atomic-scale simulations, this work paves the way for more accurate and efficient inverse materials design.
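The generate, screen, and fine-tune cycle described in the abstract can be sketched as a control loop. This is a minimal sketch under loud assumptions, not the authors' code: the function `active_learning`, the stage tolerances `tol_gnn` and `tol_fam`, and all toy stubs are hypothetical. In the real pipeline the generator is Con-CDVAE, Stage I is a GNN such as CGCNN, Stage II runs MD with a FAM such as MACE-MP-0, and Stage III is DFT (the latter two via APEX). Here a "structure" is just a float standing in for its true bulk modulus.

```python
import random
from typing import Callable, List, Tuple

# Toy stand-in: a "structure" is represented only by its true bulk modulus (GPa).
Structure = float
Labelled = Tuple[Structure, float]


def relative_error(value: float, target: float) -> float:
    return abs(value - target) / abs(target)


def active_learning(
    generate: Callable[[float, int], List[Structure]],
    gnn: Callable[[Structure], float],
    fam: Callable[[Structure], float],
    dft: Callable[[Structure], float],
    fine_tune: Callable[[List[Labelled]], None],
    target: float,
    initial_dataset: List[Labelled],
    n_iterations: int = 2,
    n_samples: int = 200,
    tol_gnn: float = 0.3,   # hypothetical Stage I tolerance
    tol_fam: float = 0.15,  # hypothetical Stage II tolerance
) -> List[Labelled]:
    """Generate -> three-stage screen -> add DFT labels -> fine-tune, repeated."""
    dataset = list(initial_dataset)
    for _ in range(n_iterations):
        candidates = generate(target, n_samples)
        # Stage I: a cheap GNN surrogate discards clearly off-target candidates.
        stage1 = [s for s in candidates if relative_error(gnn(s), target) <= tol_gnn]
        # Stage II: a FAM-based simulation re-checks survivors with a tighter tolerance.
        stage2 = [s for s in stage1 if relative_error(fam(s), target) <= tol_fam]
        # Stage III: DFT labels every survivor; the labels go back into the training set.
        dataset.extend((s, dft(s)) for s in stage2)
        # Fine-tune the generator on the enlarged dataset before the next round.
        fine_tune(dataset)
    return dataset


# Toy usage with made-up stubs (not the real models).
rng = random.Random(0)
spread = [120.0]  # generator's spread around the target (GPa)


def toy_generate(target: float, n: int) -> List[Structure]:
    return [max(1.0, rng.gauss(target, spread[0])) for _ in range(n)]


def toy_fine_tune(dataset: List[Labelled]) -> None:
    spread[0] *= 0.7  # pretend fine-tuning sharpens conditional generation


labels = active_learning(
    generate=toy_generate,
    gnn=lambda s: s + rng.gauss(0.0, 40.0),  # noisy but cheap
    fam=lambda s: s + rng.gauss(0.0, 15.0),  # less noisy, costlier
    dft=lambda s: s,                         # treated as ground truth here
    fine_tune=toy_fine_tune,
    target=350.0,
    initial_dataset=[],
)
print(len(labels), "DFT-labelled structures collected")
```

The funnel ordering reflects cost: each stage sees only the survivors of the cheaper stage before it, so the expensive DFT step is spent on a small, pre-filtered set whose labels then feed the next round of fine-tuning.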

Paper Structure

This paper contains 9 sections, 1 equation, and 4 figures.

Figures (4)

  • Figure 1: Active learning framework for inverse materials design. The active learning loop integrates a conditional crystal generation model with a three-stage screening process to iteratively refine crystal candidates. The screening stages include: (I) a graph neural network (GNN) for rapid property prediction, (II) molecular dynamics (MD) simulations using foundation atomic models, and (III) density functional theory (DFT) calculations for high-accuracy validation. MD and DFT simulations are performed using the Automatic Property Explorer (APEX) [apex_li2024extendable]. Validated structures and their computed properties are incorporated into the training dataset, progressively enhancing model accuracy and expanding the design space.
  • Figure 2: Distribution of the initial training dataset. (a) Elemental occurrence frequency in the training dataset, illustrating the distribution of metal elements used for model training. (b) Histogram of bulk modulus $K_{\text{vrh}}$ values computed using the Voigt-Reuss-Hill (VRH) approximation, revealing a Poisson-like distribution with a peak between 40 and 80 GPa. (c) Element distribution among crystal structures with $K_{\text{vrh}} > 300$ GPa, highlighting which elements are most common in high-bulk-modulus materials, with Ir (iridium) and Re (rhenium) appearing most frequently.
  • Figure 3: Iterative improvement in bulk modulus prediction for generated alloy structures. (a–c) Bulk modulus predictions using CGCNN for generated structures across three iterations: (a) Iteration 0 (baseline performance), (b) Iteration 1 (first refinement), and (c) Iteration 2 (further optimization). Each horizontal line represents a target bulk modulus $K_{\text{vrh}}$, with corresponding CGCNN predictions shown along each line. (d) Evolution of the mean absolute percentage error (MAPE) across four models for different $K_{\text{vrh}}$ conditions, highlighting progressive accuracy improvements through the active learning process.
  • Figure 4: Newly identified alloy structures with high bulk modulus through active learning. (a) t-distributed Stochastic Neighbor Embedding (t-SNE) visualization of the latent space after two iterations, with newly generated high-bulk-modulus structures highlighted. (b) Distribution of DFT-calculated bulk modulus ($K_{\text{vrh}}$) for the newly generated structures across iterations, showing a shift toward the target value. (c) Four representative alloy crystals with DFT-validated bulk moduli near 350 GPa, demonstrating the effectiveness of the framework in discovering high-stiffness materials.
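Figure 2(b) and the targets throughout use the Voigt-Reuss-Hill bulk modulus. For reference, the textbook definitions from a $6\times6$ stiffness matrix $C$ (Voigt notation) and its inverse, the compliance matrix $S = C^{-1}$, are $K_V = \frac{(C_{11}+C_{22}+C_{33}) + 2(C_{12}+C_{23}+C_{31})}{9}$, $K_R = \frac{1}{(S_{11}+S_{22}+S_{33}) + 2(S_{12}+S_{23}+S_{31})}$, and $K_{\text{vrh}} = \frac{K_V + K_R}{2}$. The sketch below implements these formulas; it is a generic illustration, not APEX's implementation, and the function names and elastic constants in the usage lines are hypothetical.

```python
import numpy as np


def bulk_modulus_vrh(C):
    """Return (K_Voigt, K_Reuss, K_VRH) from a 6x6 stiffness matrix in Voigt notation."""
    C = np.asarray(C, dtype=float)
    if C.shape != (6, 6):
        raise ValueError("expected a 6x6 stiffness matrix")
    S = np.linalg.inv(C)  # compliance matrix
    # Voigt bound: uniform-strain average of the stiffness constants.
    k_voigt = (C[0, 0] + C[1, 1] + C[2, 2] + 2.0 * (C[0, 1] + C[1, 2] + C[2, 0])) / 9.0
    # Reuss bound: uniform-stress average of the compliance constants.
    k_reuss = 1.0 / (S[0, 0] + S[1, 1] + S[2, 2] + 2.0 * (S[0, 1] + S[1, 2] + S[2, 0]))
    # Hill estimate: arithmetic mean of the two bounds.
    return k_voigt, k_reuss, 0.5 * (k_voigt + k_reuss)


def cubic_stiffness(c11, c12, c44):
    """Build a cubic-symmetry stiffness matrix (hypothetical helper for the example)."""
    C = np.zeros((6, 6))
    C[:3, :3] = c12
    for i in range(3):
        C[i, i] = c11
        C[i + 3, i + 3] = c44
    return C


# Illustrative cubic constants in GPa (not from the paper).
kv, kr, kh = bulk_modulus_vrh(cubic_stiffness(500.0, 250.0, 200.0))
print(f"K_V={kv:.1f}, K_R={kr:.1f}, K_VRH={kh:.1f} GPa")  # K_V=333.3, K_R=333.3, K_VRH=333.3 GPa
```

For cubic crystals the Voigt and Reuss bounds coincide at $(C_{11}+2C_{12})/3$; for lower symmetries $K_R \le K_V$, and the Hill value sits between them.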