Siamese Foundation Models for Crystal Structure Prediction

Liming Wu, Wenbing Huang, Rui Jiao, Jianxing Huang, Liwei Liu, Yipeng Zhou, Hao Sun, Yang Liu, Fuchun Sun, Yuxiang Ren, Jirong Wen

TL;DR

This work introduces DAO, a Siamese foundation-model framework for Crystal Structure Prediction, pairing a diffusion-based structure generator (DAO-G) with an energy predictor (DAO-P) built on the Crysformer graph Transformer to enforce $O(3)$ and periodic invariance. A large crystal pretraining dataset, CrysDB (~940k entries), enables two-stage pretraining for DAO-G with dataset relaxation guided by DAO-P and energy-guided sampling to promote stability. DAO-P provides both diffusion-compatible energy predictions and broad property-prediction capability across eight datasets, achieving state-of-the-art results and enabling accurate $T_c$ estimates for superconductors when augmented with generated structures. Empirically, DAO-G achieves SOTA CSP performance on MP-20 and MPTS-52, demonstrates robust polymorph generation, and, together with DAO-P, attains strong performance on superconductors and diverse material-property tasks, suggesting a scalable pathway for co-designing generative and predictive models in materials science.

Abstract

Crystal Structure Prediction (CSP), which aims to generate stable crystal structures from compositions, represents a critical pathway for discovering novel materials. While structure prediction tasks in other domains, such as proteins, have seen remarkable progress, CSP remains a relatively underexplored area due to the more complex geometries inherent in crystal structures. In this paper, we propose Siamese foundation models specifically designed to address CSP. Our pretrain-finetune framework, named DAO, comprises two complementary foundation models: DAO-G for structure generation and DAO-P for energy prediction. Experiments on CSP benchmarks (MP-20 and MPTS-52) demonstrate that our DAO-G significantly surpasses state-of-the-art (SOTA) methods across all metrics. Extensive ablation studies further confirm that DAO-G excels in generating diverse polymorphic structures, and that the dataset relaxation and energy guidance provided by DAO-P are essential for enhancing DAO-G's performance. When applied to three real-world superconductors ($\text{CsV}_3\text{Sb}_5$, $\text{Zr}_{16}\text{Rh}_8\text{O}_4$ and $\text{Zr}_{16}\text{Pd}_8\text{O}_4$) that are known to be challenging to analyze, our foundation models achieve accurate critical temperature predictions and structure generations. For instance, on $\text{CsV}_3\text{Sb}_5$, DAO-G generates a structure close to the experimental one with an RMSE of 0.0085; DAO-P predicts the $T_c$ value with high accuracy (2.26 K vs. the ground-truth value of 2.30 K). In contrast, conventional DFT calculators like Quantum ESPRESSO only successfully derive the structure of the first superconductor within an acceptable time, while the RMSE is nearly 8 times larger, and the computation speed is more than 1000 times slower. These compelling results collectively highlight the potential of our approach for advancing materials science research and development.

Paper Structure

This paper contains 55 sections, 1 theorem, 13 equations, 8 figures, 9 tables, 2 algorithms.

Key Result

Proposition 1

Given ${\mathcal{E}}_t({\mathcal{M}}_t)=-\log \mathbb {E}_{q_{0t}({\mathcal{M}}_0|{\mathcal{M}}_t)} [e^{-\beta{\mathcal{E}}_0({\mathcal{M}}_0)}]$ representing the true energy lu2023cep, we aim to learn a parameterized function $f_\phi({\mathcal{M}}_t, t)$ to approximate ${\mathcal{E}}_t({\mathcal{M}

Figures (8)

  • Figure 1: A summary of our models: (a) offers an overview of the structure generator (DAO-G) and the energy predictor (DAO-P). (a.1) outlines the pretrain-finetune framework. DAO-G conducts a two-stage pretraining process on CrysDB and DAO-P is pretrained on the same dataset. DAO-P enhances DAO-G by dataset relaxation and energy guidance. (a.2) illustrates the pretraining of DAO-P, which involves the diffusion-based CSP loss to estimate the lattice and fractional noises, and the exponential energy loss aiming at recovering the intermediate energy at each timestep along the diffusion trajectory. (a.3) depicts the two-stage pretraining pipeline of DAO-G. In Stage I, DAO-G is pretrained using the diffusion-based CSP loss on the original dataset. Then, DAO-P is employed to relax unstable structures. In Stage II, DAO-G is continually pretrained on the relaxed dataset. (b) describes the overall architecture and each key component of Crysformer. In DAO-G, only the noise head is utilized, whereas DAO-P incorporates both the noise and energy heads.
  • Figure 2: Statistics of the pretraining dataset CrysDB: (a) shows the global analyses of the dataset, including the number of entries from MP and OQMD, the statistics of the deduplicated version, and the proportion of stable structures. (b) reports the distributions of Ehull, volume and atom number. (c) presents the element coverage. It is important to note that the statistics presented in (b) and (c) refer to the deduplicated version of CrysDB used for pretraining DAO-G.
  • Figure 3: In-depth analyses of our models on the CSP benchmarks: (a) compares the performance of DAO-G across various configurations. Here, "stage I, Stable" refers to pretraining on the stable-only subset of the deduplicated CrysDB, while "stage I" denotes the first-stage pretraining on the full deduplicated CrysDB. (b) gathers the polymorphs (with 2 to 4 conformations) from MP-20, and subsequently compares the structures generated by DAO-G (#samples = 20) with the corresponding ground-truth structures. For clarity, the main abbreviations are used as follows: Comp. = Composition, GT = Ground Truth, Gen. = Generation. We provide an example to help understand the bottom annotations: Test[931] denotes the 931st test entry, while Gen[5606][5] represents the fifth of 20 generated samples based on the 5606th entry. (c) summarizes the energy reduction after conducting relaxation on CrysDB. The term $\Delta\text{E}$ is calculated using normalized energy. (d) presents MAEs of energy predictions by DAO-P on the test sets of MP-20 and MPTS-52. (e) and (f) compare the test RMSE and stability rate of the models with and without energy guidance. (g) visualizes two examples from MPTS-52, showing the benefits of generation with energy guidance. N/A denotes a failed match.
  • Figure 4: The performance of DAO-P for crystal property prediction is evaluated on eight datasets. The compared baselines include models both with and without pretraining, with the results directly taken from their respective papers. For baselines where the corresponding experiments were not conducted in the original paper, the results are denoted as N/A.
  • Figure 5: Experiments on superconductors: (a) depicts the finetuning process of DAO-P and DAO-G on the SuperCon3D dataset chen2025supercon3d, in which 3D structures are known for a subset of materials. (b) presents the distributions of the critical temperature ($T_c$). (c) displays the $T_c$ prediction error evaluated under 5-fold cross-validation. (d) shows the results for the three recently discovered real-world superconductors, all of which are excluded from the pretraining and finetuning processes. We employ DAO-P (aug.) for $T_c$ prediction and DAO-G for structure generation, subsequently analyzing how the RMSE evolves as the sample size grows. The hollow point (#sample = 1) in the last curve figure for the third superconductor signifies a failed match.
  • ...and 3 more figures

Theorems & Definitions (1)

  • Proposition 1: Intermediate Energy Prediction
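
The intermediate energy in Proposition 1, ${\mathcal{E}}_t({\mathcal{M}}_t)=-\log \mathbb{E}_{q_{0t}({\mathcal{M}}_0|{\mathcal{M}}_t)}[e^{-\beta{\mathcal{E}}_0({\mathcal{M}}_0)}]$, can be illustrated numerically. The sketch below is not the paper's training procedure (which learns $f_\phi$ to approximate this quantity); it is a plain Monte Carlo estimate of the definition, assuming the clean-structure energies ${\mathcal{E}}_0({\mathcal{M}}_0)$ of samples drawn from $q_{0t}({\mathcal{M}}_0|{\mathcal{M}}_t)$ are already available. The function name `intermediate_energy` and its arguments are hypothetical.

```python
import numpy as np


def intermediate_energy(e0_samples, beta=1.0):
    """Monte Carlo estimate of E_t = -log E[exp(-beta * E_0)].

    e0_samples: energies E_0(M_0) of clean structures M_0, assumed to be
                sampled from q_{0t}(M_0 | M_t) (the sampling itself is not
                shown here and is an assumption of this sketch).
    beta:       inverse-temperature factor from the definition.
    """
    e0 = np.asarray(e0_samples, dtype=float)
    a = -beta * e0
    # Stable log-mean-exp: subtract the max before exponentiating.
    a_max = a.max()
    log_mean_exp = a_max + np.log(np.mean(np.exp(a - a_max)))
    return -log_mean_exp


if __name__ == "__main__":
    # Hypothetical energies for a handful of candidate clean structures.
    print(intermediate_energy([0.0, 4.0], beta=1.0))
```

Because of the exponential weighting, low-energy samples dominate the average, so the estimate never exceeds $\beta$ times the mean sampled energy (Jensen's inequality), and it equals $\beta c$ when every sample has the same energy $c$.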