CRYSIM: Prediction of Symmetric Structures of Large Crystals with GPU-based Ising Machines

Chen Liang, Diptesh Das, Jiang Guo, Ryo Tamura, Zetian Mao, Koji Tsuda

TL;DR

CRYSIM addresses crystal structure prediction by encoding symmetry directly into a symmetry-informed Ising formulation, representing lattice parameters, space-group information, Wyckoff-position combinations, and independent-site coordinates as a binary embedding. It learns a quadratic energy surrogate with a Factorization Machine and minimizes it on a GPU-based Ising machine to propose candidate structures, which are then refined by a neural-network potential (M3GNet). Across benchmarks, CRYSIM matches or surpasses CALYPSO and Bayesian optimization on small crystals and demonstrates strong scalability to large crystals (e.g., Ca$_{24}$Al$_{16}$(SiO$_4$)$_{24}$ and (SiO$_2$)$_{96}$), highlighting its potential for quantum-era CSP and future deployments on quantum annealers. The study also analyzes encoding choices, symmetry-driven processing techniques, and active-learning strategies, offering a practical, end-to-end CSP workflow that leverages symmetry to tackle large search spaces.
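The surrogate-then-minimize step described above can be illustrated with a minimal sketch. It assumes the standard second-order Factorization Machine form, $y(\mathbf{x}) = w_0 + \sum_i w_i x_i + \sum_{i<j} \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j$, which for binary $x_i$ is a QUBO because $x_i^2 = x_i$. The parameter values, the function names, and the brute-force search (a stand-in for an Ising machine on a toy problem) are all hypothetical and are not taken from the paper's implementation.

```python
import itertools
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order FM: w0 + sum_i w_i x_i + sum_{i<j} <V_i, V_j> x_i x_j."""
    x = np.asarray(x, dtype=float)
    xv = x @ V  # shape (k,)
    # Standard identity that evaluates the pairwise term in O(n * k).
    pair = 0.5 * np.sum(xv ** 2 - (x ** 2) @ (V ** 2))
    return w0 + w @ x + pair

def fm_to_qubo(w, V):
    """Upper-triangular Q such that y(x) = w0 + x^T Q x for binary x."""
    Q = np.triu(V @ V.T, k=1)        # couplings <V_i, V_j> for i < j
    Q[np.diag_indices_from(Q)] = w   # linear terms sit on the diagonal (x_i^2 == x_i)
    return Q

def brute_force_minimize(Q):
    """Exhaustive search over all bit strings; only feasible for tiny n."""
    n = Q.shape[0]
    best_x, best_val = None, np.inf
    for bits in itertools.product((0, 1), repeat=n):
        x = np.array(bits, dtype=float)
        val = x @ Q @ x
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Toy, randomly generated FM parameters (hypothetical, not trained on crystal data).
rng = np.random.default_rng(0)
n, k = 6, 3
w0 = 0.1
w = rng.normal(size=n)
V = rng.normal(size=(n, k))

Q = fm_to_qubo(w, V)
x_star, e_star = brute_force_minimize(Q)
```

In a real run the exhaustive loop would be replaced by a call to an Ising solver, since the number of bit strings grows as $2^n$.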

Abstract

Solving black-box optimization problems with Ising machines is increasingly common in materials science. However, their application to crystal structure prediction (CSP) remains ineffective because atomic coordinates are encoded in a symmetry-agnostic way. We introduce CRYSIM, an algorithm that encodes the space group, the Wyckoff position combination, and the coordinates of independent atomic sites as separate variables. This encoding substantially reduces the search space by exploiting space-group symmetry. When CRYSIM was interfaced to Fixstars Amplify, a GPU-based Ising machine, its prediction performance was competitive with CALYPSO and Bayesian optimization for crystals containing more than 150 atoms in a unit cell. Although it is not realistic to interface CRYSIM to current small-scale quantum devices, it has the potential to become the standard CSP algorithm in the coming quantum age.
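To make the idea of "separate variables" concrete, here is a minimal sketch of a segmented binary vector: one one-hot segment for the space group, one for a Wyckoff-position-combination group, and one per fractional coordinate of the independent sites. Only the count of 230 space groups is a known fact. The segment sizes `N_WPC_BITS` and `N_COORD_BINS`, the one-hot layout, and the function names are assumptions made for illustration and may differ from the encoding actually used by CRYSIM.

```python
import numpy as np

N_SPACE_GROUPS = 230  # number of 3D space groups
N_WPC_BITS = 8        # hypothetical number of WPC groups
N_COORD_BINS = 16     # hypothetical resolution per fractional coordinate

def one_hot(index, size):
    """Binary vector of length `size` with a single 1 at `index`."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} outside [0, {size})")
    v = np.zeros(size, dtype=np.int8)
    v[index] = 1
    return v

def encode(space_group, wpc_group, frac_coords):
    """space_group in 1..230, wpc_group in 0..N_WPC_BITS-1, frac_coords in [0, 1)."""
    segments = [one_hot(space_group - 1, N_SPACE_GROUPS),
                one_hot(wpc_group, N_WPC_BITS)]
    for c in np.ravel(frac_coords):
        # Wrap into [0, 1) and clamp to guard against floating-point round-up.
        b = min(int(np.floor((c % 1.0) * N_COORD_BINS)), N_COORD_BINS - 1)
        segments.append(one_hot(b, N_COORD_BINS))
    return np.concatenate(segments)

def decode(x, n_coords):
    """Inverse of `encode`; coordinates are returned at bin centres."""
    x = np.asarray(x)
    space_group = int(np.argmax(x[:N_SPACE_GROUPS])) + 1
    offset = N_SPACE_GROUPS
    wpc_group = int(np.argmax(x[offset:offset + N_WPC_BITS]))
    offset += N_WPC_BITS
    coords = []
    for _ in range(n_coords):
        b = int(np.argmax(x[offset:offset + N_COORD_BINS]))
        coords.append((b + 0.5) / N_COORD_BINS)
        offset += N_COORD_BINS
    return space_group, wpc_group, np.array(coords)

# Example: one independent site with three free coordinates in space group 227.
x = encode(227, 3, [0.0, 0.25, 0.5])
sg, wpc, coords = decode(x, n_coords=3)
```

Because only the independent sites are encoded, the vector length in this sketch grows with the number of symmetry-inequivalent coordinates rather than with the total atom count, which is the source of the search-space reduction the abstract describes.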

Paper Structure

This paper contains 25 sections, 29 equations, 11 figures, 14 tables.

Figures (11)

  • Figure 1: The workflow of CRYSIM, which contains $T$ iterations, using Si$_4$O$_8$ as an illustration. Thin arrows denote the workflow at the $t$-th iteration, and thick arrows denote entering and exiting iterations. (a) Given the material system under consideration, a dataset is obtained by RG to provide training samples and to determine the upper bound of the lattice parameters for the binary representation. The potential energy of each material is also estimated by a pretrained NNP without structure relaxation. (b) Structures in the dataset $\{S_1, S_2, \ldots, S_{1000}\}$ are encoded into binary vectors $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{1000}\}$ using symmetry-informed integer encoding. (c) An FM performs regression from the binary vectors to their corresponding estimated energies, yielding the objective function to be optimized. (d) An Ising solver (Amplify in this work) minimizes the learned objective function $y$ in the $t$-th iteration, resulting in $\mathbf{x}^{*,t}$. (e) The solved binary embedding $\mathbf{x}^{*,t}$ is decoded into crystal structures. Since one bit in the WPC segment represents a group of 100 WPCs, 100 structures are derived, and the one with the largest MID is selected as $S^{*,t}$. The Si$_4$O$_8$ structures drawn in panel (e) are indicative and have different SGs. (f) The solved structure $S^{*,t}$ is relaxed by the NNP, leading to a structure-energy pair $(S^{*,t}_r, E^t_{\min, r})$. If iterations remain, frames in the relaxation trajectory are sampled. (g) Among the sampled structures, if one has an MID smaller than 0.5 Å but is still estimated to have a negative energy, its energy is reassigned a high positive value before the points are added to the training dataset for the next iteration. After all iterations finish, the final structure $S_{r}^{*}$, the one with the lowest relaxed energy among all crystals in all $T$ iterations, is regarded as the discovered stable structure of this system.
  • Figure 2: The first iteration in which the generated structure matches the ground truth ($I_{M, 0}$), and the number of successfully matched structures among the generated ones with the lowest energy ($N_M$), for (a, b) Ca$_4$S$_4$ and (c, d) Ba$_3$Na$_3$Bi$_3$ with the three optimization methods in 300 iterations. Shadowed bars in (c) indicate that the corresponding methods fail to find the ground truth structure with these seeds.
  • Figure 3: Side view (left column for each method) and top view (right column) of the ground states in MP of the five benchmark crystals, mp-11277 (Sc: purple, Be: green), mp-1672 (Ca: blue, S: yellow), mp-31235 (Ba: green, Na: yellow, Bi: pink), mp-755253 (Li: green, Zr: blue, O: red), and mp-1211008 (Li: green, Ti: blue, Se: orange, O: red), respectively, and the configurations predicted by three CSP methods after structure relaxation, visualized with the VESTA software [Momma:db5098], with M3GNet-estimated relaxed energies labeled above. Most configurations are expanded into superlattices to display the patterns. The crystals with the lowest energies are selected. If more than one crystal has the same energy, the one obtained in the earliest iteration is shown.
  • Figure 4: Averaged accumulated lowest M3GNet-estimated relaxed energies of (a) Y$_6$Co$_{51}$, (b) Ca$_{24}$Al$_{16}$(SiO$_4$)$_{24}$, and (c) (SiO$_2$)$_{96}$ structures derived from various CSP algorithms. Each curve is averaged over five tests with different random seeds, and the colored shaded areas cover the maximum and minimum of the five trials. Dashed lines are the relaxed energies of the ground truth materials in MP. (d) Side view (left column for each method) and top view (right column) of the ground states in MP, mp-1106140 (Y: grey, Co: blue), mp-6008 (Ca: grey, Al: light blue, Si: deep blue, O: red), and mp-1200292 (Si: blue, O: red), and representative predicted configurations after structure relaxation, respectively, visualized with the VESTA software [Momma:db5098], with M3GNet-estimated relaxed energies labeled above.
  • Figure S1: Side view (top row for each method) and top view (bottom row) of the ground state of Li$_8$Zr$_4$O$_{12}$ in MP (mp-4156, Li in green, Zr in blue, O in red), and the configurations predicted by three CSP methods after structure relaxation, visualized with the VESTA software [Momma:db5098], with M3GNet [RN1130]-estimated relaxed energies labeled above.
  • ...and 6 more figures
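The energy reassignment in step g of the Figure 1 caption can be sketched as follows. The sketch assumes that MID stands for the minimum interatomic distance under periodic boundary conditions, which the caption does not spell out. The 0.5 Å threshold comes from the caption, while `PENALTY_ENERGY`, the function names, and the restriction to the 27 nearest periodic images are illustrative assumptions.

```python
import itertools
import numpy as np

MID_THRESHOLD = 0.5    # Å, threshold stated in the Figure 1 caption
PENALTY_ENERGY = 10.0  # hypothetical "high positive" energy; the actual value is not given here

def min_interatomic_distance(lattice, frac_coords):
    """Minimum distance between atoms, including periodic images.

    lattice: (3, 3) array whose rows are the lattice vectors in Å.
    frac_coords: (N, 3) fractional coordinates.
    Only the 27 neighbouring cells are checked, which is adequate for cells
    that are not strongly skewed (an assumption of this sketch).
    """
    lattice = np.asarray(lattice, dtype=float)
    frac = np.asarray(frac_coords, dtype=float)
    shifts = np.array(list(itertools.product((-1, 0, 1), repeat=3)), dtype=float)
    # diff[i, j, s] = frac[j] + shifts[s] - frac[i]
    diff = frac[None, :, None, :] + shifts[None, None, :, :] - frac[:, None, None, :]
    dist = np.linalg.norm(diff @ lattice, axis=-1)  # shape (N, N, 27)
    # Ignore the zero distance between an atom and itself in the same cell.
    zero = int(np.flatnonzero(np.all(shifts == 0, axis=1))[0])
    idx = np.arange(len(frac))
    dist[idx, idx, zero] = np.inf
    return float(dist.min())

def reassign_energy(energy, mid):
    """Replace a negative energy with a penalty when atoms are unphysically close."""
    if mid < MID_THRESHOLD and energy < 0.0:
        return PENALTY_ENERGY
    return energy

# Toy example: two atoms in a 4 Å cubic cell that are 0.2 Å apart across the cell boundary.
cell = 4.0 * np.eye(3)
mid = min_interatomic_distance(cell, [[0.0, 0.0, 0.0], [0.95, 0.0, 0.0]])
energy = reassign_energy(-3.2, mid)
```

The purpose of such a penalty is to keep the surrogate from learning that collapsed structures are favorable when the neural-network potential extrapolates poorly at very short distances.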