Crystal-GFN: sampling crystals with desirable properties and constraints

Mila AI4Science, Alex Hernandez-Garcia, Alexandre Duval, Alexandra Volokhova, Yoshua Bengio, Divya Sharma, Pierre Luc Carrier, Yasmine Benabed, Michał Koziarski, Victor Schmidt

TL;DR

To address the challenge of discovering stable inorganic crystals, the paper presents Crystal-GFN, a generative model that sequentially samples the space group, composition, and lattice parameters under hard domain constraints. It trains a GFlowNet using as reward a proxy formation energy (FE) model trained on MatBench, which enables diverse sampling of low-energy crystals. In experiments, Crystal-GFN generated 10k crystals with a median predicted FE of -3.1 eV/atom, 95% of them below -2 eV/atom, while covering a broad range of space groups, lattice systems, and elements. The approach shows how domain-informed representations and flexible reward functions can accelerate materials discovery while maintaining structural validity.

Abstract

Accelerating material discovery holds the potential to greatly help mitigate the climate crisis. Discovering new solid-state materials such as electrocatalysts, super-ionic conductors or photovoltaic materials can have a crucial impact, for instance, in improving the efficiency of renewable energy production and storage. In this paper, we introduce Crystal-GFN, a generative model of crystal structures that sequentially samples structural properties of crystalline materials, namely the space group, composition and lattice parameters. This domain-inspired approach enables the flexible incorporation of physical and structural hard constraints, as well as the use of any available predictive model of a desired physicochemical property as an objective function. To design stable materials, one must target the candidates with the lowest formation energy. Here, we use as objective the formation energy per atom of a crystal structure predicted by a new proxy machine learning model trained on MatBench. The results demonstrate that Crystal-GFN is able to sample highly diverse crystals with low (median -3.1 eV/atom) predicted formation energy.
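
A GFlowNet samples objects with probability proportional to a non-negative reward, so a predicted formation energy (lower is better) has to be mapped to a positive score first. The following is a minimal sketch of one way to do that, assuming a Boltzmann-style transform R(x) = exp(-E(x)/T). The transform, the temperature value, the function names and the example energies are illustrative assumptions, not details taken from the paper.

```python
import math


def boltzmann_reward(formation_energy: float, temperature: float = 1.0) -> float:
    """Map a predicted formation energy (eV/atom) to a positive reward.

    Assumption: a Boltzmann-style transform; the paper's exact reward
    function is not stated in this summary.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return math.exp(-formation_energy / temperature)


def target_probabilities(energies, temperature: float = 1.0):
    """Normalised sampling probabilities a GFlowNet would aim for
    over a small, enumerable set of candidates."""
    rewards = [boltzmann_reward(e, temperature) for e in energies]
    total = sum(rewards)
    return [r / total for r in rewards]


# Hypothetical predicted formation energies for three candidates (eV/atom).
energies = [-3.1, -2.0, 0.5]
probs = target_probabilities(energies)
```

With this transform, the candidate with the lowest formation energy receives the largest target probability, while higher-energy candidates keep a non-zero chance of being sampled. That property is what allows diverse rather than purely greedy sampling.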

Paper Structure

This paper contains 33 sections, 10 figures, and 1 table.

Figures (10)

  • Figure 1: A schematic of the crystal generation process of Crystal-GFN. First, the space group is selected, in turn decomposed into the selection of a crystal-lattice system and a point symmetry. Then, a composition is generated by iteratively selecting an element type and its quantity. Finally, the lattice parameters of the unit cell are sampled. We introduce hard constraints, denoted by $C_i$, within and between these components, as described in the Methods section.
  • Figure 2: The seven lattice systems and the constraints they impose on the lattice parameters. In order a–g: triclinic, monoclinic, orthorhombic, tetragonal, rhombohedral, hexagonal and cubic. Source of the figures: Wikimedia Commons, licensed under CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0/deed.en).
  • Figure 3: Distributions of the formation energy predicted by our proxy model in three relevant distributions of samples: in blue, samples from Crystal-GFN after training; in orange, the validation set, representative of the MatBench database; in pink, samples from an untrained Crystal-GFN. As a main conclusion, we observe that Crystal-GFN, after training, manages to sample crystals with predicted formation energies in the range of the validation set or lower.
  • Figure 4: Proxy MLP performance on the validation set and data split FE distributions. The average MAE is $0.10~\text{eV/atom}$. We can also see the effect of the stratification algorithm which yields similar FE distributions between the train and validation data set splits.
  • Figure 5: Distribution of FE values in the validation set and associated Proxy MLP MAE, with $25~\%, 50~\%$ and $75~\%$ MAE quantiles.
  • ...and 5 more figures
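
The lattice-system constraints mentioned in the Figure 2 caption can be sketched as a simple validity check on the six lattice parameters (lengths a, b, c and angles α, β, γ in degrees). This is a minimal illustration based on the standard crystallographic conventions (with the monoclinic unique axis taken as b), not the paper's implementation. The function name and the tolerance are hypothetical.

```python
import math


def satisfies_lattice_system(system, a, b, c, alpha, beta, gamma, tol=1e-6):
    """Return True if the lattice parameters obey the equality constraints
    of the given lattice system. Angles are in degrees.

    Only the required equalities are checked, so a more symmetric cell
    (e.g. cubic) also passes the check of a less symmetric system.
    """
    def eq(x, y):
        return math.isclose(x, y, abs_tol=tol)

    def right(angle):
        return eq(angle, 90.0)

    all_right = right(alpha) and right(beta) and right(gamma)
    checks = {
        "triclinic": True,  # no constraints
        "monoclinic": right(alpha) and right(gamma),  # beta is free
        "orthorhombic": all_right,
        "tetragonal": eq(a, b) and all_right,
        "rhombohedral": eq(a, b) and eq(b, c) and eq(alpha, beta) and eq(beta, gamma),
        "hexagonal": eq(a, b) and right(alpha) and right(beta) and eq(gamma, 120.0),
        "cubic": eq(a, b) and eq(b, c) and all_right,
    }
    if system not in checks:
        raise ValueError(f"unknown lattice system: {system!r}")
    return checks[system]


# Example cells with made-up lengths.
cubic_ok = satisfies_lattice_system("cubic", 4.0, 4.0, 4.0, 90.0, 90.0, 90.0)
cubic_bad = satisfies_lattice_system("cubic", 4.0, 4.0, 5.0, 90.0, 90.0, 90.0)
hexagonal_ok = satisfies_lattice_system("hexagonal", 3.0, 3.0, 5.0, 90.0, 90.0, 120.0)
```

A generative model that enforces such constraints while sampling only needs to draw the free parameters of the chosen lattice system. The checker above is the complementary view: it verifies a finished parameter set after the fact.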