deCIFer: Crystal Structure Prediction from Powder Diffraction Data using Autoregressive Language Models

Frederik Lizak Johansen, Ulrik Friis-Jensen, Erik Bjørnager Dam, Kirsten Marie Ørnsbjerg Jensen, Rocío Mercado, Raghavendra Selvan

TL;DR

The paper addresses crystal structure prediction (CSP) from powder diffraction data by introducing deCIFer, an autoregressive transformer that generates CIFs conditioned on PXRD signals. Its main contribution is to condition CIF-based structure generation directly on diffraction data. The model is trained on ~2.3 million CIFs and evaluated on diverse PXRD datasets, including CHILI-100K for out-of-distribution testing. The results show that PXRD conditioning improves both match rates and agreement with the target diffraction data, while exposing trade-offs with composition priors and difficulties with low-symmetry systems. The work provides a scalable, data-informed CSP framework and discusses broader implications, limitations (e.g., homometric degeneracy), and ways to extend conditioning to multiple data sources and to downstream validation.

Abstract

Novel materials drive progress across applications from energy storage to electronics. Automated characterization of material structures with machine learning methods offers a promising strategy for accelerating this key step in material design. In this work, we introduce an autoregressive language model that performs crystal structure prediction (CSP) from powder diffraction data. The presented model, deCIFer, generates crystal structures in the widely used Crystallographic Information File (CIF) format and can be conditioned on powder X-ray diffraction (PXRD) data. Unlike earlier works that primarily rely on high-level descriptors like composition, deCIFer is also able to use diffraction data to perform CSP. We train deCIFer on nearly 2.3M crystal structures and validate on diverse sets of PXRD patterns for characterizing challenging inorganic crystal systems. Qualitative checks and quantitative assessments using the residual weighted profile show that deCIFer produces structures that more accurately match the target diffraction data. Notably, deCIFer can achieve a 94% match rate on test data. deCIFer bridges experimental diffraction data with computational CSP, lending itself as a powerful tool for crystal structure characterization.
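The abstract's quantitative assessment relies on the weighted profile residual, $R_{\mathrm{wp}}$. As a minimal sketch, the snippet below computes the standard definition, $R_{\mathrm{wp}} = \sqrt{\sum_i w_i (y_i^{\mathrm{obs}} - y_i^{\mathrm{calc}})^2 / \sum_i w_i (y_i^{\mathrm{obs}})^2}$, for two intensity profiles sampled on the same grid. The function name `r_wp`, the list-based inputs, and the default of unit weights are assumptions made for illustration; the paper's exact weighting and preprocessing are not stated in this summary.

```python
import math


def r_wp(y_obs, y_calc, weights=None):
    """Weighted profile residual between two PXRD intensity profiles.

    Both profiles must be sampled on the same grid. Unit weights are an
    assumption used here when no weights are given.
    """
    if len(y_obs) != len(y_calc):
        raise ValueError("profiles must have the same number of points")
    if weights is None:
        weights = [1.0] * len(y_obs)
    if len(weights) != len(y_obs):
        raise ValueError("weights must match the profile length")

    # Weighted squared difference between reference and generated profile.
    numerator = sum(w * (o - c) ** 2 for w, o, c in zip(weights, y_obs, y_calc))
    # Normalization by the weighted squared reference intensities.
    denominator = sum(w * o ** 2 for w, o in zip(weights, y_obs))
    if denominator == 0:
        raise ValueError("reference profile has zero total intensity")
    return math.sqrt(numerator / denominator)


reference = [0.0, 2.0, 0.0, 1.0]
generated = [0.0, 1.0, 0.0, 1.0]
print(r_wp(reference, reference))  # identical profiles give 0.0
print(r_wp(reference, generated))
```

A value of 0 means the generated profile reproduces the reference exactly, and larger values indicate a worse match, consistent with the statement in the Figure 5 caption that lower $R_{\mathrm{wp}}$ indicates better alignment.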

Paper Structure

This paper contains 44 sections, 7 equations, 18 figures, 6 tables.

Figures (18)

  • Figure 1: Overview of the deCIFer model.
  • Figure 2: Example generations using deCIFer.
  • Figure 3: (a) Overview of the deCIFer model, which performs autoregressive crystal structure prediction (CSP) from PXRD data, optionally guided by tokenized crystal descriptors. PXRD embeddings are prepended to the CIF token sequence, enabling the generation of structurally consistent CIFs directly from diffraction data. (b) Three examples from the NOMA test set showing deCIFer generations, each illustrating a reference structure, the generated structure and their corresponding PXRD profiles.
  • Figure 4: Evaluation pipeline: A test set CIF generates a PXRD profile, tokenized for deCIFer to produce a new CIF, compared to the reference using a clean transformation.
  • Figure 5: Left: Distribution of $R_{\mathrm{wp}}$ for deCIFer and U-deCIFer on the NOMA test set with boxplots. Lower $R_{\mathrm{wp}}$ indicates better CIF alignment. Right: Performance for 20K NOMA test samples using deCIFer and U-deCIFer with different descriptors: none (no descriptors), comp. (composition), and comp.+s.g. (composition + space group). Metrics include validity (Val.) and match rate (MR).
  • ...and 13 more figures