deCIFer: Crystal Structure Prediction from Powder Diffraction Data using Autoregressive Language Models
Frederik Lizak Johansen, Ulrik Friis-Jensen, Erik Bjørnager Dam, Kirsten Marie Ørnsbjerg Jensen, Rocío Mercado, Raghavendra Selvan
TL;DR
The paper tackles crystal structure prediction (CSP) from powder X-ray diffraction (PXRD) data by introducing deCIFer, an autoregressive transformer that generates Crystallographic Information Files (CIFs) conditioned on PXRD patterns. Its main innovation is to integrate experimental diffraction data directly into CIF-based structure generation. The model is trained on ~2.3 million CIFs and evaluated on diverse PXRD datasets, including CHILI-100K for out-of-distribution testing. The results show that PXRD conditioning improves both the agreement of generated structures with the target diffraction data and the match rate. They also reveal trade-offs with composition priors and difficulties with low-symmetry systems. The work provides a scalable, data-informed CSP framework and discusses its broader implications and its limitations, such as homometric degeneracy, where distinct structures produce identical diffraction patterns. It also outlines avenues for extending conditioning to multiple data sources and for downstream validation.
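One simple way to prepare a PXRD signal as a conditioning input for a model like deCIFer is to render it as a fixed-length vector. The minimal sketch below takes a list of (Q, intensity) peaks and spreads each peak as a Gaussian on an evenly spaced grid. It then normalizes the result so its maximum is 1. The function name `peaks_to_profile`, the Q range of 0 to 10 Å⁻¹, the 1000-point grid and the 0.05 Å⁻¹ peak width are illustrative assumptions, not settings taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def peaks_to_profile(
    peaks: Sequence[Tuple[float, float]],
    q_min: float = 0.0,
    q_max: float = 10.0,
    n_points: int = 1000,
    fwhm: float = 0.05,
) -> List[float]:
    """Render (Q, intensity) peaks as a Gaussian-broadened, max-normalized profile.

    Hypothetical helper for illustration: grid range, resolution and peak
    width are assumed values, not deCIFer's actual preprocessing settings.
    """
    if n_points < 2 or q_max <= q_min:
        raise ValueError("need at least two grid points and q_max > q_min")
    # Convert the full width at half maximum to a Gaussian standard deviation.
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    step = (q_max - q_min) / (n_points - 1)
    grid = [q_min + i * step for i in range(n_points)]

    profile = [0.0] * n_points
    for q0, height in peaks:
        # Add one Gaussian per reflection; overlapping peaks simply sum.
        for i, q in enumerate(grid):
            profile[i] += height * math.exp(-0.5 * ((q - q0) / sigma) ** 2)

    # Scale to a maximum of 1 so patterns with different absolute intensities
    # are comparable; an empty peak list stays all zeros.
    peak_max = max(profile)
    if peak_max > 0:
        profile = [y / peak_max for y in profile]
    return profile
```

For example, `peaks_to_profile([(2.0, 100.0), (3.0, 50.0)])` returns 1000 values. The largest entry, 1.0, sits at the grid point nearest Q = 2.0 Å⁻¹, and the second peak reaches roughly half that height. A fixed grid keeps the input length constant regardless of how many reflections a structure has, which simplifies batching.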
Abstract
Novel materials drive progress across applications from energy storage to electronics. Automated characterization of material structures with machine learning methods offers a promising strategy for accelerating this key step in material design. In this work, we introduce an autoregressive language model that performs crystal structure prediction (CSP) from powder diffraction data. The presented model, deCIFer, generates crystal structures in the widely used Crystallographic Information File (CIF) format and can be conditioned on powder X-ray diffraction (PXRD) data. Earlier works rely primarily on high-level descriptors such as composition. Unlike them, deCIFer can also use diffraction data to perform CSP. We train deCIFer on nearly 2.3M crystal structures and validate it on diverse sets of PXRD patterns from challenging inorganic crystal systems. We assess the results with qualitative checks and with the weighted profile residual (R_wp). Both show that deCIFer produces structures that match the target diffraction data more accurately than generation without PXRD conditioning. Notably, deCIFer can achieve a 94% match rate on test data. deCIFer bridges experimental diffraction data with computational CSP, making it a powerful tool for crystal structure characterization.
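The weighted profile residual compares an observed pattern y_obs with a pattern y_calc. Here y_calc is computed from a generated structure and sampled on the same grid. The standard definition is R_wp = sqrt( Σ w_i (y_obs,i − y_calc,i)² / Σ w_i y_obs,i² ), and lower values mean better agreement. The minimal sketch below implements this definition. The function name and the default of uniform weights (w_i = 1) are assumptions, since the abstract does not state which weighting scheme deCIFer's evaluation uses.

```python
import math
from typing import Optional, Sequence


def weighted_profile_residual(
    y_obs: Sequence[float],
    y_calc: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted profile residual R_wp between observed and calculated patterns.

    Both patterns must be sampled on the same grid. Uniform weights are used
    when none are given (an assumption; Rietveld practice often uses
    w_i = 1 / y_obs_i to reflect counting statistics).
    """
    if len(y_obs) != len(y_calc):
        raise ValueError("patterns must be sampled on the same grid")
    if weights is None:
        weights = [1.0] * len(y_obs)
    elif len(weights) != len(y_obs):
        raise ValueError("one weight is needed per grid point")

    # Weighted squared misfit between the two profiles.
    num = sum(w * (o - c) ** 2 for w, o, c in zip(weights, y_obs, y_calc))
    # Weighted squared intensity of the observed profile (normalization).
    den = sum(w * o ** 2 for w, o in zip(weights, y_obs))
    if den == 0:
        raise ValueError("observed pattern has zero weighted intensity")
    return math.sqrt(num / den)
```

Identical patterns give R_wp = 0.0. In the example `weighted_profile_residual([0.0, 1.0, 0.5, 0.0], [0.0, 0.9, 0.5, 0.1])`, the calculated pattern underestimates the main peak by 0.1 and adds a spurious 0.1 of intensity elsewhere. The result is sqrt(0.02 / 1.25) ≈ 0.126.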
