
End-to-End Crystal Structure Prediction from Powder X-Ray Diffraction

Qingsi Lai, Fanjie Xu, Lin Yao, Zhifeng Gao, Siyuan Liu, Hongshuai Wang, Shuqi Lu, Di He, Liwei Wang, Cheng Wang, Guolin Ke

TL;DR

XtalNet addresses the challenge of predicting fine-grained crystal structures directly from PXRD patterns, a task traditionally hindered by ambiguity and reliance on external databases. It introduces a two-module framework: a Contrastive PXRD-Crystal Pretraining (CPCP) component that aligns PXRD space with crystal-structure space, and a Conditional Crystal Structure Generation (CCSG) component that performs diffusion-based, PXRD-conditioned structure generation. On MOF datasets with unit cells of up to 400 atoms, XtalNet achieves top-10 match rates of 90.2% on hMOF-100 and 79% on hMOF-400, along with robust retrieval performance, demonstrating an end-to-end capability to predict structures from PXRD without external databases. The approach could accelerate automated crystal-structure determination and materials discovery, though it relies on simulated PXRD for training and would benefit from better handling of experimental noise and broader applicability to inorganic systems.

Abstract

Powder X-ray diffraction (PXRD) is a prevalent technique in materials characterization. However, the analysis of PXRD often requires extensive manual intervention, and most automated methods operate only at a coarse-grained level. The more difficult and important task of fine-grained crystal structure prediction from PXRD remains unaddressed. This study introduces XtalNet, the first equivariant deep generative model for end-to-end crystal structure prediction from PXRD. Unlike previous crystal structure prediction methods that rely solely on composition, XtalNet leverages PXRD as an additional condition, eliminating ambiguity and enabling the generation of complex organic structures with up to 400 atoms in the unit cell. XtalNet comprises two modules: a Contrastive PXRD-Crystal Pretraining (CPCP) module that aligns PXRD space with crystal structure space, and a Conditional Crystal Structure Generation (CCSG) module that generates candidate crystal structures conditioned on PXRD patterns. Evaluation on two MOF datasets (hMOF-100 and hMOF-400) demonstrates XtalNet's effectiveness: in the conditional crystal structure prediction task, XtalNet achieves top-10 Match Rates of 90.2% and 79% on hMOF-100 and hMOF-400, respectively. XtalNet enables the direct prediction of crystal structures from experimental measurements, eliminating the need for manual intervention and external databases. This opens up new possibilities for automated crystal structure determination and the accelerated discovery of novel materials.
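
To make the top-10 Match Rate figures concrete, the following is a minimal sketch of how a top-k match rate could be computed. It assumes the metric counts a test PXRD pattern as solved when at least one of its k highest-ranked generated candidates matches the ground-truth structure. The function name `top_k_match_rate` and the boolean input layout are hypothetical, and the structure-matching criterion itself is not specified here, so it is represented only by precomputed True/False flags.

```python
from typing import Sequence


def top_k_match_rate(matches: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of test patterns with a matching candidate among the first k.

    matches[i][j] is True when the j-th ranked candidate generated for
    pattern i matches its ground-truth structure (assumed input layout).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not matches:
        return 0.0
    # A pattern counts as solved if any of its first k candidates matches.
    hits = sum(1 for ranked in matches if any(ranked[:k]))
    return hits / len(matches)


# Toy data (not from the paper): three patterns, three ranked candidates each.
toy = [
    [False, True, False],   # first match at rank 2
    [False, False, False],  # no candidate matches
    [True, False, False],   # first match at rank 1
]

rate_at_1 = top_k_match_rate(toy, k=1)
rate_at_2 = top_k_match_rate(toy, k=2)
print(rate_at_1, rate_at_2)
```

Under this reading, the rate can only stay the same or rise as k grows, which is why results are reported for a specific k such as 10.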

Paper Structure (21 sections, 14 equations, 5 figures)

Figures (5)

  • Figure 1: Overview of XtalNet. a, Framework of the Contrastive PXRD-Crystal Pretraining (CPCP) module. The CPCP module takes PXRD patterns and crystal structures as inputs and produces similarity scores between them. A transformer-based PXRD feature extractor processes the PXRD pattern data, while an equivariant Graph Neural Network (GNN) extracts features from the crystal structure. The similarity score is computed as the dot product of these two feature sets. b, Framework of the Conditional Crystal Structure Generation (CCSG) module. The CCSG module uses the PXRD pattern as a condition to generate crystal structures. The PXRD feature extractor is initialized from the CPCP pretraining and kept frozen. The composition of the crystal is then used to initialize the atom positions and lattice matrix. The denoising network, referred to as the crystal structure network, takes the previous crystal structure, the PXRD feature obtained through the PXRD feature extractor, and the time step as inputs to update the crystal structure. This process is repeated iteratively in the reverse direction. c, PXRD data generation workflow. PXRD data can be acquired in two ways: by simulating PXRD data from a given crystal structure using the GSAS software, or by conducting actual PXRD experiments with an XRD instrument. d, Workflow of the PXRD feature extractor. The PXRD data is first tokenized into peak tokens based on peak intensity and the corresponding diffraction angle, after which PXRD features are derived using BERT. e, Framework of the crystal structure network. In the CPCP model, only the solid-line components of the process are executed, whereas in the CCSG model, both solid-line and dashed-line components are executed.
  • Figure 2: CPCP Module Performance. a, t-SNE reduction of the hMOF-100 dataset's PXRD feature embeddings, with the unit cell volume represented by color intensity. The clustering by unit cell volume in the PXRD feature embedding indicates the effectiveness of the CPCP module in aligning PXRD and crystal structure spaces. b, the top-10 hit rate for the database retrieval task, highlighting the module's efficacy in identifying corresponding crystal structures based on PXRD patterns. c, a heatmap of similarity scores for 50 randomly selected crystal structures from the hMOF-100 dataset and their corresponding PXRD patterns. d, a retrieval result for a given PXRD pattern, showcasing the top four retrieved crystal structures and their corresponding PXRD patterns, illustrating the high degree of similarity in metal-connecting structures and PXRD spectra.
  • Figure 3: Performance of XtalNet in Crystal Structure Prediction. a, the match rates for the hMOF-100 and hMOF-400 datasets with different numbers of top-ranked generated crystal structure candidates. b, c, the RMSE statistics for the hMOF-100 and hMOF-400 datasets, indicating that XtalNet can generate highly accurate crystal structures from PXRD data for a significant proportion of cases. d, visual comparison of generated crystal structures and their simulated PXRD patterns against ground truth, highlighting the model's performance in generating metal-connecting parts of MOFs and maintaining high similarity in PXRD patterns. e, performance of different architectures, demonstrating that the XtalNet design choices are reasonable. Feat. node denotes that the PXRD feature is added as a new node, Feat. cat denotes that the PXRD feature is concatenated with the original node features, P denotes that the PXRD feature extractor is pretrained by the CPCP module, and F denotes that the PXRD feature extractor is frozen during CCSG training. f, diffusion trajectory of generating a crystal structure, showing the interpretability of the denoising process.
  • Figure 4: Evaluation of XtalNet in Diverse System Sizes and Elemental Compositions. a, the match rate and number of structures for different system sizes in the training set, demonstrating XtalNet's applicability to systems with varying numbers of atoms in the unit cell. b, the RMSE of the best generation results for different system sizes in the hMOF-400 dataset, revealing the impact of system complexity on prediction accuracy. c and d, the match rates and RMSE for structures containing distinct metal elements, highlighting the influence of sample number and system complexity on model performance.
  • Figure 5: XtalNet Predictions of Real Experimental PXRD Patterns. Two cases of XtalNet's predictions for real experimental PXRD data are shown, including both the ground truth (GT) crystal structures and the predicted crystal structures. The GT simulated (red), predicted simulated (blue), and experimental (purple) PXRD patterns are also presented for comparison.
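
The Figure 1a and Figure 2b captions describe a similarity score computed as the dot product of PXRD and crystal features, and a top-10 hit rate for database retrieval. A minimal NumPy sketch of that scoring and ranking step is given below. The function names, the toy 2-dimensional embeddings, and the convention that PXRD pattern i belongs to crystal i are all assumptions made for illustration; the actual feature extractors (the transformer-based PXRD encoder and the equivariant GNN) are not reproduced.

```python
import numpy as np


def similarity_matrix(pxrd_feats: np.ndarray, crystal_feats: np.ndarray) -> np.ndarray:
    """Dot-product scores: entry (i, j) compares PXRD pattern i with crystal j."""
    return np.asarray(pxrd_feats) @ np.asarray(crystal_feats).T


def top_k_hit_rate(scores: np.ndarray, k: int) -> float:
    """Fraction of PXRD rows whose paired crystal (same index) ranks in the top k."""
    n = scores.shape[0]
    # Column indices of the k highest-scoring crystals for each PXRD pattern.
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = (top_k == np.arange(n)[:, None]).any(axis=1)
    return float(hits.mean())


# Toy embeddings (not from the paper): 3 PXRD patterns and 3 crystals in 2-D.
pxrd = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
crystal = np.array([[2.0, 0.0], [0.0, 2.0], [3.0, 0.0]])

scores = similarity_matrix(pxrd, crystal)
hit_at_1 = top_k_hit_rate(scores, k=1)
hit_at_2 = top_k_hit_rate(scores, k=2)
print(scores)
print(hit_at_1, hit_at_2)
```

In this toy case the first PXRD pattern scores higher against the third crystal than against its own, so it is a miss at k=1 but a hit at k=2. This is the kind of near-duplicate confusion that a top-k metric tolerates.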