Bridging the Synthesizability Gap in Perovskites by Combining Computations, Literature Data, and PU Learning

Rushik Desai, Junyeong Ahn, Alejandro Strachan, Arun Mannodi-Kanakkithodi

TL;DR

The paper tackles the challenge of predicting perovskite synthesizability by marrying high-throughput DFT data with literature-derived synthesis labels using Positive-Unlabeled (PU) learning. It builds a 76-feature descriptor space augmented with DFT properties to train a synthesis classifier, selecting a Decision Tree model that achieves a ROC-AUC around 0.91 and a true positive rate near 0.86 on held-out data. By applying the model to 909 DFT compounds and a generated 20,000-compound enumeration, the authors identify hundreds of potentially stable, synthesizable perovskites with promising optoelectronic properties, and they make all data and tooling accessible for reproducibility. The approach bridges computation and experiment, enabling scalable, literature-informed screening for accelerated perovskite discovery and guiding experimental validation.

Abstract

Among emerging energy materials, halide and chalcogenide perovskites have garnered significant attention over the last decade owing to the abundance of their constituent species, low manufacturing costs, and their highly tunable composition-structure-property space. Navigating the vast perovskite compositional landscape is possible using density functional theory (DFT) computations, but they are not easily extended to predictions of the synthesizability of new materials and their properties. As a result, only a limited number of compositions identified to have desirable optoelectronic properties from these calculations have been realized experimentally. One way to bridge this gap is by learning from the experimental literature about how the perovskite composition-structure space relates to their likelihood of laboratory synthesis. Here, we present our efforts in combining high-throughput DFT data with experimental labels collected from the literature to train classifier models employing various materials descriptors to forecast the synthesizability of any given perovskite compound. Our framework utilizes the positive and unlabeled (PU) learning strategy and makes probabilistic estimates of the synthesis likelihood based on DFT-computed energies and the prior existence of similar synthesized compounds. Our data and models can be readily accessed via a Findable, Accessible, Interoperable, and Reproducible (FAIR) nanoHUB tool.
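To make the PU learning idea in the abstract concrete, the following is a minimal sketch of bagging-style PU scoring, not the authors' implementation. It assumes a simplified scheme: in each round, a bootstrap sample of unlabeled compounds is treated as provisional negatives, a weak classifier is trained against the known positives, and only the unlabeled compounds left out of that sample are scored. The depth-1 decision stump is a stand-in for the Decision Tree model, and the two features, the toy values, and all function names are hypothetical. The K-fold and repeat structure of the paper's transductive bagging is not reproduced here.

```python
import random

def fit_stump(X, y):
    """Fit a depth-1 decision tree; returns (feature, threshold, p_left, p_right)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [yi for row, yi in zip(X, y) if row[j] <= thr]
            right = [yi for row, yi in zip(X, y) if row[j] > thr]
            # Errors made if each leaf predicts its majority class.
            errors = (min(sum(left), len(left) - sum(left))
                      + min(sum(right), len(right) - sum(right)))
            if best is None or errors < best[0]:
                best = (errors, j, thr,
                        sum(left) / len(left), sum(right) / len(right))
    if best is None:  # no feature varies: fall back to the base rate
        p = sum(y) / len(y)
        return (0, float("inf"), p, p)
    return best[1:]

def stump_score(stump, row):
    """Fraction of positives in the leaf that `row` falls into."""
    j, thr, p_left, p_right = stump
    return p_left if row[j] <= thr else p_right

def pu_bagging_scores(X, s, n_rounds=200, seed=0):
    """Average out-of-bag score for every unlabeled sample (s == 0)."""
    rng = random.Random(seed)
    pos = [i for i, si in enumerate(s) if si == 1]
    unl = [i for i, si in enumerate(s) if si == 0]
    totals = {i: 0.0 for i in unl}
    counts = {i: 0 for i in unl}
    for _ in range(n_rounds):
        # Bootstrap unlabeled samples to act as provisional negatives.
        bag = [rng.choice(unl) for _ in range(len(pos))]
        in_bag = set(bag)
        X_train = [X[i] for i in pos + bag]
        y_train = [1] * len(pos) + [0] * len(bag)
        stump = fit_stump(X_train, y_train)
        # Score only the unlabeled samples that were not used as negatives.
        for i in unl:
            if i not in in_bag:
                totals[i] += stump_score(stump, X[i])
                counts[i] += 1
    return {i: totals[i] / counts[i] for i in unl if counts[i] > 0}

# Toy data with two hypothetical descriptors per compound.
positives = [[0.0, 1.0], [0.1, 0.9], [0.05, 1.0],
             [0.15, 0.95], [0.2, 0.9], [0.1, 1.0]]
unlabeled = [[0.1, 0.95], [0.05, 0.9],            # resemble the positives
             [0.9, 0.5], [1.0, 0.6], [0.8, 0.55],
             [0.95, 0.4], [0.85, 0.5], [1.1, 0.45]]
X = positives + unlabeled
s = [1] * len(positives) + [0] * len(unlabeled)

scores = pu_bagging_scores(X, s)
lookalike_ids = [6, 7]        # unlabeled rows close to the positive cluster
distant_ids = list(range(8, 14))
```

In this toy setting, the unlabeled rows that resemble the labeled positives end up with higher averaged scores than the distant ones, which is the sense in which a PU model gives a probabilistic estimate of synthesis likelihood from the existence of similar synthesized compounds.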

Paper Structure

This paper contains 14 sections, 2 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: The overall workflow used for designing novel perovskites with high synthesis likelihood and desired DFT-accuracy properties.
  • Figure 2: (a) The compositional space considered across the four types of perovskite compounds in our dataset. (b) Visualization of the DFT dataset as a decomposition energy vs band gap plot, with different symbols and colors respectively showing different perovskite types and phases. Here, HaP = ABX$_3$ halide perovskite, ChP = ABX$_3$ chalcogenide perovskite, VO = A$_2$BX$_6$ vacancy-ordered double perovskite, and DP = A$_2$BB'X$_6$ double perovskite.
  • Figure 3: The workflow used to extract valid perovskite compounds from the Materials Project (MP) database using the Materials Project API.
  • Figure 4: Transductive Bagging for PU Learning. Each subset selection uses a K-fold split performed R times. For each K-fold in R, the model is trained for T iterations. The figure has been adapted from cgcnn.
  • Figure 5: Regression models based on regularized greedy forests trained for the decomposition energy and band gap. These models can be used to predict the properties of any new perovskite based on its compositional and elemental descriptors.
  • ...and 5 more figures