Bridging the Synthesizability Gap in Perovskites by Combining Computations, Literature Data, and PU Learning

Rushik Desai; Junyeong Ahn; Alejandro Strachan; Arun Mannodi-Kanakkithodi

TL;DR

The paper addresses the prediction of perovskite synthesizability by combining high-throughput DFT data with synthesis labels collected from the literature, using Positive-Unlabeled (PU) learning. The authors build a 76-feature descriptor space, augment it with DFT-computed properties, and train a synthesis classifier; the selected Decision Tree model reaches a ROC-AUC of about 0.91 and a true positive rate of about 0.86 on held-out data. Applying the model to 909 DFT compounds and to an enumerated set of 20,000 compounds, they identify hundreds of potentially stable, synthesizable perovskites with promising optoelectronic properties. All data and tooling are made available for reproducibility. The approach connects computation with experiment, enabling scalable, literature-informed screening that can guide experimental validation and accelerate perovskite discovery.

Abstract

Among emerging energy materials, halide and chalcogenide perovskites have garnered significant attention over the last decade owing to the abundance of their constituent species, low manufacturing costs, and their highly tunable composition-structure-property space. Navigating the vast perovskite compositional landscape is possible using density functional theory (DFT) computations, but they are not easily extended to predictions of the synthesizability of new materials and their properties. As a result, only a limited number of compositions identified to have desirable optoelectronic properties from these calculations have been realized experimentally. One way to bridge this gap is by learning from the experimental literature about how the perovskite composition-structure space relates to the likelihood of laboratory synthesis. Here, we present our efforts in combining high-throughput DFT data with experimental labels collected from the literature to train classifier models employing various materials descriptors to forecast the synthesizability of any given perovskite compound. Our framework utilizes the positive and unlabeled (PU) learning strategy and makes probabilistic estimates of the synthesis likelihood based on DFT-computed energies and the prior existence of similar synthesized compounds. Our data and models can be readily accessed via a Findable, Accessible, Interoperable, and Reusable (FAIR) nanoHUB tool.
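To make the PU learning idea concrete, here is a minimal sketch of one common variant, bagging-style PU learning, written with the Python standard library only. It is not the authors' pipeline: the abstract does not say which PU scheme they use, the two features are synthetic placeholders rather than the 76 descriptors, and the depth-1 decision stump is a stand-in for a full decision tree. All function and variable names (`fit_stump`, `pu_bagging_scores`, `hidden_pos`, `hidden_neg`) are hypothetical.

```python
import random

def fit_stump(X, y):
    """Fit a depth-1 decision tree: choose the (feature, threshold, sign)
    that misclassifies the fewest training rows."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                errors = sum(
                    (1 if sign * (row[j] - t) > 0 else 0) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, sign)
    _, j, t, sign = best
    return lambda row: 1.0 if sign * (row[j] - t) > 0 else 0.0

def pu_bagging_scores(positives, unlabeled, n_rounds=100, seed=0):
    """Average out-of-bag votes over many rounds. In each round a bootstrap
    sample of the unlabeled set is treated as provisional negatives."""
    rng = random.Random(seed)
    k = len(positives)
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(n_rounds):
        drawn = [rng.randrange(len(unlabeled)) for _ in range(k)]
        in_bag = set(drawn)
        X = positives + [unlabeled[i] for i in drawn]
        y = [1] * k + [0] * k
        stump = fit_stump(X, y)
        # Score only the unlabeled points that were not used for training.
        for i in range(len(unlabeled)):
            if i not in in_bag:
                totals[i] += stump(unlabeled[i])
                counts[i] += 1
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]

# Synthetic toy data (an assumption, not taken from the paper): feature 0
# separates the two classes, feature 1 is pure noise.
data_rng = random.Random(1)
positives = [[data_rng.gauss(2.0, 0.5), data_rng.gauss(0.0, 1.0)] for _ in range(30)]
hidden_pos = [[data_rng.gauss(2.0, 0.5), data_rng.gauss(0.0, 1.0)] for _ in range(20)]
hidden_neg = [[data_rng.gauss(-2.0, 0.5), data_rng.gauss(0.0, 1.0)] for _ in range(40)]
unlabeled = hidden_pos + hidden_neg  # true labels are hidden from the learner

scores = pu_bagging_scores(positives, unlabeled)
mean_pos = sum(scores[:len(hidden_pos)]) / len(hidden_pos)
mean_neg = sum(scores[len(hidden_pos):]) / len(hidden_neg)
print(f"mean score, hidden positives: {mean_pos:.2f}")
print(f"mean score, hidden negatives: {mean_neg:.2f}")
```

The averaged out-of-bag score plays the role of a synthesis likelihood: unlabeled points that resemble known positives are scored highly even though no confirmed negative was ever supplied. In this framing, the literature-confirmed perovskites would be the positives and the remaining DFT compounds the unlabeled set.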
