Materials-Discovery Workflows Guided by Symbolic Regression: Identifying Acid-Stable Oxides for Electrocatalysis

Akhil S. Nair, Lucas Foppa, Matthias Scheffler

Abstract

The efficiency of active learning (AL) approaches for identifying materials with desired properties relies on knowledge of a few parameters that describe the property. However, these parameters are unknown if the property is governed by an intricate interplay of many atomistic processes. Here, we develop an AL workflow based on the sure-independence screening and sparsifying operator (SISSO) symbolic-regression approach. SISSO identifies the few key parameters correlated with a given materials property, in the form of analytical expressions, out of many offered primary features. Crucially, we train ensembles of SISSO models in order to quantify mean predictions and their uncertainty, enabling the use of SISSO in AL. Combining bootstrap sampling of the training datasets with Monte-Carlo feature dropout reduces the high prediction errors observed with a single SISSO model. In addition, the feature-dropout procedure alleviates the overconfidence issues observed in the widely used bagging approach. We demonstrate the SISSO-guided AL workflow by identifying acid-stable oxides for water splitting using high-quality DFT-HSE06 calculations. From a pool of 1470 materials, 12 acid-stable materials are identified in only 30 AL iterations. The materials-property maps provided by SISSO, along with the uncertainty estimates, reduce the risk of missing promising portions of the materials space that were overlooked in the initial, possibly biased dataset.
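
The ensemble idea described in the abstract (bootstrap resampling of the training data combined with random dropout of primary features) can be illustrated with a minimal sketch. This is not the authors' implementation: an ordinary least-squares linear model stands in for SISSO, and the names `train_ensemble`, `predict_ensemble`, and `keep_fraction`, as well as the synthetic data, are assumptions made only for illustration. The ensemble size `k=10` mirrors the value quoted in the Figure 1 caption.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept. Stand-in for a SISSO model (assumption)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    A = np.column_stack([X, np.ones(len(X))])
    return A @ coef

def train_ensemble(X, y, k=10, keep_fraction=0.8, seed=0):
    """Train k models, each on a bootstrap sample and a random feature subset."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    n_keep = max(1, int(round(keep_fraction * n_features)))
    members = []
    for _ in range(k):
        # Bootstrap: draw n_samples rows with replacement.
        rows = rng.integers(0, n_samples, size=n_samples)
        # Feature dropout: keep a random subset of the primary features.
        cols = np.sort(rng.choice(n_features, size=n_keep, replace=False))
        coef = fit_linear(X[rows][:, cols], y[rows])
        members.append((cols, coef))
    return members

def predict_ensemble(members, X):
    """Return the ensemble mean and standard deviation for each row of X."""
    preds = np.array([predict_linear(coef, X[:, cols]) for cols, coef in members])
    return preds.mean(axis=0), preds.std(axis=0)

# Synthetic data for illustration only (not from the paper).
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
y = 2.0 * X[:, 0] - X[:, 1] + 0.05 * rng.normal(size=60)

members = train_ensemble(X, y, k=10)
mean, std = predict_ensemble(members, X[:5])
```

The spread `std` across ensemble members plays the role of the uncertainty estimate. Members that happen to drop an informative feature disagree more with the rest, which widens the spread compared with bootstrap sampling alone.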

Paper Structure

This paper contains 8 sections, 2 equations, 3 figures, 1 table.

Table of Contents

  1. Results
  2. Methods

Figures (3)

  • Figure 1: (a) Schematic representation of different ensemble methods. S$_1$, ..., S$_k$ represent the subsets of the original dataset S obtained through bootstrapping. $P^{i}_{SISSO}$ is the SISSO model trained on the $i$th bootstrap sample, and $\phi^{i}_{n}$ represents the set of primary features retained for that sample. (b) Comparison of absolute prediction errors (top panel) and miscalibration scores (bottom panel) across different ensemble methods. The violin plots are constructed from the errors and miscalibration scores obtained in 30 independent trials with $k=10$. Some of the high-error predictions are indicated with star markers.
  • Figure 2: $\Delta G_{pbx}^{\mathrm{OER}}$ change (top panel) and number of acid-stable oxides identified (bottom panel) across 30 AL iterations with the probability of feasibility (POF) and random selection (RS) acquisition strategies. The filled and open square markers indicate, respectively, the DFT-HSE06 calculated $\Delta G_{pbx}^{\mathrm{OER}}$ and the mean prediction of the ensemble of SISSO models ($\Delta G_{pbx,\mathbb{E}_{SISSO}}^{\mathrm{OER}}$) for the oxide selected at each iteration. The error bars represent the corresponding uncertainty estimates ($\sigma_{\mathbb{E}_{SISSO}}$). The stability threshold of $\Delta G_{pbx}^{\mathrm{OER}} = 0.1$ eV/atom is indicated with the black dashed line. The formulae and structures of the acid-stable oxides identified in iterations 8, 14, and 20 are shown in the inset of the figure.
  • Figure 3: SISSO-descriptor-based material maps of oxide stability with materials in the initial training data (black), the candidate space of 1470 oxides (white) and materials selected during AL campaigns with probability of feasibility (red) and random selection (grey). The $x$ and $y$ axes represent the descriptors obtained from the SISSO model. The filled (hollow) circles indicate materials suggested by AL which are acid-stable (unstable) from DFT-HSE06 calculations. The $y$-axis range is limited for an enlarged view of the region of interest.
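
The probability of feasibility (POF) acquisition strategy named in the Figure 2 caption can be sketched as follows. The sketch rests on assumptions not stated on this page: each ensemble prediction is treated as Gaussian with the ensemble mean and standard deviation, an oxide counts as acid-stable when its $\Delta G_{pbx}^{\mathrm{OER}}$ is at or below the 0.1 eV/atom threshold, and the candidate with the highest POF is selected next. The function names, the candidate labels, and the numerical values are hypothetical and are not data from the paper.

```python
import math

def probability_of_feasibility(mean, std, threshold=0.1):
    """P(value <= threshold) for a Gaussian prediction N(mean, std**2)."""
    if std <= 0.0:
        # Degenerate case: no uncertainty, so feasibility is a hard yes/no.
        return 1.0 if mean <= threshold else 0.0
    z = (threshold - mean) / std
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def select_next(candidates, threshold=0.1):
    """Return the candidate name with the highest probability of feasibility."""
    return max(
        candidates,
        key=lambda name: probability_of_feasibility(*candidates[name], threshold),
    )

# Hypothetical (mean, std) predictions in eV/atom, for illustration only.
candidates = {
    "oxide_A": (0.30, 0.05),
    "oxide_B": (0.15, 0.10),
    "oxide_C": (0.12, 0.01),
}

chosen = select_next(candidates)
```

In this example `oxide_C` has the lowest predicted mean above the threshold, but its small uncertainty leaves little probability mass below 0.1 eV/atom, so the less certain `oxide_B` is chosen instead. This illustrates how the uncertainty estimate, and not only the mean prediction, drives the selection.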