Assessing data-driven predictions of band gap and electrical conductivity for transparent conducting materials

Federico Ottomano, John Y. Goulermas, Vladimir Gusev, Rahul Savani, Michael W. Gaultois, Troy D. Manning, Hai Lin, Teresa P. Manzanera, Emmeline G. Poole, Matthew S. Dyer, John B. Claridge, Jon Alaria, Luke M. Daniels, Su Varma, David Rimmer, Kevin Sanderson, Matthew J. Rosseinsky

Abstract

Machine learning (ML) has offered innovative perspectives for accelerating the discovery of new functional materials, leveraging the increasing availability of materials databases. Despite these promising advances, data-driven methods face constraints imposed by the quantity and quality of available data. Moreover, ML is often employed in tandem with simulated datasets originating from density functional theory (DFT), and assessed through in-sample evaluation schemes. This scenario raises questions about the practical utility of ML in uncovering new and significant material classes for industrial applications. Here, we propose a data-driven framework aimed at accelerating the discovery of new transparent conducting materials (TCMs), an important category of semiconductors with a wide range of applications. To mitigate the shortage of available data, we create and validate unique experimental databases, comprising several examples of existing TCMs. We assess state-of-the-art (SOTA) ML models for property prediction from the stoichiometry alone. We propose a bespoke evaluation scheme to provide empirical evidence on the ability of ML to uncover new, previously unseen materials of interest. We test our approach on a list of 55 compositions containing typical elements of known TCMs. Although our study indicates that ML tends to identify new TCMs compositionally similar to those in the training data, we empirically demonstrate that it can highlight material candidates that may have been previously overlooked, offering a systematic approach to identifying materials that are likely to display TCM characteristics.

Paper Structure

This paper contains 32 sections, 7 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Data distributions for $\sigma$ (left) and $E_g$ (right). $\bar{x}$ and $\tilde{x}$ denote the mean and the median, respectively. The purple dotted line on the $\sigma$ distribution indicates the minimum metallic conductivity $\sigma_{\text{min}} = 10^3\,\text{S/cm}$.
  • Figure 2: Schematic representation of the proposed evaluation to simulate the discovery of new TCMs: following an iterative scheme, a specific family of known TCMs is placed in the test set, while ML models are trained on the remaining TCMs within training data. This procedure repeats for each available TCM family.
  • Figure 3: LOCO-CV material clusters obtained separately for the conductivity dataset (left) and for the band gap dataset (right).
  • Figure 4: Parity plots are shown for both electrical conductivity (top) and band gap (bottom) prediction. These were obtained by concatenating the different validation folds used in the K-fold evaluation scheme.
  • Figure 5: Confusion matrices for the metal vs. non-metal classification task are displayed for the standard CrabNet (left), fine-tuned CrabNet (center), and RF (right). The fine-tuned CrabNet shows a remarkable improvement, with a significant reduction in false negatives compared to both the standard CrabNet and RF models.
  • ...and 4 more figures
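
The iterative scheme described in the Figure 2 caption (hold out one family of known TCMs, train on everything else, repeat for each family) can be sketched as a leave-one-group-out split. This is a minimal illustration and not the authors' implementation: the function name `leave_one_family_out`, the family labels, and the toy formulas are all hypothetical, and treating unlabeled samples (`None`) as always belonging to the training set is an assumption made here.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Sequence, Tuple


def leave_one_family_out(
    families: Sequence[Optional[str]],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_family, train_indices, test_indices), one per family.

    families[i] is a hypothetical family label for sample i. Samples
    labeled None are assumed to be non-TCM background data and are never
    held out, so they appear in every training split.
    """
    # Group sample indices by their family label, skipping unlabeled samples.
    by_family: Dict[str, List[int]] = defaultdict(list)
    for idx, fam in enumerate(families):
        if fam is not None:
            by_family[fam].append(idx)

    # Hold out each family in turn; everything else forms the training set.
    for fam in sorted(by_family):
        test_idx = by_family[fam]
        held_out = set(test_idx)
        train_idx = [i for i in range(len(families)) if i not in held_out]
        yield fam, train_idx, test_idx


# Toy data: formulas and family labels are illustrative placeholders only.
formulas = ["In2O3", "In1.9Sn0.1O3", "ZnO", "Zn0.98Al0.02O", "NaCl"]
labels = ["In-O", "In-O", "Zn-O", "Zn-O", None]

for fam, train_idx, test_idx in leave_one_family_out(labels):
    print(
        fam,
        "| train:", [formulas[i] for i in train_idx],
        "| test:", [formulas[i] for i in test_idx],
    )
```

In each iteration a model would be fit on the training indices and scored only on the held-out family, so the score reflects performance on a family the model has never seen rather than in-sample accuracy.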