Predicting and Accelerating Nanomaterials Synthesis Using Machine Learning Featurization

Christopher C. Price, Yansong Li, Guanyu Zhou, Rehan Younas, Spencer S. Zeng, Tim H. Scanlon, Jason M. Munro, Christopher L. Hinkle

TL;DR

This work automates and generalizes feature extraction of reflection high-energy electron diffraction (RHEED) data with machine learning to establish quantitatively predictive relationships in small sets of expert-labeled data, saving significant time on subsequently grown samples.

Abstract

Materials synthesis optimization is constrained by serial feedback processes that rely on manual tools and intuition across multiple siloed modes of characterization. We automate and generalize feature extraction of reflection high-energy electron diffraction (RHEED) data with machine learning to establish quantitatively predictive relationships in small sets (~10) of expert-labeled data, saving significant time on subsequently grown samples. These predictive relationships are evaluated in a representative material system (W_{1-x}V_xSe2 on c-plane sapphire (0001)) with two aims: 1) predicting grain alignment of the deposited film using pre-growth substrate data, and 2) estimating vanadium dopant concentration using in-situ RHEED as a proxy for ex-situ methods (e.g., X-ray photoelectron spectroscopy). Both tasks are accomplished using the same materials-agnostic features, avoiding system-specific retraining and leading to a potential 80% time saving over a 100-sample synthesis campaign. These predictions provide guidance to avoid doomed trials, reduce follow-on characterization, and improve control resolution for materials synthesis.
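
As a rough illustration of the first aim, the Figure 2 caption describes a logistic regression model fit with bootstrap aggregation to 14 featurized samples. The sketch below shows one way such a bagged classifier could look. It is not the authors' implementation: the two-feature fingerprints are synthetic, the function names (`fit_logistic`, `bagged_probability`) are hypothetical, the leave-one-out evaluation is an assumption, and the model is written with the Python standard library only so that it runs without extra dependencies.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=200, l2=0.01):
    """Fit L2-regularized logistic regression by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def bagged_probability(X, y, x_new, n_estimators=15, seed=0):
    """Average P(aligned) over logistic models fit to bootstrap resamples."""
    rng = random.Random(seed)
    probs = []
    while len(probs) < n_estimators:
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        ys = [y[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a one-class resample cannot train a classifier; redraw
        w, b = fit_logistic([X[i] for i in idx], ys)
        probs.append(sigmoid(sum(wj * xj for wj, xj in zip(w, x_new)) + b))
    return sum(probs) / len(probs)


# Synthetic stand-in for 14 featurized patterns (NOT data from the paper):
# two hypothetical features per sample, label 1 = aligned, 0 = textured.
data_rng = random.Random(1)
labels = [1] * 7 + [0] * 7
fingerprints = [
    [data_rng.gauss(1.0 if lab else -1.0, 0.3) for _ in range(2)] for lab in labels
]

# Assumed evaluation: predict each sample from a model that never saw it.
loo_probs = []
for i in range(len(labels)):
    X_train = fingerprints[:i] + fingerprints[i + 1:]
    y_train = labels[:i] + labels[i + 1:]
    loo_probs.append(bagged_probability(X_train, y_train, fingerprints[i], seed=i))

accuracy = sum((p >= 0.5) == bool(lab) for p, lab in zip(loo_probs, labels)) / len(labels)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

Averaging over bootstrap resamples gives a per-sample probability of aligned growth rather than a hard label, which is the kind of quantity plotted in Figure 2(c) and (f).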

Paper Structure (11 sections, 4 figures)

This paper contains 11 sections, 4 figures.

Figures (4)

  • Figure 1: (a) Summary of experimental flows for sample preparation, film growth, and characterization. At the beginning and end of MBE deposition, in-situ RHEED is collected and automatically fingerprinted. After synthesis, the sample is transferred for XPS characterization. (b) Summary of data analysis flows for synthesis and characterization data. Labeled trials are iteratively updated in the database, and correlation fitting is performed for the two tasks against the input labels. Next-trial predictions are generated within 10 seconds. (c) An image of a RHEED pattern of the as-grown film, and (d) the color mask representing featurized regions. Comprehensive metrics are extracted for each diffraction feature to form a complete fingerprint unbiased by user priors. Fingerprints are input to the empirical correlation models; see supporting information (SI) section 1.
  • Figure 2: (a) Segmented RHEED patterns for examples of aligned (top) and textured (bottom) WSe2 film growth. Labels in the bottom left correspond to sample number. (b) Confusion matrix and classification accuracy for a logistic regression model fit with bootstrap aggregation to a set of 14 samples of featurized WSe2 patterns. (c) Probability of aligned growth predictions by sample (scatter points) and frequency of misclassification (bars) for the WSe2 RHEED data. (d) Segmented RHEED patterns for examples of sapphire substrates that led to aligned (top) and textured (bottom) film growths. (e) Confusion matrix and classification accuracy for the same model structure as in (b), fit to the substrate RHEED instead of the film RHEED against the film labels. (f) Same as (c) for the sapphire substrate pattern classification task.
  • Figure 3: (a) Vanadium doping composition $x$ in W_{1-x}V_xSe2 as predicted from RHEED features (y-axis) vs. as measured by XPS (x-axis). Orange points show the predictions from a model fit to all 9 data points, indicating the overall correlation; blue points show the composition prediction for each data point from a model fit with that point withheld. Predictions are the average of models independently fit to the 0$\degree$ and 30$\degree$ data series. The black line is a visual guide indicating zero absolute error between the XPS-derived composition and the RHEED-predicted composition. Error bars give the standard deviation of predictions for the individual estimators within the bagging ensemble; MAE is the mean absolute error of $x$. (b) Monotonic improvement in prediction accuracy for composition with added training samples, indicating avoidance of overfitting and tunability of the desired prediction precision. (c) Predictions separately generated for the two independent RHEED series collected on the same samples at two different azimuthal angles separated by 30$\degree$ (dots and x's). Averaging the prediction at each labeled composition gives the orange points in (a).
  • Figure : For Table of Contents Only
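
The evaluation described in the Figure 3 caption (a prediction for each point from a model fit with that point withheld, error bars from the spread of estimators in a bagging ensemble, averaging of the 0° and 30° series, and MAE of $x$) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: each azimuth is reduced to a single hypothetical scalar feature, the estimator is a one-variable least-squares line, and the compositions and features are synthetic values invented for the example.

```python
import random
import statistics


def fit_line(f, x):
    """Ordinary least squares for x ~ slope * f + intercept (one feature)."""
    f_mean, x_mean = statistics.fmean(f), statistics.fmean(x)
    sff = sum((fi - f_mean) ** 2 for fi in f)
    sfx = sum((fi - f_mean) * (xi - x_mean) for fi, xi in zip(f, x))
    slope = sfx / sff
    return slope, x_mean - slope * f_mean


def bagged_predict(f_train, x_train, f_new, n_estimators=50, rng=None):
    """Return (mean, standard deviation) of predictions over bootstrap fits."""
    rng = rng or random.Random(0)
    n = len(f_train)
    preds = []
    while len(preds) < n_estimators:
        idx = [rng.randrange(n) for _ in range(n)]
        fs = [f_train[i] for i in idx]
        if max(fs) == min(fs):
            continue  # degenerate resample: the slope is undefined, so redraw
        slope, intercept = fit_line(fs, [x_train[i] for i in idx])
        preds.append(slope * f_new + intercept)
    return statistics.fmean(preds), statistics.stdev(preds)


def leave_one_out(features, compositions, seed=0):
    """Predict each sample from an ensemble fit with that sample withheld."""
    rng = random.Random(seed)
    results = []
    for i in range(len(compositions)):
        f_train = features[:i] + features[i + 1:]
        x_train = compositions[:i] + compositions[i + 1:]
        results.append(bagged_predict(f_train, x_train, features[i], rng=rng))
    return results


# Synthetic stand-in data (NOT from the paper): 9 hypothetical compositions
# and one noisy scalar RHEED feature per azimuthal series.
data_rng = random.Random(42)
x_xps = [0.05 * k for k in range(9)]
f_0 = [1.0 + 2.0 * x + data_rng.gauss(0, 0.02) for x in x_xps]
f_30 = [0.5 - 1.5 * x + data_rng.gauss(0, 0.02) for x in x_xps]

loo_0 = leave_one_out(f_0, x_xps, seed=1)
loo_30 = leave_one_out(f_30, x_xps, seed=2)

# Average the two azimuthal predictions, then score against the labels.
x_pred = [(a[0] + b[0]) / 2 for a, b in zip(loo_0, loo_30)]
mae = statistics.fmean([abs(p - t) for p, t in zip(x_pred, x_xps)])
print(f"leave-one-out MAE of x: {mae:.3f}")
```

Withholding each point before predicting it matters at this sample size: a model fit to all 9 points shows the overall correlation but says little about how well a new, unlabeled growth would be predicted.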