A Transformer-based Model for Rapid Microstructure Inference from Four-Dimensional Scanning Transmission Electron Microscopy Data

Kwanghwi Je; Ellis R. Kennedy; Sungin Kim; Yao Yang; Erik H. Thiede

Abstract

Properties of crystalline materials are closely linked to microstructure arising from the spatial arrangement, orientation, and phase of nanocrystals. Rapid characterization of crystalline microstructure can accelerate the identification of these links and the development of materials with desired properties. Here, we combine a machine learning framework with four-dimensional scanning transmission electron microscopy (4D-STEM) to enable fast inference of crystalline microstructure over large fields of view. The framework employs a transformer-based architecture to predict crystallographic orientations and phases from 4D-STEM diffraction patterns, yielding spatially resolved maps of microstructural features at the nanoscale. With this framework, crystallographic orientations are inferred up to two orders of magnitude faster than widely used correlative template-matching approaches. This capability enables high-throughput characterization of complex crystalline materials and facilitates the establishment of structure-property relationships central to materials design and optimization.

Paper Structure (27 sections, 7 equations, 6 figures)

Introduction
Results
Discussion
Methods
Data availability
Code availability
Acknowledgements
Author information
Ethics declarations

Figures (6)

Figure 1: Schematic illustration of model and prediction workflow. a Each Bragg disk in the diffraction pattern is treated as a token whose radial distance $k_{r}$, polar angle $k_{\theta}$, and intensity $I$ are embedded and summed to yield a token representation. A transformer encoder processes the set of tokens to produce contextualized embeddings, which are then combined into a latent vector using mean pooling. A multilayer perceptron (MLP) head maps the latent vector to target structural attributes. b Symmetry-aware geodesic loss $L_{\mathrm{geo}}$ used to train the model for orientation prediction. The MLP head maps the latent vector to a rotation matrix $R_{\mathrm{pred}}$ in $\mathrm{SO}(3)$, representing the crystal orientation. Each diffraction pattern in the training set is simulated from an orientation label $R_{\mathrm{label}}$, chosen as a representative of its symmetry-equivalent orientation class. During training, all symmetry-equivalent variants of the label are generated using proper point group operators of crystals. The predicted orientation is compared with these variants using the geodesic distance on $\mathrm{SO}(3)$, which quantifies the angular misorientation between rotations. The minimum distance defines the loss.
Figure 2: a Sampling of orientation labels for generating the training and validation sets. These labels are sampled in two steps. A grid of symmetrically unique zone-axis directions is sampled for the selected face-centered-cubic (fcc) copper (Cu) crystal (left). These directions lie on the surface of the unit sphere bounded by the [001], [011], and [111] directions. For each zone axis, the crystal lattice is rotated to align the zone axis with the incident electron beam. An additional in-plane rotation about the zone axis is then applied to sample the full range of crystal orientations (right). b Sampling of orientation labels for generating the test set. Zone-axis directions are randomly sampled on the surface of the unit sphere (left), yielding an approximately uniform distribution in $\cos(\phi_{\mathrm{polar}})$ and $\theta_{\mathrm{azimuthal}}$ (top right). $\phi_{\mathrm{polar}}$ and $\theta_{\mathrm{azimuthal}}$ denote the polar and azimuthal angles in spherical coordinates. Uniformly sampled in-plane rotations (bottom right) are then applied about each zone axis. c Evolution of geodesic loss $L_{\mathrm{geo}}$ for the training and validation sets during training. d Estimated densities of $L_{\mathrm{geo}}$ for the validation set (top) and test set (bottom), shown for the trained model (brown) and an untrained model (blue). e Diffraction patterns simulated from a reference orientation label and from the corresponding predicted orientation for a data point marked in (d). The two patterns exhibit high visual similarity despite a large $L_{\mathrm{geo}}$ value. f Distribution of angular misalignment between symmetry-reduced crystal axes for the orientation label and predicted orientation, $\theta_{x}$, $\theta_{y}$, and $\theta_{z}$. g Estimated density of the cosine of the angular misalignment in (f).
Figure 3: a Four-dimensional scanning transmission electron microscopy (4D-STEM) datasets with different scan grid sizes used in this study to benchmark computation times for orientation mapping. b Orientation mapping of 4D-STEM data using correlative template matching implemented in py4DSTEM. c Orientation mapping of 4D-STEM data using our model, which involves four sequential steps: loading pre-processed 4D-STEM data (Bragg disk feature map across a four-dimensional grid of scan positions), transforming Bragg disk positional coordinates and constructing the model input, performing model inference, and assembling the final orientation map. d For each 4D-STEM dataset and method, we measured the elapsed time for orientation mapping using 10 independent runs and report the corresponding mean and standard deviation. e Wall-clock execution times for the four processes in (c) when using the GPU. Elapsed time for each step was measured independently, and statistics were obtained over 10 repeated runs as in (d).
Figure 4: a A virtual 4D-STEM annular dark-field image of fcc Cu dendritic crystals grown in liquid under an electric field. b,c Orientation maps of 4D-STEM data in (a) obtained from py4DSTEM template matching (b) and from our model (c). Left and right panels are maps of in-plane orientation and out-of-plane orientation, corresponding to symmetry-reduced x- and z-axis directions, respectively. The orientation color code is shown in the right panel of (c). Black regions indicate scan positions for which no orientation is assigned. d–l Comparison of predictions from py4DSTEM template matching and from the model for selected diffraction patterns marked by yellow circles in (a). For each example, the experimental diffraction pattern (left), the simulated pattern from py4DSTEM prediction (middle), and the simulated pattern from the model prediction (right) are shown. To illustrate the correspondence between simulated and experimental Bragg disks, the experimental Bragg disks (blue) are overlaid on the simulated diffraction patterns. The selected diffraction patterns in (a) are randomly drawn from those containing more than six Bragg disks.
Figure 5: a Estimated density of the sparse correlation score (top) and the cross-correlation score $Q$ (bottom) from model predictions (red) and py4DSTEM template matching (blue). b Selected examples where transformer predictions underperform template matching. Experimental diffraction patterns (left), simulated patterns from py4DSTEM template matching (middle), and simulated patterns from the model predictions (right). The examples are selected from cases in which model predictions yield lower sparse correlation scores than those obtained from template matching. c Examples illustrating why model predictions yield lower sparse correlation scores than template matching despite a larger number of matched Bragg disks. Bragg disks derived from model predictions show less precise positional agreement with the experimental disks, whereas those derived from template matching exhibit closer positional or intensity alignment for a subset of the experimental disks. Sparse correlation scores are shown in the top left of each panel. d Highly correlated domain identified from the model predictions. The domain corresponds to scan positions whose predicted zone axes show strong angular alignment with those of their neighboring scan positions. e Two-point correlation of predicted zone axes for positions in the domain in (d). The panels display the autocorrelation of the model predictions (left), the autocorrelation of the py4DSTEM predictions (middle), and the cross-correlation between the model and py4DSTEM predictions (right). f Azimuthal average of the two-dimensional correlations in (e). As a reference, we also show the autocorrelation of py4DSTEM zone-axis predictions over the entire scan region where orientations are assigned.
...and 1 more figure
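
The symmetry-aware geodesic loss described in the Figure 1b caption can be illustrated with a short NumPy sketch. This is not the authors' implementation. It assumes a cubic crystal (such as fcc Cu), whose 24 proper point group operators are the signed permutation matrices with determinant +1, and it assumes that symmetry-equivalent variants of a label are formed as $R_{\mathrm{label}} S$ (the opposite ordering is an equally common convention). The function names `cubic_proper_rotations`, `geodesic_distance`, `symmetry_aware_geodesic_loss`, and `rot_z` are hypothetical and chosen only for this example. A training implementation would use a differentiable framework rather than plain NumPy.

```python
import itertools

import numpy as np


def cubic_proper_rotations():
    """Return the 24 proper rotations of the cubic point group, shape (24, 3, 3).

    Assumption: they are enumerated as signed permutation matrices with det = +1.
    """
    ops = []
    for perm in itertools.permutations(range(3)):
        for signs in itertools.product((1.0, -1.0), repeat=3):
            m = np.zeros((3, 3))
            for row, (col, sign) in enumerate(zip(perm, signs)):
                m[row, col] = sign
            if np.isclose(np.linalg.det(m), 1.0):
                ops.append(m)
    return np.stack(ops)


def geodesic_distance(r_a, r_b):
    """Geodesic distance on SO(3): the rotation angle of r_a^T r_b, in radians."""
    cos_theta = (np.trace(r_a.T @ r_b) - 1.0) / 2.0
    # Clip to guard against round-off pushing the value outside [-1, 1].
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


def symmetry_aware_geodesic_loss(r_pred, r_label, sym_ops):
    """Minimum geodesic distance from r_pred to any symmetry variant of r_label.

    Assumption: variants are built as r_label @ S for each operator S.
    """
    variants = r_label @ sym_ops  # broadcasts (3, 3) @ (N, 3, 3) -> (N, 3, 3)
    return min(geodesic_distance(r_pred, v) for v in variants)


def rot_z(angle_deg):
    """Rotation by angle_deg degrees about the z axis (helper for the example)."""
    a = np.radians(angle_deg)
    return np.array([
        [np.cos(a), -np.sin(a), 0.0],
        [np.sin(a), np.cos(a), 0.0],
        [0.0, 0.0, 1.0],
    ])


if __name__ == "__main__":
    ops = cubic_proper_rotations()
    # A 100-degree rotation about z is 10 degrees from the 90-degree symmetry
    # variant of the identity label, so the loss is about 10 degrees.
    loss = symmetry_aware_geodesic_loss(rot_z(100.0), np.eye(3), ops)
    print(np.degrees(loss))
```

Taking the minimum over all variants means a prediction is not penalized for landing on a different but symmetry-equivalent orientation from the one chosen as the label.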
