Revealing the Evolution of Order in Materials Microstructures Using Multi-Modal Computer Vision
Arman Ter-Petrosyan, Michael Holden, Jenna A. Bilbrey, Sarah Akers, Christina Doty, Kayla H. Yano, Le Wang, Rajendra Paudel, Eric Lang, Khalid Hattar, Ryan B. Comes, Yingge Du, Bethany E. Matthews, Steven R. Spurgeon
TL;DR
This work tackles the challenge of quantifying microstructural order in complex oxide interfaces with a multi-modal computer vision framework that fuses high-angle annular dark-field (HAADF) STEM imaging with energy-dispersive X-ray spectroscopy (EDS). It systematically compares three segmentation strategies (community detection, agglomerative clustering, and few-shot classification) across single- and multi-modal data, revealing that ensemble approaches best capture irradiation-induced disorder. By extracting FFT-based crystallinity and composition descriptors within the identified regions, the study links structural disorder to changes in local oxygen content and cation composition, offering objective, scalable descriptors that go beyond manual analysis. The findings demonstrate the potential of multi-modal descriptors to inform kinetic models and autonomous experimentation on materials under extreme conditions, while highlighting the need for ground-truth data to quantify model accuracy and guide modality selection.
Abstract
The development of high-performance materials for microelectronics, energy storage, and extreme environments depends on our ability to describe and direct property-defining microstructural order. Our present understanding is typically derived from laborious manual analysis of imaging and spectroscopy data, which is difficult to scale, challenging to reproduce, and unable to reveal the latent associations needed for mechanistic models. Here, we demonstrate a multi-modal machine learning (ML) approach to describe order from electron microscopy analysis of the complex oxide La$_{1-x}$Sr$_x$FeO$_3$. We construct a hybrid pipeline based on fully supervised and semi-supervised classification, allowing us to evaluate both the characteristics of each data modality and the value each modality adds to the ensemble. We observe distinct differences in the performance of uni- and multi-modal models, from which we draw general lessons in describing crystal order using computer vision.
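
To make the idea of an FFT-based crystallinity descriptor (as mentioned in the TL;DR) more concrete, the following is a minimal sketch and not the authors' pipeline. It assumes that order in an image patch can be approximated by how strongly the power spectrum is concentrated in a few sharp peaks. The function name `fft_crystallinity`, the parameters `r_min` and `top_k`, and the synthetic test patches are all hypothetical choices made for illustration.

```python
import numpy as np


def fft_crystallinity(patch, r_min=2.0, top_k=8):
    """Illustrative order descriptor (an assumption, not the paper's metric).

    Returns the fraction of non-DC spectral power held by the `top_k`
    strongest frequency bins. Periodic lattices concentrate power in a few
    peaks (score near 1); disordered regions spread it out (score near 0).
    """
    patch = np.asarray(patch, dtype=float)
    patch = patch - patch.mean()  # remove the DC offset

    # A Hann window reduces spectral leakage from the patch edges.
    window = np.outer(np.hanning(patch.shape[0]), np.hanning(patch.shape[1]))
    power = np.abs(np.fft.fftshift(np.fft.fft2(patch * window))) ** 2

    # Exclude the low-frequency core around the (shifted) spectrum center.
    ky, kx = np.indices(power.shape)
    cy, cx = power.shape[0] // 2, power.shape[1] // 2
    radius = np.hypot(ky - cy, kx - cx)
    ring = power[radius >= r_min]

    total = ring.sum()
    if total == 0:
        return 0.0
    return float(np.sort(ring)[-top_k:].sum() / total)


# Synthetic stand-ins for image patches (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n = 64
y, x = np.indices((n, n))

# "Crystalline" patch: square lattice with an 8-pixel period plus weak noise.
lattice = np.cos(2 * np.pi * x / 8) + np.cos(2 * np.pi * y / 8)
crystalline_patch = lattice + 0.1 * rng.standard_normal((n, n))

# "Disordered" patch: pure noise with no periodicity.
amorphous_patch = rng.standard_normal((n, n))

crystalline_score = fft_crystallinity(crystalline_patch)
amorphous_score = fft_crystallinity(amorphous_patch)
print(f"crystalline: {crystalline_score:.3f}, amorphous: {amorphous_score:.3f}")
```

In a workflow of the kind described above, such a score would presumably be computed per segmented region and compared against composition descriptors from the spectroscopy channel; the thresholding or normalization needed for real micrographs would depend on magnification, patch size, and noise level.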
