Zero-shot Autonomous Microscopy for Scalable and Intelligent Characterization of 2D Materials

Jingyun Yang, Ruoyan Avery Yin, Chi Jiang, Yuepeng Hu, Xiaokai Zhu, Xingjian Hu, Sutharsika Kumar, Xiao Wang, Xiaohua Zhai, Keran Rong, Yunyue Zhu, Tianyi Zhang, Zongyou Yin, Jing Kong, Neil Zhenqiang Gong, Zhichu Ren, Haozhe Wang

Abstract

Characterization of atomic-scale materials traditionally requires human experts with months to years of specialized training. Even for trained operators, accurate and reliable characterization remains challenging when examining newly discovered materials such as two-dimensional (2D) structures. This bottleneck drives demand for fully autonomous experimentation systems that can comprehend research objectives without requiring large training datasets. In this work, we present ATOMIC (Autonomous Technology for Optical Microscopy & Intelligent Characterization), an end-to-end framework that integrates foundation models to enable fully autonomous, zero-shot characterization of 2D materials. Our system combines a vision foundation model (the Segment Anything Model), large language models (ChatGPT), unsupervised clustering, and topological analysis to automate microscope control, sample scanning, image segmentation, and intelligent analysis through prompt engineering, eliminating the need for additional training. When analyzing typical MoS2 samples, our approach achieves 99.7% segmentation accuracy for single-layer identification, which is equivalent to that of human experts. In addition, the integrated model can detect grain boundary slits that are challenging to identify with the human eye. Furthermore, the system retains robust accuracy under variable conditions, including defocus, color temperature fluctuations, and exposure variations. It is applicable to a broad spectrum of common 2D materials, including graphene, MoS2, WSe2, and SnSe, regardless of whether they were fabricated via chemical vapor deposition or mechanical exfoliation. This work demonstrates the use of foundation models for autonomous analysis, establishing a scalable and data-efficient characterization paradigm that fundamentally transforms the approach to nanoscale materials research.

Paper Structure

This paper contains 14 sections, 5 figures.

Figures (5)

  • Figure 1: ATOMIC framework to autonomously analyze 2D materials. a, Schematic workflow of the ATOMIC framework, illustrating the synergistic integration of foundation models to enable autonomous microscopy. The system combines GPT for hardware control, the Segment Anything Model (SAM) to generate segmentation masks, and GPT-supervised clustering to identify material species. This multi-model approach enables microscope control, decision-making, and autonomous analysis to achieve self-directed microscopic imaging and characterization without human intervention. b, Optical microscopic image of MoS$_2$ crystals synthesized via chemical vapor deposition, captured using our autonomous microscopy system. The image exhibits characteristic monolayer, bilayer, and multilayer regions with distinct optical contrast. c, Segmentation masks generated by SAM highlighting potential material regions, overlaid on the original image. d, Material classification results obtained through GPT-supervised clustering, distinguishing between substrate, monolayer, bilayer, multilayer regions, and impurities. e, Size distribution histogram of triangular monolayer MoS$_2$ flakes, showing side length measurements. The dashed line represents the kernel density estimate (KDE) of the distribution. f, Pixel-level confusion matrix for the automated identification of CVD-grown MoS$_2$ crystal regions. Labels: "monolyr" = Monolayer, "multilyr" = Multilayer, "substrate" = Substrate, "other" = Impurities or unrelated regions.
  • Figure 2: Spatial segmentation enabled by Segment Anything Model and Topological Correction. a, Original optical microscope image showing MoS$_2$ flakes on a substrate. b-c, Human-annotated labels and Canny edge detection results. d, SAM segmentation results with clear, continuous, and detailed boundaries. e, Original optical micrograph showing MoS$_2$ flakes on a substrate. f, Human annotation identifying a single continuous flake. g, SAM-generated segmentation revealing four distinct flakes separated by grain boundary slits. Image magnified 2.5× from original acquisition at 100× magnification. h, Image comparison revealing a grain boundary slit that is directly detected by SAM at low magnification but visible to a human only after image enhancement (zooming in, increasing contrast, and converting to grayscale). i, Segmented masks before (top) and after (bottom) topological correction, with the green line highlighting the corrected multilayer region boundary. Topological correction effectively compensates for the limits of SAM's spatial capability. Scale bar = 10 $\mu$m. j, Mean RGB value with standard deviation of segmentation masks before (left) and after (middle and right) topological correction. The significant shift in mean RGB value and the reduction in standard deviation indicate enhanced segmentation precision.
  • Figure 3: GPT-supervised k-means clustering. a, Three-dimensional visualization of mean RGB values for each segmented region plotted in RGB color space, with orthogonal projections displayed on the RG, GB, and RB planes to show the contribution of each RGB channel to materials classification. b, Distortion analysis of a representative microscopic image showing the relationship between distortion and cluster number $k$. The optimal $k$ values identified by the conventional Elbow method ($k=3$, green) and GPT-assisted analysis ($k=4$, red) are indicated. c, Comparison of optimal cluster number $k$ selection across multiple sample images using three methods: Elbow method (orange, top figure), GPT-o1-mini (green, middle figure), and GPT-4o (blue, bottom figure), with human-selected values (solid black line in each figure) as baseline. The x-axis represents different image indices, while the y-axis shows selected $k$ values. Shaded regions indicate twice the margin of error from human-selected values for visualization purposes. Error bars represent standard deviations across 15 repeated queries for each image using GPT models, with modal values plotted as data points. GPT-4o demonstrates robust alignment with human expert decisions in $k$ selection, exhibiting high consistency and stability across multiple test samples. d, Original micrograph showing monolayers, impurities, and substrate, with a dashed frame highlighting a multilayer region identifiable by human experts. e, Material classification results using the Elbow method-selected $k$ value, showing failure to detect the multilayer region marked by the black frame. f, Classification results using the GPT-selected $k$ value, successfully detecting the multilayer region within the framed area, demonstrating superior alignment with human expert identification. g, Spatial difference map highlighting classification discrepancies between the Elbow method and GPT-assisted clustering approaches. Inset panels show magnified views of the framed region in panels d-f: original microscopic image (left); Elbow method classification results incorrectly identifying multilayer regions as monolayer (middle); GPT-supervised classification results accurately distinguishing multilayer from monolayer regions (right). Scale bar = 5 $\mu$m.
  • Figure 4: Robustness of the foundation model synergy for 2D materials analysis. a, Optical images of MoS$_2$ flakes acquired under varying imaging conditions: standard acquisition (top left), color desaturation (20% intensity reduction, top middle), defocus (z-axis shift, top right), white balance shift (color temperature changed from 5500 K to 3200 K, bottom left), underexposure (24% of standard exposure time, bottom middle), and overexposure (295% of standard exposure time, bottom right). Layer thickness at the center of the standard captured image (top left) was validated by Atomic Force Microscopy (Fig. S13). Scale bar = 10 $\mu$m.
  • Figure 4 (continued): b, Classification accuracy under each imaging condition assessed at both mask level (top panel) and pixel level (bottom panel). c, Autonomous analysis of diverse 2D materials: mechanically exfoliated monolayer MoS$_2$ (column 1), exfoliated multilayer MoS$_2$ (column 2), PVD-grown SnSe (column 3), CVD-grown WSe$_2$ (column 4), and CVD-grown graphene (column 5). Top row shows original micrographs with human annotations; middle row displays SAM segmentation with topological corrections; bottom row presents layer classification after GPT-supervised clustering. Color code: substrate (purple), mono- or thinnest layer (yellow), bilayer (green), multilayer (pink). Scale bar = 10 $\mu$m. d-e, Comparison of ATOMIC system performance with machine learning (ML)-based approaches. d, Mask-level accuracy comparison between our zero-shot ATOMIC system and previous ML methods, plotted versus training dataset size (x-axis). Our approach achieves 97.6% accuracy without requiring training data. Detailed analysis provided in Supplementary S2. e, Pixel-level accuracy comparison with previously reported ML methods, demonstrating ATOMIC's superior performance (99.7%). All values are adopted directly from the literature, with model type indicated for each data point [masubuchi2020deep, masubuchi2019classifying, saito2019deep, han2020deep, greplova2020fully, zhu2022artificial, uslu2024open, dong20213d, leger2024machine]. The stars represent our work.
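
The distortion analysis described in the Figure 3 caption (panels a-b) can be illustrated with a small Python sketch: each segmented region is reduced to its mean RGB value, k-means is run for several values of $k$, and the resulting distortion curve is inspected for an elbow. This is not the authors' implementation. The function names, the farthest-first seeding, the second-difference elbow rule, and the RGB values in the demo are all assumptions chosen for illustration, and the GPT-assisted selection of $k$ is not reproduced here.

```python
from statistics import fmean


def mean_rgb(pixels):
    """Mean (R, G, B) of the pixels covered by one segmentation mask."""
    return tuple(fmean(p[c] for p in pixels) for c in range(3))


def sq_dist(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from all chosen centres."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans_distortion(points, k, max_iter=100):
    """Run Lloyd's k-means and return the distortion, i.e. the sum of
    squared distances from each point to its nearest centre."""
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centre.
        updated = [
            mean_rgb(cl) if cl else centers[i] for i, cl in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def distortion_curve(points, k_max):
    """Distortion for k = 1 .. k_max; entry i corresponds to k = i + 1."""
    return [kmeans_distortion(points, k) for k in range(1, k_max + 1)]


def elbow_k(curve):
    """One common elbow heuristic (assumed here): the interior k with the
    largest discrete second difference of the distortion curve."""
    if len(curve) < 3:
        raise ValueError("need distortions for at least three values of k")
    return max(
        range(2, len(curve)),
        key=lambda k: curve[k - 2] - 2 * curve[k - 1] + curve[k],
    )


if __name__ == "__main__":
    # Hypothetical masks: each region is a short list of (R, G, B) pixels.
    regions = [
        [(200, 180, 190), (202, 181, 189)],  # substrate-like
        [(199, 179, 191), (201, 182, 190)],
        [(150, 140, 200), (151, 141, 199)],  # thin-layer-like
        [(149, 139, 201), (152, 140, 200)],
        [(100, 120, 210), (101, 121, 209)],  # thick-layer-like
        [(99, 119, 211), (102, 120, 210)],
        [(60, 60, 60), (61, 59, 60)],        # impurity-like
        [(59, 61, 60), (62, 60, 61)],
    ]
    points = [mean_rgb(r) for r in regions]
    curve = distortion_curve(points, k_max=5)
    for k, d in enumerate(curve, start=1):
        print(f"k={k}: distortion={d:.1f}")
    print("elbow heuristic picks k =", elbow_k(curve))
```

A purely geometric rule like `elbow_k` only sees the shape of the curve, which is one way to understand why the caption reports the Elbow method and the GPT-assisted analysis choosing different values of $k$ for the same image.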