A Large-Language-Model Assisted Automated Scale Bar Detection and Extraction Framework for Scanning Electron Microscopic Images
Yuxuan Chen, Ruotong Yang, Zhengyang Zhang, Mehreen Ahmed, Yanming Wang
TL;DR
The paper tackles the manual bottleneck of scale-bar detection in SEM imagery by introducing a four-phase, multi-modal framework that combines Auto-DG synthetic data generation, YOLO-based scale-bar detection, a hybrid OCR system for text extraction, and an LLM-based verifier for result validation. Auto-DG synthesizes diverse training data by inserting four scale-bar shapes and accompanying text into SEM backgrounds drawn from public datasets, enabling robust model training. Detection uses YOLOv5. OCR uses a CnOCR-PaddleOCR hybrid with unit normalization, followed by a distance-based association step that assigns each recognized text string to a scale bar. An LLM (domain-adapted LLaMA3) then reasons over the outputs to refine the results. Experiments show high detection precision and recall under IoU-based metrics and robust OCR performance, with the integrated LLM agent further improving reliability, demonstrating a scalable path to automated, accurate SEM metrology.
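The unit normalization and distance-based association steps mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the box format `(x1, y1, x2, y2)`, the unit table `UNIT_TO_NM`, the helper names `normalize_label` and `associate`, and the choice of Euclidean distance between box centres are all assumptions made for illustration.

```python
import math
import re

# Hypothetical unit table (assumption): multiplier converting each unit to nanometres.
UNIT_TO_NM = {"nm": 1.0, "um": 1e3, "µm": 1e3, "μm": 1e3, "mm": 1e6}


def normalize_label(text):
    """Parse an OCR string such as '10 um' into a length in nanometres.

    Returns None when the string is not a '<number> <known unit>' label.
    """
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-zA-Zµμ]+)\s*", text)
    if match is None:
        return None
    unit = match.group(2).lower()
    if unit not in UNIT_TO_NM:
        return None
    return float(match.group(1)) * UNIT_TO_NM[unit]


def center(box):
    """Centre point of an axis-aligned box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def associate(bar_box, text_items):
    """Pick the parseable text item whose box centre is nearest to the bar.

    text_items is a list of (text, box) pairs from the OCR stage.
    Returns (text, length_in_nm), or None if no item parses as a label.
    """
    bx, by = center(bar_box)
    best, best_dist = None, math.inf
    for text, box in text_items:
        length_nm = normalize_label(text)
        if length_nm is None:
            continue  # skip instrument metadata and other non-label text
        tx, ty = center(box)
        dist = math.hypot(tx - bx, ty - by)
        if dist < best_dist:
            best, best_dist = (text, length_nm), dist
    return best
```

For example, given a detected bar at `(100, 400, 200, 410)` and OCR items `"SEM HV 5kV"`, `"10 um"` (directly above the bar), and `"500 nm"` (in a far corner), the sketch skips the unparseable metadata string and returns the `"10 um"` label, normalized to nanometres.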
Abstract
Microscopic characterization techniques such as Scanning Electron Microscopy (SEM) are widely used in scientific research for visualizing and analyzing microstructures. Determining the scale bar is an important first step in accurate SEM analysis; however, it currently relies mainly on manual operation, which is both time-consuming and error-prone. To address this issue, we propose a multi-modal, automated scale bar detection and extraction framework that provides concurrent object detection, text detection, and text recognition together with a Large Language Model (LLM) agent. The proposed framework operates in four phases: i) an Automatic Dataset Generation (Auto-DG) model that synthesizes a diverse dataset of SEM images to ensure robust training and high generalizability, ii) scale bar object detection, iii) information extraction using a hybrid Optical Character Recognition (OCR) system with DenseNet and Convolutional Recurrent Neural Network (CRNN) based algorithms, and iv) an LLM agent that analyzes the results and verifies their accuracy. The proposed model demonstrates strong performance in object detection and accurate localization, with a precision of 100%, a recall of 95.8%, and a mean Average Precision (mAP) of 99.2% at IoU=0.5 and 69.1% at IoU=0.5:0.95. The hybrid OCR system achieves 89% precision, 65% recall, and a 75% F1 score on the Auto-DG dataset, significantly outperforming several mainstream standalone engines and highlighting its reliability for scientific image analysis. The LLM serves both as a reasoning engine and as an intelligent assistant that verifies the results and suggests follow-up steps. This automated method, powered by an LLM agent, significantly enhances the efficiency and accuracy of scale bar detection and extraction in SEM images, providing a valuable tool for microscopic analysis and advancing the field of scientific imaging.
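For readers less familiar with the metrics quoted above, the following minimal sketch shows how Intersection over Union (IoU) between two boxes and the F1 score are conventionally computed. It uses the standard textbook definitions, not code from the paper; the function names `iou` and `f1_score` and the `(x1, y1, x2, y2)` box format are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2.0 * precision * recall / total if total > 0 else 0.0
```

At the IoU=0.5 threshold, a predicted scale-bar box counts as a correct detection only when `iou(predicted, ground_truth) >= 0.5`. The reported OCR figures are also mutually consistent: `f1_score(0.89, 0.65)` evaluates to roughly 0.751, which rounds to the stated 75%.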
