A Large-Language-Model Assisted Automated Scale Bar Detection and Extraction Framework for Scanning Electron Microscopic Images

Yuxuan Chen, Ruotong Yang, Zhengyang Zhang, Mehreen Ahmed, Yanming Wang

TL;DR

The paper tackles the manual bottleneck of scale-bar detection in SEM imagery with a four-phase, multi-modal framework that combines Auto-DG synthetic data generation, YOLO-based scale-bar detection, hybrid OCR for text extraction, and an LLM-based verifier for result validation. Auto-DG synthesizes diverse training data by inserting four scale-bar shapes and text into SEM backgrounds drawn from public datasets, enabling robust model training. Detection uses YOLOv5. OCR employs a CnOCR-PaddleOCR hybrid with unit normalization and a distance-based association step that assigns each text reading to a scale bar. An LLM (domain-adapted LLaMA3) then reasons over the outputs to refine the results. Experiments show high precision and recall for detection (IoU-based metrics) and robust OCR performance. The integrated LLM agent further improves reliability, demonstrating a scalable path to automated, accurate SEM metrology.
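The following Python sketch roughly illustrates the unit-normalization and distance-based association steps. It parses an OCR reading such as "500 nm" into nanometres and pairs each detected bar with the text box whose center is nearest. It then derives a nanometres-per-pixel factor from the bar's pixel width. The unit table, the regular expression, the `Box` type, and the helpers `normalize_reading`, `associate`, and `nm_per_pixel` are all hypothetical. The paper's actual matching rules may differ; for example, it may only consider text below or beside the bar.

```python
import math
import re
from dataclasses import dataclass

# Hypothetical conversion table to nanometres; "µm" (micro sign) and
# "μm" (Greek mu) are both listed because OCR may return either character.
UNIT_TO_NM = {"nm": 1.0, "um": 1e3, "µm": 1e3, "μm": 1e3, "mm": 1e6, "cm": 1e7}

# A number, optional whitespace, then a known unit, e.g. "500 nm" or "2.5um".
READING = re.compile(r"(\d+(?:\.\d+)?)\s*(nm|um|µm|μm|mm|cm)")


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

    def center(self):
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def normalize_reading(text):
    """Parse an OCR string such as '500 nm' into nanometres; None if no match."""
    m = READING.search(text.lower())
    if m is None:
        return None
    return float(m.group(1)) * UNIT_TO_NM[m.group(2)]


def associate(bars, texts):
    """Pair each bar with the (Box, str) text whose center is nearest to it."""
    pairs = []
    for bar in bars:
        c = bar.center()
        nearest = min(texts, key=lambda t: math.dist(c, t[0].center()), default=None)
        pairs.append((bar, nearest))
    return pairs


def nm_per_pixel(bar, text):
    """Physical length per pixel implied by a bar and its text label."""
    length_nm = normalize_reading(text)
    width_px = bar.x2 - bar.x1
    if length_nm is None or width_px <= 0:
        return None
    return length_nm / width_px


bar = Box(100, 400, 300, 410)  # a detected bar 200 px wide
texts = [(Box(150, 415, 250, 430), "500 nm"), (Box(10, 10, 80, 25), "HV 10.0 kV")]
_, label = associate([bar], texts)[0][1]
print(label, nm_per_pixel(bar, label))  # → 500 nm 2.5
```

Nearest-center matching is the simplest distance rule. Because the non-scale text ("HV 10.0 kV") parses to `None`, a caller could also skip candidates that carry no recognizable unit before matching.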

Abstract

Microscopic characterizations, such as Scanning Electron Microscopy (SEM), are widely used in scientific research for visualizing and analyzing microstructures. Determining the scale bar is an important first step in accurate SEM analysis. Currently, however, this step relies mainly on manual operations, which are both time-consuming and error-prone. To address this issue, we propose a multi-modal, automated scale bar detection and extraction framework that provides concurrent object detection, text detection, and text recognition with a Large Language Model (LLM) agent. The proposed framework operates in four phases: i) an Automatic Dataset Generation (Auto-DG) model that synthesizes a diverse dataset of SEM images, ensuring robust training and high generalizability of the model; ii) scale bar object detection; iii) information extraction using a hybrid Optical Character Recognition (OCR) system with DenseNet- and Convolutional Recurrent Neural Network (CRNN)-based algorithms; and iv) an LLM agent that analyzes and verifies the accuracy of the results. The proposed model demonstrates strong object detection and localization performance, with a precision of 100%, a recall of 95.8%, and a mean Average Precision (mAP) of 99.2% at IoU=0.5 and 69.1% at IoU=0.5:0.95. The hybrid OCR system achieved 89% precision, 65% recall, and a 75% F1 score on the Auto-DG dataset. It significantly outperformed several mainstream standalone engines, highlighting its reliability for scientific image analysis. The LLM serves both as a reasoning engine and as an intelligent assistant that suggests follow-up steps and verifies the results. This automated method, powered by an LLM agent, significantly enhances the efficiency and accuracy of scale bar detection and extraction in SEM images. It provides a valuable tool for microscopic analysis and advances the field of scientific imaging.
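The detection and OCR figures above follow standard definitions. As a consistency check, F1 = 2PR/(P + R). With P = 0.89 and R = 0.65, this gives 1.157 / 1.54 ≈ 0.751, in line with the reported 75%. The sketch below has three parts: box IoU, a simple greedy matcher, and the ten COCO-style IoU thresholds that are averaged for mAP@0.5:0.95. The matcher counts a prediction as a true positive when its IoU with a not-yet-used ground-truth box reaches a threshold. The function names are illustrative, and the greedy matcher is a simplification, not the paper's (or YOLOv5's) evaluation code.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def precision_recall(preds, gts, thr=0.5):
    """Greedy matching: preds are (score, box); each ground truth is used once."""
    matched, tp = set(), 0
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = None, thr
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# IoU thresholds 0.50, 0.55, ..., 0.95 averaged for mAP@0.5:0.95
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

print(round(f1(0.89, 0.65), 3))  # → 0.751
```

The strict thresholds near 0.95 explain why mAP@0.5:0.95 (69.1%) sits well below mAP@0.5 (99.2%). Thin scale bars lose IoU quickly when a predicted box is off by only a few pixels.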

Paper Structure

This paper contains 8 sections, 1 equation, and 6 figures.

Figures (6)

  • Figure 1: High-level architecture of the proposed system for SEM images, outlining its multi-stage methodology: Phase i: Dataset Generation and Training; Phase ii: Scale Bar Object Detection; Phase iii: Information Extraction; and Phase iv: LLM Verification and Feedback.
  • Figure 2: (a) Flowchart outlining the data generation and training process. (b) Four common shapes of scale bars: i) Joint-label bar, ii) I-shaped bar, iii) Ruler-shaped bar and iv) Rectangular bar.
  • Figure 3: (a) Flowchart outlining the architecture of the object detection phase. (b) Performance of the proposed object detection model on SEM images with complex backgrounds and scale bars in different positions, i.e., top-right (i, iv, vi, x), bottom-left (viii, xi, xii), bottom-right (ix), and center (ii, iii, v, vii). Detected scale bars are highlighted with red bounding boxes, with confidence scores in parentheses; zoomed-in regions are shown in the center.
  • Figure 4: (a) Flowchart outlining the steps involved in the information extraction phase. (b) Example outputs from the hybrid OCR system for SEM scale information extraction; detected values (e.g., "10 cm", "500 mm") are displayed with confidence scores in parentheses. The system handles variations in nominal values (e.g., iv: 87%, vi: 73%), units (e.g., iii: 91%, v: 76%), and scale mismatches (e.g., vi: 73%, vii: 69%), demonstrating robustness across diverse conditions. (c) Performance evaluation of different OCR tools for the proposed information extraction on SEM images.
  • Figure 5: Proposed framework performance on real-world SEM images from our laboratory, with detected scale bars highlighted by bounding boxes and confidence scores (zoomed regions shown).
  • ...and 1 more figure
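Here is a minimal sketch of the Auto-DG labeling idea, assuming YOLO-format labels. In that format, each object is one row: the class id, then box center x, center y, width, and height, each normalized by the image size. The sketch samples one of the four shapes from Figure 2(b) and a random position, then writes the label row. The class-id ordering, the size ranges, and all function names are assumptions. The step that actually renders the bar and its text onto an SEM background is omitted. The paper may also use a single scale-bar class rather than one class per shape. The code assumes images at least a few dozen pixels per side.

```python
import random

# The four scale-bar shapes of Figure 2(b); mapping them to class ids 0-3
# in this order is an assumption.
SHAPES = ["joint_label", "i_shaped", "ruler", "rectangular"]


def sample_bar_box(img_w, img_h, rng):
    """Random bar size and position that fit inside the image (pixels)."""
    w = rng.randint(img_w // 10, img_w // 4)    # hypothetical width range
    h = rng.randint(4, max(5, img_h // 30))     # hypothetical height range
    x1 = rng.randint(0, img_w - w)
    y1 = rng.randint(0, img_h - h)
    return x1, y1, x1 + w, y1 + h


def to_yolo_line(cls_id, box, img_w, img_h):
    """YOLO label row: class x_center y_center width height, normalized to [0, 1]."""
    x1, y1, x2, y2 = box
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    bw = (x2 - x1) / img_w
    bh = (y2 - y1) / img_h
    return f"{cls_id} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}"


def make_label(img_w, img_h, seed=None):
    """Sample one synthetic scale bar and return (shape, pixel box, label row)."""
    rng = random.Random(seed)
    cls_id = rng.randrange(len(SHAPES))
    box = sample_bar_box(img_w, img_h, rng)
    return SHAPES[cls_id], box, to_yolo_line(cls_id, box, img_w, img_h)


shape, box, row = make_label(1024, 768, seed=0)
print(shape, box, row)
```

Seeding a private `random.Random` instance keeps each synthetic sample reproducible without touching the global random state.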