Preliminary Investigations of a Multi-Faceted Robust and Synergistic Approach in Semiconductor Electron Micrograph Analysis: Integrating Vision Transformers with Large Language and Multimodal Models
Sakhinana Sagar Srinivas, Geethan Sannidhi, Sreeja Gangasani, Chidaksh Ravuru, Venkataramana Runkana
TL;DR
Automated nanomaterial identification from SEM micrographs is challenging because of intra-class variability and inter-class similarity. The authors propose CM-EMRL, a cross-modal pipeline that combines a ViT-based image encoder with domain knowledge from two sources: large language models (LLMs), queried with zero-shot chain-of-thought (CoT) prompting, and large multimodal models (LMMs), queried with few-shot prompts. A unified attention layer integrates all three sources. The approach achieves state-of-the-art results on SEM datasets and generalizes to additional benchmarks. Ablations confirm the contribution of each component: LLM prompts, LMM prompts, and cross-modal fusion. The work offers a scalable, interpretable framework that blends linguistic and visual signals to enable robust, high-throughput nanomaterial screening for semiconductor manufacturing.
Abstract
Characterizing materials using electron micrographs is crucial in areas such as semiconductors and quantum materials. Traditional classification methods falter because of the intricate structures in these micrographs. This study introduces an architecture with three parts. It uses the generative capabilities of zero-shot prompting in Large Language Models (LLMs) such as GPT-4 (language-only). It uses the predictive ability of few-shot (in-context) learning in Large Multimodal Models (LMMs) such as GPT-4 with Vision (GPT-4V). It then fuses image-based and linguistic insights for accurate nanomaterial category prediction. This comprehensive approach aims to provide a robust solution for automated nanomaterial identification in semiconductor manufacturing, combining performance, efficiency, and interpretability. Our method surpasses conventional approaches, offering precise nanomaterial identification and facilitating high-throughput screening.
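The summary above says that image-based and linguistic signals are fused, and the TL;DR places this fusion in a unified attention layer. The exact form of that layer is not given here. What follows is only a minimal sketch of one plausible design, built on several assumptions:

- Each source (ViT image features, LLM-derived text, LMM predictions) has already been encoded into one vector of a shared dimension d.
- A single query vector attends over those vectors with scaled dot-product attention.
- The embeddings serve directly as keys and values, with no learned projections.

The function name `attention_fuse` and the toy vectors are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(
    query: Sequence[float],
    modality_embeddings: Sequence[Sequence[float]],
) -> Tuple[Vector, Vector]:
    """Fuse one embedding per source with scaled dot-product attention.

    Hypothetical sketch: `query` stands in for a learnable query vector, and each
    entry of `modality_embeddings` (e.g. image, LLM-text, LMM-prediction vectors)
    is used as both key and value, without learned projections.
    Returns (fused_vector, attention_weights).
    """
    d = len(query)
    if any(len(e) != d for e in modality_embeddings):
        raise ValueError("all embeddings must match the query dimension")
    # Score each source against the query, scaled by sqrt(d).
    scores = [dot(query, e) / math.sqrt(d) for e in modality_embeddings]
    weights = softmax(scores)
    # Attention-weighted sum of the source embeddings, coordinate by coordinate.
    fused = [
        sum(w * e[i] for w, e in zip(weights, modality_embeddings))
        for i in range(d)
    ]
    return fused, weights


if __name__ == "__main__":
    # Toy 4-d vectors standing in for the three (hypothetical) encoded sources.
    image_emb = [1.0, 0.0, 0.0, 0.0]
    llm_text_emb = [0.0, 1.0, 0.0, 0.0]
    lmm_emb = [0.0, 0.0, 1.0, 0.0]
    query = [2.0, 0.0, 0.0, 0.0]
    fused, weights = attention_fuse(query, [image_emb, llm_text_emb, lmm_emb])
    print("weights:", [round(w, 3) for w in weights])
    print("fused:", [round(x, 3) for x in fused])
```

In this toy run the query aligns with the image vector, so the image source receives the largest weight. A trained model would instead learn the query and any key/value projections, and would pass the fused vector to a classification head over nanomaterial categories.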
