FabGPT: An Efficient Large Multimodal Model for Complex Wafer Defect Knowledge Queries

Yuqi Jiang, Xudong Lu, Qian Jin, Qi Sun, Hanming Wu, Cheng Zhuo

TL;DR

FabGPT tackles wafer defect knowledge querying by integrating defect detection in SEM imagery with domain-specific Q&A in a domain-adaptive large multimodal framework. Its three-stage pipeline—modal enhancement, pixel-level detection, and Q&A with a modulation-driven prompt system—mitigates modality bias while embedding wafer defect knowledge through corpus training. Empirical results on the SEM-WaD dataset show state-of-the-art defect detection metrics and high Q&A accuracy, underscoring practical benefits for IC manufacturing. This approach provides a blueprint for domain-specific LMMs that balance vision-language reasoning with specialized knowledge, enabling robust defect analysis and actionable process insights.

Abstract

Intelligence is key to advancing integrated circuit (IC) fabrication. Recent breakthroughs in Large Multimodal Models (LMMs) have unlocked extraordinary abilities in understanding images and text, fostering intelligent fabrication. Leveraging the power of LMMs, we introduce FabGPT, a customized IC fabrication large multimodal model for wafer defect knowledge queries. FabGPT demonstrates expertise in detecting defects in Scanning Electron Microscope (SEM) images, performing root cause analysis, and providing expert Q&A on fabrication processes. FabGPT matches enhanced multimodal features to automatically detect minute defects against complex wafer backgrounds and to reduce the subjectivity of manual threshold settings. In addition, the proposed modulation module and interactive corpus training strategy embed wafer defect knowledge into the pre-trained model, balancing Q&A queries about defect knowledge against those about the model's original knowledge and mitigating modality bias. Experiments on in-house fab data show that FabGPT achieves significant performance improvements in wafer defect detection and knowledge querying.

Paper Structure (17 sections, 13 equations, 7 figures, 5 tables)

Figures (7)

  • Figure 1: Comparison of FabGPT with GPT-4 [Achiam et al., 2023] and AnomalyGPT [Gu et al., 2023], both fine-tuned on our dataset, on detecting, locating, and analyzing microscopic defects in complex backgrounds and on handling modality bias. These prior methods perform poorly on "detection", "analysis", and "modality bias".
  • Figure 2: The four types of defects and defect-free (good) images in the SEM-WaD dataset.
  • Figure 3: The architecture of FabGPT. The images and the characters extracted from them serve as the primary multimodal input to a three-stage model, with the label set entering as auxiliary textual input. The first stage enhances the semantic information of the multimodal features. Building on the first stage, the detection stage performs automated pixel-level detection, and the Q&A stage answers queries by integrating both the model's original knowledge and the newly embedded defect knowledge.
  • Figure 4: (a) The Prediction Module (PM), (b) The Modulation Module.
  • Figure 5: Comparisons with non-LMM baselines.
  • ...and 2 more figures