Addressing Class Imbalance and Data Limitations in Advanced Node Semiconductor Defect Inspection: A Generative Approach for SEM Images

Bappaditya Dey, Vic De Ridder, Victor Blanco, Sandip Halder, Bartel Van Waeyenberge

TL;DR

The paper tackles data scarcity and class imbalance in advanced-node semiconductor defect inspection by introducing a patch-based framework built on a denoising diffusion probabilistic model (DDPM) to generate defect-containing SEM images. It trains a class-conditional diffusion model on small patches and uses inpainting to assemble full-size, defect-containing images that preserve realistic noise and metrology characteristics without metadata supervision. The approach is validated on three real SEM datasets (HEXCH-DSA, LS-ADI, LS-AEI): detectors trained on synthetic or mixed data achieve comparable or improved performance on real test data, and defect types transfer across different CD/Pitch and metrology specifications. The work suggests that diffusion-based synthetic data can effectively augment scarce, expensive SEM datasets, reduce overfitting, and prepare detectors for defects across processes, with practical impact on industrial defect inspection pipelines.

Abstract

Precision in identifying nanometer-scale device-killer defects is crucial in both semiconductor research and development and in production processes. The effectiveness of existing ML-based approaches in this context is largely limited by the scarcity of data, as producing real semiconductor wafer data for training these models involves high financial and time costs. Moreover, existing simulation methods fall short of replicating images with identical noise characteristics, surface roughness and stochastic variations at advanced nodes. We propose a method for generating synthetic semiconductor SEM images using a diffusion model within a limited data regime. In contrast to images generated through conventional simulation methods, SEM images generated through our proposed DL method closely resemble real SEM images, replicating their noise characteristics and surface roughness adaptively. Our main contributions, which are validated on three different real semiconductor datasets, are: i) proposing a patch-based generative framework utilizing DDPM to create SEM images with intended defect classes, addressing challenges related to class imbalance and data insufficiency; ii) demonstrating that generated synthetic images closely resemble real SEM images acquired from the tool, preserving all imaging conditions and metrology characteristics without any metadata supervision; iii) demonstrating that a defect detector trained on a generated defect dataset, either independently or combined with a limited real dataset, can achieve similar or improved performance on real wafer SEM images during validation/testing compared to exclusive training on a real defect dataset; iv) demonstrating the ability of the proposed approach to transfer defect types, critical dimensions, and imaging conditions from one CD/Pitch and metrology specification to another, thereby highlighting its versatility.
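
As a rough illustration of the patch-based DDPM idea described above, the sketch below cuts an image into patches and applies the standard DDPM forward (noising) step, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, to them. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`extract_patches`, `linear_beta_schedule`, `forward_noise`), the patch size, the stride, the linear noise schedule, and the random stand-in image are all hypothetical choices made for illustration, and the paper's actual settings may differ.

```python
import numpy as np


def extract_patches(image, patch_size, stride):
    """Cut a 2-D image into square patches on a regular grid (hypothetical helper)."""
    h, w = image.shape
    patches = [
        image[r:r + patch_size, c:c + patch_size]
        for r in range(0, h - patch_size + 1, stride)
        for c in range(0, w - patch_size + 1, stride)
    ]
    return np.stack(patches)


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule; the values are an assumption, not taken from the paper."""
    return np.linspace(beta_start, beta_end, num_steps)


def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps


rng = np.random.default_rng(0)
image = rng.random((128, 128))           # stand-in for a real SEM image in [0, 1]
patches = extract_patches(image, patch_size=64, stride=32)

betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)      # cumulative product of (1 - beta_t)

# Noised patches and the noise a denoising network would be trained to predict.
xt, eps = forward_noise(patches, t=500, alpha_bar=alpha_bar, rng=rng)
print(patches.shape, xt.shape)
```

In a full pipeline, a class-conditional network would be trained to predict `eps` from `xt`, the timestep, and the defect class label; that training step and the inpainting-based assembly of full-size images are omitted here.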

Paper Structure (14 sections, 23 figures, 5 tables)

This paper contains 14 sections, 23 figures, 5 tables.

Figures (23)

  • Figure 1: Representative sample SEM images illustrating example defect types in the datasets used in this study.
  • Figure 2: High-level overview of the proposed approach to generating a synthetic SEM image dataset containing multiple defect types.
  • Figure 3: Proposed approach to generating a full-size, defect-free SEM image (archetype) using the patch-based method.
  • Figure 4: Comparison between (a) real SEM image, (b) image generated with software simulation, and (c) image generated with our proposed method.
  • Figure 5: Linescan plot for generated (using our proposed approach) and real SEM images.
  • ...and 18 more figures