
A Physics-Constrained, Design-Driven Methodology for Defect Dataset Generation in Optical Lithography

Yuehua Hu, Jiyeong Kong, Dong-yeol Shin, Jaekyun Kim, Kyung-Tae Kang

TL;DR

This work tackles the lack of physically grounded lithography defect data by introducing a design-driven, physics-constrained pipeline that synthesizes defect layouts via Minkowski erosion/dilation, maps them to printed contours through a DMD-based lithography system, and generates pixel-accurate ground-truth masks. The approach yields a large, physically meaningful dataset (3,530 images, 13,365 defects across four classes) and shows that instance segmentation (notably Mask R-CNN) outperforms bounding-box detection on this task, with a mean AP@0.5 gain of roughly 34% on the bridge, burr, and pinch classes. By linking digital perturbations to real-world lithographic outcomes, the methodology supports robust AI-based inspection under process variability and offers a path to broader defect types and process contexts.
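The Minkowski erosion/dilation step can be illustrated on a binary layout grid. The sketch below is a minimal NumPy version, not the authors' code. It assumes a 128 x 128 boolean layout, square and diamond structuring elements of radius r, and a hypothetical helper `inject_local` that confines the perturbation to a rectangular window. That confinement produces a local defect rather than a global line-width change, and it is an assumption of this sketch, not a detail stated in the source.

```python
import numpy as np


def square_se(r):
    """Square structuring element of size (2r+1) x (2r+1)."""
    return np.ones((2 * r + 1, 2 * r + 1), dtype=bool)


def diamond_se(r):
    """Diamond structuring element: offsets with |dx| + |dy| <= r."""
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    return (np.abs(x) + np.abs(y)) <= r


def _shifts(a, se):
    """Yield copies of `a` shifted by every offset of a symmetric, odd-sized SE.

    Pixels outside the grid are treated as background (False).
    """
    r = se.shape[0] // 2
    padded = np.pad(a, r, mode="constant", constant_values=False)
    h, w = a.shape
    for dy, dx in zip(*np.nonzero(se)):
        yield padded[dy:dy + h, dx:dx + w]


def dilate(a, se):
    """Minkowski dilation: a pixel is set if any SE offset hits the pattern."""
    out = np.zeros(a.shape, dtype=bool)
    for shifted in _shifts(a, se):
        out |= shifted
    return out


def erode(a, se):
    """Minkowski erosion: a pixel survives only if every SE offset stays inside."""
    out = np.ones(a.shape, dtype=bool)
    for shifted in _shifts(a, se):
        out &= shifted
    return out


def inject_local(a, se, window, op):
    """Hypothetical helper: apply `op` only inside window = (y0, y1, x0, x1)."""
    y0, y1, x0, x1 = window
    region = np.zeros(a.shape, dtype=bool)
    region[y0:y1, x0:x1] = True
    return np.where(region, op(a, se), a)


# Two 10-px horizontal lines separated by a 4-px space on a 128 x 128 grid.
layout = np.zeros((128, 128), dtype=bool)
layout[40:50, :] = True
layout[54:64, :] = True

window = (30, 74, 60, 68)
bridge = inject_local(layout, square_se(2), window, dilate)  # 2r = 4 closes the space
pinch = inject_local(layout, diamond_se(3), window, erode)   # narrows each line to 4 px
```

In this toy setup, a bridge forms only when 2r is at least the line spacing (here 2 x 2 = 4 px). A smaller dilation radius leaves a protrusion that does not reach the neighboring line.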

Abstract

The efficacy of Artificial Intelligence (AI) in micro/nano manufacturing is fundamentally constrained by the scarcity of high-quality, physically grounded training data for defect inspection. Lithography defect data from the semiconductor industry are rarely accessible for research use, so few public datasets exist. To address this bottleneck in lithography, this study proposes a novel methodology for generating large-scale, physically valid defect datasets with pixel-level annotations. The framework begins with the ab initio synthesis of defect layouts using controllable, physics-constrained mathematical morphology operations (erosion and dilation) applied to the original design-level layout. These synthesized layouts, together with their defect-free counterparts, are fabricated into physical samples via high-fidelity digital micromirror device (DMD)-based lithography. Optical micrographs of the defect samples are then compared with those of their defect-free references to produce consistent defect annotations. Using this methodology, we constructed a comprehensive dataset of 3,530 optical micrographs containing 13,365 annotated defect instances across four classes: bridge, burr, pinch, and contamination. Each defect instance is annotated with a pixel-accurate segmentation mask that preserves its full contour and geometry. On the bridge, burr, and pinch classes, the segmentation-based Mask R-CNN achieves AP@0.5 of 0.980, 0.965, and 0.971, respectively, compared with 0.740, 0.719, and 0.717 for the bounding-box detector Faster R-CNN, a mean AP@0.5 improvement of approximately 34%. For the contamination class, Mask R-CNN achieves an AP@0.5 roughly 42% higher than that of Faster R-CNN. These consistent gains demonstrate that the proposed methodology for generating defect datasets with pixel-level annotations is a feasible foundation for robust AI-based Measurement/Inspection (MI) in semiconductor fabrication.
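The "approximately 34%" figure is consistent with a relative gain in the mean of the three per-class AP@0.5 values. The short check below assumes that interpretation, since the abstract does not say whether the gain is relative or absolute. It uses only the numbers reported above. The contamination-class AP values are not given, so the 42% figure cannot be checked this way. AP@0.5 counts a prediction as a true positive when its IoU with a ground-truth instance is at least 0.5.

```python
# AP@0.5 values reported in the abstract (bridge, burr, pinch).
mask_rcnn = {"bridge": 0.980, "burr": 0.965, "pinch": 0.971}
faster_rcnn = {"bridge": 0.740, "burr": 0.719, "pinch": 0.717}

classes = ["bridge", "burr", "pinch"]
mean_mask = sum(mask_rcnn[c] for c in classes) / len(classes)
mean_faster = sum(faster_rcnn[c] for c in classes) / len(classes)

abs_gain = mean_mask - mean_faster   # difference in AP points
rel_gain = mean_mask / mean_faster - 1  # assumed meaning of "improvement"

print(f"mean AP@0.5: {mean_mask:.3f} vs {mean_faster:.3f}")
print(f"absolute gain: {abs_gain:.3f}, relative gain: {rel_gain:.1%}")
```

The means come out at about 0.972 versus 0.725, an absolute gain of about 0.247 and a relative gain of about 34%. Averaging the per-class ratios instead also gives roughly 34%, so the stated figure does not depend on that choice.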

Paper Structure

This paper contains 8 sections, 8 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: Schematic of the methodology for generating a physically grounded defect dataset. (a) Generation of design-level data, comprising 25 unique raw (defect-free) layouts and their corresponding defect-injected counterparts. Defects are systematically introduced via physics-constrained dilation and erosion operations. (b) High-fidelity physical replication of the designed layouts using DMD-based lithography. The process involves spin-coating AZ nLOF 2035 negative photoresist on a substrate, followed by DMD exposure, development, and a final baking step. (c) Ground truth generation and model training pipeline. Optical micrographs of the fabricated patterns are collected, and pixel-level ground truth is obtained by comparing the defect and raw layout images. This labeled dataset is then used to train an AI-based defect detection model.
  • Figure 2: Conceptual framework for defect synthesis, physical manifestation, and classification. (a) Design-level synthesis. A defect-free raw layout $A$ is transformed into various defect layouts $A'$ through mathematical morphology operations (Minkowski erosion $\ominus$ or dilation $\oplus$) using a designated structuring element (SE). (b) Physical replication. The layouts $A$ and $A'$ are fabricated on the substrate by lithography, yielding a defect-free raw image $B$ and corresponding defect images $B'$. The designed perturbations are physically amplified by the Mask Error Enhancement Factor (MEEF), so each $B'$ exhibits a distinct defect (bridge, burr, or pinch). (c) Ground-truth annotation. Image differencing between $B$ and $B'$, combined with topological and morphological analysis (evaluation of $\Delta k = k(A') - k(A)$ and contour deformation), provides objective cues for consistent pixel-level annotation, resulting in labeled images $C'$.
  • Figure 3: Layout library and defect injection strategy. (a) Defect-free base layout library comprising 5 horizontal-line layouts, 5 vertical-line layouts, and 15 ICCAD-inspired composite layouts (25 base layouts in total, all defined on a $128 \times 128$-pixel grid). (b) Morphology-based defect injection scheme. For each base layout, 150 defect variants are synthesized using three defect groups: 50 bridge defects generated by dilation with a square structuring element (SE), 50 pinch defects generated by erosion with a square SE, and 50 pinch defects generated by erosion with a diamond-shaped SE.
  • Figure 4: Examples across the design-to-image pipeline. (1a-1d) Bridge/Burr; (2a-2d) Pinch with a square Structuring Element (SE); (3a-3d) Pinch with a diamond SE. Within each row, panels correspond to: (a) raw layout (defect-free binary pattern); (b) SE-induced defect layout, where the boundary perturbation is defined by Eq. (2); (c) optical micrograph acquired under standardized illumination and magnification, showing deviations consistent with Mask Error Enhancement Factor (MEEF)-amplified perturbations; (d) pixel-level mask overlay with class-consistent color mapping. Equal magnification and fixed scale bars are used throughout.
  • Figure 5: Dataset statistics. (a) Spatial density map of annotated instances showing a slight central positional bias. (b) Distribution of instance mask size (percentage of total image size). Statistics computed on the training dataset under the preprocessing conditions described in Methods.
  • ...and 1 more figure
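Figure 2(c) combines image differencing between $B$ and $B'$ with the topological cue $\Delta k = k(A') - k(A)$. The sketch below assumes that k(·) counts the 4-connected foreground components of a binary pattern, since the captions define neither k nor the connectivity. It also uses a hypothetical decision rule, `label_defect`, which maps a negative Δk to a bridge and otherwise uses added versus removed area as a crude stand-in for the contour-deformation analysis. The contamination class is not handled, and the rule is illustrative rather than the authors' annotation procedure.

```python
from collections import deque

import numpy as np


def count_components(mask):
    """k(.): number of 4-connected foreground components (connectivity assumed)."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros(mask.shape, dtype=bool)
    h, w = mask.shape
    k = 0
    for y, x in zip(*np.nonzero(mask)):
        if seen[y, x]:
            continue
        k += 1  # unvisited foreground pixel: start a new component and flood-fill it
        seen[y, x] = True
        queue = deque([(y, x)])
        while queue:
            cy, cx = queue.popleft()
            for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
    return k


def label_defect(raw, defective):
    """Hypothetical rule: returns (class cue, delta_k, pixel-level difference mask)."""
    raw = np.asarray(raw, dtype=bool)
    defective = np.asarray(defective, dtype=bool)
    added = defective & ~raw    # material present only in the defect pattern
    removed = raw & ~defective  # material missing from the defect pattern
    delta_k = count_components(defective) - count_components(raw)
    if delta_k < 0:
        cue = "bridge"          # previously separate features merged
    elif removed.sum() > added.sum():
        cue = "pinch"           # net loss of material; a full break (delta_k > 0) also lands here
    else:
        cue = "burr"            # net protrusion that does not merge features
    return cue, delta_k, added | removed


# Toy example: two 10-px lines with a 4-px space on a 128 x 128 grid.
raw = np.zeros((128, 128), dtype=bool)
raw[40:50, :] = True
raw[54:64, :] = True

bridged = raw.copy()
bridged[50:54, 60:68] = True    # fills the space locally
burred = raw.copy()
burred[50:52, 60:64] = True     # protrusion that stops short of the neighbor
pinched = raw.copy()
pinched[40:43, 60:68] = False   # local narrowing of the upper line

for name, img in [("bridged", bridged), ("burred", burred), ("pinched", pinched)]:
    cue, dk, diff = label_defect(raw, img)
    print(name, cue, dk, int(diff.sum()))
```

The XOR-style difference mask (`added | removed`) plays the role of the pixel-level annotation, while Δk only distinguishes merges from non-merges. This is why some contour-level cue is still needed to separate burrs from pinches.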