Iterative Cluster Harvesting for Wafer Map Defect Patterns

Alina Pleli; Simon Baeuerle; Michel Janus; Jonas Barth; Ralf Mikut; Hendrik P. A. Lensch

TL;DR

The paper introduces Iterative Cluster Harvesting (ICH), an unsupervised clustering framework for wafer map defect patterns. Features are extracted once; the method then repeats PCA-based dimensionality reduction and agglomerative clustering, harvesting the cluster with the highest silhouette score in each iteration. Because the harvested cluster is removed, the remaining data set changes at each step and the PCA components are refit, which yields more homogeneous clusters than one-shot clustering and provides a practical aid for labeling and root-cause analysis. Evaluations on WM1K show improved clustering homogeneity over baselines, and results on WM811K_sub highlight limitations when full assignment is enforced on difficult samples. The method is modular and computationally inexpensive, and it can be adapted to other image domains as a simple tool to reveal defect patterns and support manual review.

Abstract

Unsupervised clustering of wafer map defect patterns is challenging because the appearance of certain defect patterns varies significantly. This includes changing shape, location, density, and rotation of the defect area on the wafer. We present a harvesting approach, which can cluster even challenging defect patterns of wafer maps well. Our approach makes use of a well-known, three-step procedure: feature extraction, dimension reduction, and clustering. The novelty in our approach lies in repeating dimensionality reduction and clustering iteratively while filtering out one cluster per iteration according to its silhouette score. This method leads to an improvement of clustering performance in general and is especially useful for difficult defect patterns. The low computational effort allows for a quick assessment of large datasets and can be used to support manual labeling efforts. We benchmark against related approaches from the literature and show improved results on a real-world industrial dataset.
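The harvesting loop described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the average-linkage rule, the use of the mean silhouette value per cluster as the harvesting criterion, and all parameter values (`n_components`, `n_clusters`, `n_iter`) are assumptions made for illustration, and random blobs stand in for CNN feature vectors.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project rows of X onto the top principal components (refit on every call)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def agglomerative(Z, n_clusters):
    """Naive average-linkage agglomerative clustering (assumed linkage); returns labels."""
    clusters = [[i] for i in range(len(Z))]
    dist = np.linalg.norm(Z[:, None] - Z[None, :], axis=-1)
    while len(clusters) > n_clusters:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    labels = np.empty(len(Z), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

def cluster_silhouettes(Z, labels):
    """Mean silhouette value per cluster; needs at least two clusters."""
    dist = np.linalg.norm(Z[:, None] - Z[None, :], axis=-1)
    ids = np.unique(labels)
    s = np.zeros(len(Z))
    for i in range(len(Z)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # singleton clusters get silhouette 0 by convention
        a = dist[i, own].sum() / (own.sum() - 1)
        b = min(dist[i, labels == k].mean() for k in ids if k != labels[i])
        s[i] = (b - a) / max(a, b, 1e-12)
    return {int(k): float(s[labels == k].mean()) for k in ids}

def iterative_cluster_harvesting(F, n_components=2, n_clusters=3, n_iter=3):
    """Repeat PCA + clustering, removing the best-scoring cluster each iteration."""
    remaining = np.arange(len(F))
    harvested = []
    for _ in range(n_iter):
        if len(remaining) <= n_clusters:
            break
        Z = pca_reduce(F[remaining], n_components)  # PCA refit on what is left
        labels = agglomerative(Z, n_clusters)
        scores = cluster_silhouettes(Z, labels)
        best = max(scores, key=scores.get)
        harvested.append(remaining[labels == best])
        remaining = remaining[labels != best]
    return harvested, remaining

# Toy stand-in for feature vectors: three well-separated blobs in 10 dimensions.
rng = np.random.default_rng(0)
centers = np.zeros((3, 10))
centers[0, 0] = centers[1, 1] = centers[2, 2] = 10.0
truth = np.repeat(np.arange(3), 15)
F = centers[truth] + 0.5 * rng.standard_normal((45, 10))
harvested, remaining = iterative_cluster_harvesting(F, n_iter=2)
```

Each entry of `harvested` holds the original indices of one removed cluster, and `remaining` holds the samples that were never harvested. The pairwise-distance loops are quadratic or worse and are only meant for small toy inputs.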

Paper Structure (26 sections, 5 equations, 8 figures, 4 tables)

Introduction
Related Work
Methodology
Three-step clustering
Feature extraction
Dimensionality reduction
Clustering
Harvesting clusters with silhouette score
Summary and extensions of method
Refining $H_{\text{cluster}}$ based on size
Full assignment of wafer maps to $H_{\text{cluster}}$
Free Parameters of the full method
Evaluation metric
Experimental setup
Method setup
...and 11 more sections

Figures (8)

Figure 1: Iterative cluster harvesting scheme: CNN feature vectors $F_D$ are extracted for the dataset $D$ of wafer maps, reduced in dimensionality (PCA in our case), and clustered (AC in our case). In each iteration, the cluster with the highest silhouette score is removed from $F_D$. In the subsequent iteration, the changed variances in the remaining dataset therefore alter the PCA components. This allows our method to further separate the remaining dataset and find more homogeneous clusters over successive iterations.
Figure 2: Exemplary images of wafer maps in the dataset WM1K with (a) Center, (b) Donut, (c) Edge-Loc, (d) Loc, (e) Near-Full, (f) Random, (g) Ring and (h) Scratch defect pattern.
Figure 3: Confusion matrices of the OTC (a) and ICH (b) approaches with full assignment on the WM1K dataset. For each cluster, the predominant true label is determined and assigned as the predicted label. If a cluster dominated by one true label also contains other true labels, this can be read along the vertical column of the respective true label. Values are normalized to the number of data points of each true label. Since there are usually multiple clusters for the same label, they are combined in order to calculate the confusion matrix. The number of individual clusters per true label is shown in the top row of the respective confusion matrix.
Figure 4: Histogram over the feature values of the first seven principal components after applying the OTC to WM1K, which is equal to the first iteration of the ICH procedure. Colors are assigned according to the true labels. Mean values of an exemplary cluster, shown in Figure 5 (a), are marked in each principal component.
Figure 5: (a): Mean image of the Donut cluster identified in the 1st iteration by ICH for WM1K and (b): mean image of all wafer maps of WM1K from the Donut class except for the wafer maps from the identified cluster (a).
...and 3 more figures
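The evaluation described in the caption of Figure 3 (each cluster is labeled with its predominant true label, clusters sharing a label are combined, and values are normalized per true label) can be sketched as follows. The function name, the dictionary layout, and the tie-breaking behavior (the first label encountered wins) are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter, defaultdict

def majority_label_confusion(true_labels, cluster_ids):
    """Label each cluster by its predominant true label and build a confusion
    matrix normalized by the number of samples of each true label."""
    members = defaultdict(list)
    for t, c in zip(true_labels, cluster_ids):
        members[c].append(t)
    # Predicted label of a cluster = its most frequent true label (ties: assumption).
    predicted_of = {c: Counter(ts).most_common(1)[0][0] for c, ts in members.items()}
    classes = sorted(set(true_labels))
    counts = {t: Counter() for t in classes}
    for t, c in zip(true_labels, cluster_ids):
        counts[t][predicted_of[c]] += 1
    totals = Counter(true_labels)
    # matrix[true][predicted] = fraction of samples of `true` assigned to `predicted`.
    matrix = {t: {p: counts[t][p] / totals[t] for p in classes} for t in classes}
    # Number of individual clusters that were combined under each label.
    clusters_per_label = Counter(predicted_of.values())
    return matrix, clusters_per_label

# Toy example with two of the WM1K class names and three clusters.
true_labels = ["Donut"] * 4 + ["Ring"] * 4
cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2]
matrix, clusters_per_label = majority_label_confusion(true_labels, cluster_ids)
```

In this toy input, cluster 1 is predominantly Ring but contains one Donut sample, so a quarter of the Donut samples end up under the predicted label Ring, and two clusters are combined for Ring.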
