Iterative Cluster Harvesting for Wafer Map Defect Patterns
Alina Pleli, Simon Baeuerle, Michel Janus, Jonas Barth, Ralf Mikut, Hendrik P. A. Lensch
TL;DR
The paper introduces Iterative Cluster Harvesting (ICH), an unsupervised clustering framework for wafer map defect patterns that iteratively refines the feature space by cycling through feature extraction, PCA-based dimensionality reduction, and agglomerative clustering, harvesting the most separable cluster in each iteration according to its silhouette score. Because each harvested cluster is removed, the dataset is redefined at every step, which yields more homogeneous clusters than one-shot clustering and provides a practical aid for labeling and root-cause analysis. Evaluations on WM1K show substantial improvements in clustering homogeneity over baselines, and additional results on WM811K_sub highlight limitations when full assignment is enforced on difficult samples. The method is modular, scalable, and adaptable to other image domains, offering a simple yet effective tool to reveal defect patterns and support manual review.
Abstract
Unsupervised clustering of wafer map defect patterns is challenging because the appearance of certain defect patterns varies significantly: the shape, location, density, and rotation of the defect area on the wafer can all change. We present a harvesting approach that clusters even challenging wafer map defect patterns well. Our approach builds on a well-known three-step procedure: feature extraction, dimensionality reduction, and clustering. The novelty lies in repeating dimensionality reduction and clustering iteratively while filtering out one cluster per iteration according to its silhouette score. This improves clustering performance in general and is especially useful for difficult defect patterns. The low computational effort allows for a quick assessment of large datasets, and the method can be used to support manual labeling efforts. We benchmark against related approaches from the literature and show improved results on a real-world industrial dataset.
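The iterative procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes feature extraction has already produced one feature vector per wafer map, it uses scikit-learn's PCA and agglomerative clustering (the techniques named in the TL;DR), and it interprets "silhouette score" of a cluster as the mean silhouette value of its members. The function name `harvest_clusters` and the parameters `n_clusters`, `n_components`, and `max_iterations` are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_samples


def harvest_clusters(features, n_clusters=4, n_components=2, max_iterations=None):
    """Harvest one cluster per iteration; return a label per sample (-1 = not harvested).

    Assumption: `features` is an (n_samples, n_features) array from an
    upstream feature extractor. All parameter defaults are illustrative.
    """
    features = np.asarray(features, dtype=float)
    labels = np.full(len(features), -1, dtype=int)
    remaining = np.arange(len(features))  # indices of samples still in the pool
    harvested = 0

    # The silhouette needs more samples than clusters, so stop before that.
    while len(remaining) > n_clusters:
        if max_iterations is not None and harvested >= max_iterations:
            break
        pool = features[remaining]

        # Refit the dimensionality reduction on the remaining samples only.
        k = min(n_components, pool.shape[0], pool.shape[1])
        reduced = PCA(n_components=k).fit_transform(pool)

        # Cluster the reduced pool.
        assignment = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(reduced)

        # Score each cluster by the mean silhouette value of its members.
        sample_scores = silhouette_samples(reduced, assignment)
        cluster_ids = np.unique(assignment)
        mean_scores = [sample_scores[assignment == c].mean() for c in cluster_ids]
        best = cluster_ids[int(np.argmax(mean_scores))]

        # Harvest the best cluster and remove it from the pool.
        picked = assignment == best
        labels[remaining[picked]] = harvested
        harvested += 1
        remaining = remaining[~picked]

    return labels
```

In this sketch, samples that are never harvested keep the label -1, so they could be set aside for manual review instead of being forced into a cluster.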
