Combining unsupervised and supervised learning in microscopy enables defect analysis of a full 4H-SiC wafer
Binh Duong Nguyen, Johannes Steiner, Peter Wellmann, Stefan Sandfeld
TL;DR
The paper tackles the challenge of high-throughput defect analysis in 4H-SiC wafers by presenting an automated pipeline that integrates classical image processing, supervised classification, and unsupervised clustering to detect and localize etch-pit dislocations across a full wafer stitched from ~40,000 images. It introduces an automated etch-pit dictionary built from single-pit examples, using a binary classifier and dimensionality-reduction-based clustering (VGG-16 features, UMAP, HDBSCAN) to identify dislocation types. For wafer-scale analysis, semi-synthetic training data generated from the dictionary enables Mask R-CNN-based instance segmentation to classify and locate dislocations with average accuracy around 0.92 in cross-validation. The approach yields a comprehensive, high-throughput dislocation map, revealing spatial patterns and enabling defect-controlled optimization of SiC growth processes, scalable to large-diameter wafers. It also highlights limitations due to mixed dislocations and registration artifacts, pointing to avenues for refinement and further automation.
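The clustering step named above (feature vectors, then dimensionality reduction, then density-based clustering) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn and NumPy are available, replaces the VGG-16 features with synthetic vectors, and swaps in PCA for UMAP and DBSCAN for HDBSCAN so that the example stays small. The group count, feature dimension, `eps`, and `min_samples` are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in for per-pit feature vectors (the pipeline above uses VGG-16
# features): three synthetic groups of 50 "pits" in a 64-dimensional space.
centers = rng.normal(scale=10.0, size=(3, 64))
features = np.vstack(
    [c + rng.normal(scale=0.5, size=(50, 64)) for c in centers]
)

# Step 1: reduce the features to 2D (PCA here as a stand-in for UMAP).
embedding = PCA(n_components=2).fit_transform(features)

# Step 2: density-based clustering (DBSCAN here as a stand-in for HDBSCAN).
# Points labelled -1 are treated as noise rather than assigned to a cluster.
labels = DBSCAN(eps=2.0, min_samples=5).fit_predict(embedding)

n_clusters = len(set(labels) - {-1})
print(f"found {n_clusters} clusters among {len(labels)} pits")
```

Each resulting cluster would then be inspected and assigned a dislocation type. With real etch-pit images the groups overlap far more than in this synthetic data, which is one reason a nonlinear embedding and a hierarchical density-based method are attractive choices.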
Abstract
Detecting and analyzing the various defect types in semiconductor materials is an important prerequisite for understanding the underlying mechanisms and for tailoring production processes. Analyzing microscopy images that reveal defects typically requires image analysis tasks such as segmentation and object detection. With the steadily increasing amount of data produced by experiments, handling these tasks manually is becoming increasingly impractical. In this work, we combine various image analysis and data mining techniques to create a robust, accurate, and automated image analysis pipeline. This pipeline extracts the type and position of all defects in a microscopy image of a KOH-etched 4H-SiC wafer that was stitched together from approximately 40,000 individual images.
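To make "type and position of all defects" concrete, the sketch below shows one way positions found within a single tile could be mapped to coordinates in the stitched wafer image. It rests on assumptions that are not taken from the paper: detection runs per tile, tiles lie on a regular grid with no overlap, and every tile has the same size. The 1000 x 1000 pixel tile size, the `Detection` class, the `to_wafer_coords` helper, and the type labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected etch pit, positioned relative to its own tile."""
    tile_row: int   # row index of the tile in the grid
    tile_col: int   # column index of the tile in the grid
    x: float        # pixel column of the pit centre inside the tile
    y: float        # pixel row of the pit centre inside the tile
    label: str      # placeholder for the predicted dislocation type


def to_wafer_coords(det, tile_width, tile_height):
    """Shift a tile-local position into stitched-image coordinates.

    Assumes a regular grid of equally sized, non-overlapping tiles.
    """
    return (
        det.tile_col * tile_width + det.x,
        det.tile_row * tile_height + det.y,
    )


# Hypothetical tile size and detections, for illustration only.
detections = [
    Detection(tile_row=0, tile_col=0, x=10.0, y=20.0, label="type_A"),
    Detection(tile_row=3, tile_col=7, x=5.5, y=100.0, label="type_B"),
]
wafer_positions = [to_wafer_coords(d, 1000, 1000) for d in detections]
print(wafer_positions)
```

A real stitched image generally involves overlapping tiles and registration offsets, so a practical version would add a per-tile offset instead of a fixed grid stride and would merge pits detected twice in overlap regions.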
