
Combining unsupervised and supervised learning in microscopy enables defect analysis of a full 4H-SiC wafer

Binh Duong Nguyen, Johannes Steiner, Peter Wellmann, Stefan Sandfeld

TL;DR

The paper tackles the challenge of high-throughput defect analysis in 4H-SiC wafers by presenting an automated pipeline that integrates classical image processing, supervised classification, and unsupervised clustering to detect and localize etch-pit dislocations across a full wafer stitched from ~40,000 images. It introduces an automated etch-pit dictionary built from single-pit examples, using a binary classifier and dimensionality-reduction-based clustering (VGG-16 features, UMAP, HDBSCAN) to identify dislocation types. For wafer-scale analysis, semi-synthetic training data generated from the dictionary enables Mask R-CNN-based instance segmentation to classify and locate dislocations with average accuracy around 0.92 in cross-validation. The approach yields a comprehensive, high-throughput dislocation map, revealing spatial patterns and enabling defect-controlled optimization of SiC growth processes, scalable to large-diameter wafers. It also highlights limitations due to mixed dislocations and registration artifacts, pointing to avenues for refinement and further automation.

Abstract

Detecting and analyzing various defect types in semiconductor materials is an important prerequisite for understanding the underlying mechanisms as well as for tailoring the production processes. Analysis of microscopy images that reveal defects typically requires image analysis tasks such as segmentation and object detection. With the steadily increasing amount of data produced by experiments, handling these tasks manually becomes increasingly infeasible. In this work, we combine various image analysis and data mining techniques to create a robust, accurate, and automated image analysis pipeline. This allows the type and position of all defects to be extracted from a microscopy image of a KOH-etched 4H-SiC wafer that was stitched together from approximately 40,000 individual images.
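
The summary mentions semi-synthetic training data generated from the etch-pit dictionary. A minimal sketch of that idea, assuming a dictionary of small pit patches with binary masks, is given below. It pastes randomly chosen patches onto a noisy background and records an instance map plus class labels, which is the kind of annotation an instance segmentation model needs. The helper names `make_pit` and `compose_image`, the disk-shaped pits, the three size classes, and all numeric values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def make_pit(size, rng):
    """Hypothetical stand-in for one dictionary entry: a dark disk and its mask."""
    yy, xx = np.mgrid[:size, :size]
    center = (size - 1) / 2.0
    mask = (yy - center) ** 2 + (xx - center) ** 2 <= (size / 2.0 - 1) ** 2
    patch = np.full((size, size), rng.uniform(0.1, 0.4))
    return patch, mask


def compose_image(dictionary, n_pits, shape=(128, 128), seed=0):
    """Paste up to n_pits non-overlapping pits onto a noisy bright background.

    Returns the grayscale image, an instance map (0 = background,
    k = k-th pasted pit) and the class id of each pasted pit.
    """
    rng = np.random.default_rng(seed)
    image = 0.8 + rng.normal(0.0, 0.02, shape)
    instance_map = np.zeros(shape, dtype=np.int32)
    labels = []
    for _ in range(n_pits * 20):  # bounded number of placement attempts
        if len(labels) == n_pits:
            break
        class_id = int(rng.integers(len(dictionary)))
        patch, mask = dictionary[class_id]
        h, w = mask.shape
        y = int(rng.integers(0, shape[0] - h + 1))
        x = int(rng.integers(0, shape[1] - w + 1))
        region = instance_map[y:y + h, x:x + w]  # view into instance_map
        if np.any(region[mask] > 0):
            continue  # would overlap an existing pit, try another position
        image[y:y + h, x:x + w][mask] = patch[mask]
        region[mask] = len(labels) + 1
        labels.append(class_id)
    return image, instance_map, labels


# Assumed toy dictionary: one patch per class, classes differ only in size.
_rng = np.random.default_rng(42)
dictionary = [make_pit(size, _rng) for size in (9, 13, 17)]
image, instance_map, labels = compose_image(dictionary, n_pits=5)
```

Varying `n_pits` gives low-density and high-density images from the same dictionary. In a real setting the patches would be cropped etch-pit images rather than synthetic disks.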

Paper Structure (19 sections, 1 equation, 15 figures)

Figures (15)

  • Figure 1: Micrograph of the investigated SiC wafer. The magnified region shows dislocation lines piercing the surface, revealed by KOH etching. The whole wafer image consists of $40,000$ individual images; one of them is shown in sub-figure c).
  • Figure 2: Data analysis pipeline of the two main tasks: the automated creation of an etch-pit dictionary pool (top box) and the prediction of dislocations in the full wafer (bottom box). Further explanations are given in the text.
  • Figure 3: Left: Visualization of all three components after clustering. Right: Magnification of the data projected on the first two components (i.e., the middle plot in the leftmost column of the left figure) along with examples of the etch pit images. The value ranges of the three latent space components are the same, and therefore, no scales are given.
  • Figure 4: Comparison between ground truth and predicted segmentation. a and d: synthetic grayscale images with low and high dislocation density; g: real microscopy image; b and e: ground truth segmentation of the synthetic images; h: ground truth segmentation from hand labelling; c, f and i: results predicted by the deep learning model.
  • Figure 5: Differences between ground truth and prediction on various test datasets: $1000$ images of high dislocation density, $1000$ images of low dislocation density, and $100$ images of real dislocation density.
  • ...and 10 more figures