Generative Pre-Training of Time-Series Data for Unsupervised Fault Detection in Semiconductor Manufacturing

Sewoong Lee; JinKyou Choi; Min Su Kim

TL;DR

TRACE-GPT tackles unsupervised fault detection in semiconductor manufacturing, where labeled anomalies are scarce and training data may mix normal and abnormal sequences. It combines a Temporal Convolutional Network for temporal embedding with a Transformer decoder pre-trained on a task-agnostic next-value prediction objective, which enables anomaly scoring and wafer-level classification without labels. The approach is evaluated on CVD process logs and the UCR open dataset, where it achieves the highest F1 score at EER among unsupervised models, comes close to the supervised state of the art on the open dataset, and shows favorable runtimes. The work offers a practical, scalable method for real-time fault detection in manufacturing that is data-efficient and explainable through attention visualization.
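To make the scoring idea concrete, here is a minimal sketch of how a next-value predictor's output could be turned into an anomaly score. It is not the authors' implementation. It assumes that sensor values are discretized into bins, that the model outputs a probability distribution over bins at each time step, and that a sequence-level score is the mean per-step cross-entropy. The function names `step_scores` and `sequence_score` and the toy probabilities are hypothetical.

```python
import math

def step_scores(probs, targets):
    """Per-step cross-entropy: -log of the probability given to the observed bin."""
    eps = 1e-12  # guard against log(0)
    return [-math.log(max(p[t], eps)) for p, t in zip(probs, targets)]

def sequence_score(probs, targets):
    """Sequence-level score (assumption): mean of the per-step cross-entropies."""
    scores = step_scores(probs, targets)
    return sum(scores) / len(scores)

# Hypothetical predicted distributions over 3 value bins for 2 time steps.
predicted = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]
normal_targets = [0, 1]   # observed bins match the confident predictions
faulty_targets = [2, 0]   # observed bins were given low probability

normal = sequence_score(predicted, normal_targets)
faulty = sequence_score(predicted, faulty_targets)
print(normal < faulty)  # the poorly predicted sequence gets the higher score
```

Under these assumptions, a sequence the model predicts poorly receives a higher score, which is the property a threshold-based classifier relies on.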

Abstract

This paper introduces TRACE-GPT, which stands for Time-seRies Anomaly-detection with Convolutional Embedding and Generative Pre-trained Transformers. TRACE-GPT is designed to pre-train on univariate time-series sensor data and detect faults in unlabeled datasets from semiconductor manufacturing. In the semiconductor industry, distinguishing abnormal time-series sensor data from normal data is important because it is directly related to wafer defects. However, training data that are small, unlabeled, and even mixed, without enough anomalies, make the classification task difficult. In this research, we capture features of time-series data with temporal convolutional embedding and a Generative Pre-trained Transformer (GPT), and we use the cross-entropy loss to separate abnormal sequences from normal ones. We show that our model outperforms previous unsupervised models on both an open dataset, the University of California Riverside (UCR) time-series classification archive, and the process log of our Chemical Vapor Deposition (CVD) equipment. Our model has the highest F1 score at Equal Error Rate (EER) across all datasets, and on the open dataset its score is only 0.026 below the supervised state-of-the-art baseline.
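The F1 score at Equal Error Rate can be illustrated with a short sketch. The EER operating point is the threshold at which the false positive rate and the false negative rate are equal; on finite data the sketch below picks the threshold where the two rates are closest. This is one common way to compute the metric, not necessarily the exact procedure used in the paper. The function name `f1_at_eer` and the toy scores are hypothetical, and the function assumes both classes are present in `labels`.

```python
def f1_at_eer(scores, labels):
    """Return (threshold, f1) at the threshold where FPR and FNR are closest.

    scores: anomaly scores, higher means more anomalous.
    labels: 1 for fault, 0 for normal (both classes must be present).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for thr in sorted(set(scores)):
        pred = [s >= thr for s in scores]
        tp = sum(1 for p, y in zip(pred, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(pred, labels) if p and y == 0)
        fn = n_pos - tp
        gap = abs(fp / n_neg - fn / n_pos)  # |FPR - FNR|
        if best is None or gap < best[0]:
            denom = 2 * tp + fp + fn
            f1 = 2 * tp / denom if denom else 0.0
            best = (gap, thr, f1)
    return best[1], best[2]

# Toy example: three normal and three faulty sequences.
scores = [0.1, 0.2, 0.3, 0.9, 0.8, 0.25]
labels = [0, 0, 0, 1, 1, 1]
thr, f1 = f1_at_eer(scores, labels)
print(thr, round(f1, 3))
```

In the toy example, the threshold 0.3 gives one false positive and one false negative, so both error rates equal 1/3 and the F1 score is 2/3.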

Paper Structure (19 sections, 6 equations, 3 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Unsupervised Time-series Anomaly Detection
Generative Pre-trained Transformer
TRACE-GPT
Positional Embedding (PE)
Temporal Convolutional Networks (TCN)
Transformer
Applications of Our Model
Data
CVD Equipment Process Log
The UCR Time-Series Classification Archive
Experimental Results
Experimental Setup
Baseline Models
...and 4 more sections

Figures (3)

Figure 1: Resulting ROC curves of the proposed TRACE-GPT model. Corresponding AUC values are given in the legend.
Figure 2: Distribution of the cross-entropy loss ($\mathcal{L}$) on the CVD dataset. The histogram shows that the test loss has converged, even with a small amount of training data. The losses from faults are higher than those from normal sequences. Because all fault types in the CVD dataset are clearly identified, unlike in the UCR dataset, the loss can be compared across fault types. Peripheral point was the most challenging fault type to classify.
Figure 3: Figures (a) and (b) show examples of how our model and the baseline models computed anomaly scores over time for the raw data of each dataset. The ground truth for anomalies (highlighted in red) was confirmed by domain experts in the semiconductor manufacturing process. Anomaly scores should be high during anomalous periods and remain relatively low elsewhere. Figures (c) and (d) visualize our TRACE-GPT model on the corresponding data. Because our model uses attention mechanisms, the visualization shows the weights learned by the eight attention heads and how the model predicts sensor values based on them, drawn as a blue heatmap in the background. The red line chart shows the raw data of the same time series as in (a) and (b). The sensor values and anomaly scores have been normalized between 0 and 1.
