Generative Pre-Training of Time-Series Data for Unsupervised Fault Detection in Semiconductor Manufacturing
Sewoong Lee, JinKyou Choi, Min Su Kim
TL;DR
TRACE-GPT tackles unsupervised fault detection in semiconductor manufacturing, where labeled anomalies are scarce and the training data mix normal and abnormal sequences. It combines a Temporal Convolutional Network for temporal embedding with a Transformer decoder pre-trained on a task-agnostic next-value prediction objective, which enables anomaly scoring and wafer-level classification without labels. The approach is evaluated on CVD process logs and the UCR open dataset, where it achieves the highest F1 score at Equal Error Rate (EER) among the compared unsupervised models, comes within 0.026 of the supervised state-of-the-art baseline on the open dataset, and shows favorable runtimes. The work offers a practical, scalable method for real-time fault detection in manufacturing that is data-efficient and explainable through attention visualization.
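To make the scoring idea concrete, the following is a minimal sketch of how a next-value prediction model could yield a sequence-level anomaly score. It is not the authors' implementation. It assumes that sensor values are discretized into bins and that the score is the mean cross entropy between the predicted bin distribution and the observed next value. The helper names (`quantize`, `sequence_anomaly_score`), the binning scheme, and the hand-written `predicted_probs` standing in for model output are all hypothetical.

```python
import math


def quantize(value, lo, hi, n_bins):
    """Map a real sensor value to a bin index in [0, n_bins - 1] (assumed scheme)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)  # clamp out-of-range values


def step_cross_entropy(probs, target_bin, eps=1e-12):
    """Cross entropy of one predicted distribution against the observed bin."""
    return -math.log(max(probs[target_bin], eps))


def sequence_anomaly_score(values, predicted_probs, lo, hi, n_bins):
    """Mean next-value cross entropy over one sequence.

    predicted_probs[t] is a hypothetical model's distribution over bins
    for values[t + 1], given values[0..t].
    """
    if len(predicted_probs) != len(values) - 1:
        raise ValueError("need one prediction per next value")
    losses = [
        step_cross_entropy(p, quantize(v, lo, hi, n_bins))
        for p, v in zip(predicted_probs, values[1:])
    ]
    return sum(losses) / len(losses)


# Stand-in predictions: the model expects the signal to stay in the lowest bin.
expected_low = [0.7, 0.1, 0.1, 0.1]
normal_score = sequence_anomaly_score(
    [0.1, 0.1, 0.1], [expected_low, expected_low], 0.0, 1.0, 4
)
abnormal_score = sequence_anomaly_score(
    [0.1, 0.9, 0.1], [expected_low, expected_low], 0.0, 1.0, 4
)
```

In this sketch a sequence that deviates from what the model expects receives a higher score, so thresholding the score would separate abnormal from normal sequences without any labels.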
Abstract
This paper introduces TRACE-GPT, which stands for Time-seRies Anomaly-detection with Convolutional Embedding and Generative Pre-trained Transformers. TRACE-GPT is designed to pre-train on univariate time-series sensor data and to detect faults in unlabeled datasets in semiconductor manufacturing. In the semiconductor industry, distinguishing abnormal time-series sensor data from normal data is important because abnormal data are directly related to wafer defects. However, training data that are small, unlabeled, and even mixed, without enough anomalies, make classification difficult. In this research, we capture features of time-series data with temporal convolutional embedding and a Generative Pre-trained Transformer (GPT) to distinguish abnormal sequences from normal sequences using a cross-entropy loss. We show that our model outperforms previous unsupervised models on both an open dataset, the University of California Riverside (UCR) time-series classification archive, and the process log of our Chemical Vapor Deposition (CVD) equipment. Our model has the highest F1 score at Equal Error Rate (EER) across all datasets and is only 0.026 below the supervised state-of-the-art baseline on the open dataset.
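The metric named above can be illustrated with a short sketch. The abstract does not spell out how F1 at EER is computed, so this follows one common reading: sweep a threshold over the anomaly scores, pick the threshold where the false positive rate and false negative rate are closest, and report the F1 score there. The function name `f1_at_eer` and the toy scores are hypothetical, not data from the paper.

```python
def f1_at_eer(scores, labels):
    """Return (threshold, f1) at the threshold where FPR and FNR are closest.

    labels: 1 = anomalous, 0 = normal. A higher score means more anomalous.
    Assumed definition; the paper may compute the operating point differently.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one normal and one anomalous sample")

    best = None  # (gap between FPR and FNR, threshold, f1)
    for thr in sorted(set(scores)):
        preds = [1 if s >= thr else 0 for s in scores]
        tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
        fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
        fn = n_pos - tp
        gap = abs(fp / n_neg - fn / n_pos)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if best is None or gap < best[0]:
            best = (gap, thr, f1)
    return best[1], best[2]


# Toy example with overlapping score distributions (made-up values).
toy_scores = [0.1, 0.6, 0.4, 0.9]
toy_labels = [0, 0, 1, 1]
toy_threshold, toy_f1 = f1_at_eer(toy_scores, toy_labels)
```

Reporting F1 at this operating point gives a single threshold-independent number per model, which makes unsupervised and supervised detectors comparable on the same dataset.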
