Prognostics and Health Management of Wafer Chemical-Mechanical Polishing System using Autoencoder
Kart-Leong Lim, Rahul Dutta
TL;DR
This work addresses the prediction of wafer material removal rate ($MRR$) in CMP manufacturing using the PHM 2016 dataset, where ground-truth labels for supervised learning are scarce. It introduces an autoencoder trained with an autoencoder-based clustering (ABC) loss, which produces a latent space aligned with the cluster structure and thereby improves the performance of linear regression in that latent space. The study compares two clustering strategies, K-means and infinite Gaussian mixture models (iGMM), against baselines built on statistical moments, raw or concatenated time series, and PCA. It reports that the ABC approach lowers RMSE, with the best configuration achieving competitive results (e.g., RMSE around 4.77). The findings indicate that deep clustering for regression can yield practical benefits in industrial PHM tasks, potentially reducing production costs through better wear prediction and forecasting.
Abstract
The Prognostics and Health Management (PHM) Data Challenge 2016 tracks the health state of components of a semiconductor wafer polishing process. The ultimate goal is to predict the wear measured on the wafer surface by monitoring the health state of these components. This translates to cost savings in large-scale production. The PHM dataset contains many time-series measurements that are not utilized by traditional physics-based approaches. On the other hand, applying a data-driven approach such as deep learning to the PHM dataset is non-trivial. First, supervised deep learning requires class labels, which are not available for the PHM dataset. Second, the feature space trained by an unsupervised deep learner is not specifically targeted at prediction or regression. In this work, we propose using autoencoder-based clustering, for which the trained feature space is found to be more suitable for regression. This is due to a more compact distribution of samples around their nearest cluster means. We support our claims by comparing the performance of the proposed method on the PHM dataset with several baselines, such as the plain autoencoder, as well as state-of-the-art approaches.
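To make the notion of samples being compact around their nearest cluster means concrete, the following is a minimal NumPy sketch of a clustering-regularized reconstruction objective. It is an illustration under stated assumptions, not the loss defined in this work: the function names, the squared-Euclidean distance, and the `weight` parameter are hypothetical, and the cluster means are taken as given (for instance from K-means run on the latent codes).

```python
import numpy as np


def nearest_cluster_penalty(z, centers):
    """Mean squared distance from each latent code to its nearest cluster mean.

    z:       array of shape (n_samples, latent_dim), latent codes
    centers: array of shape (n_clusters, latent_dim), cluster means
    """
    # Pairwise squared Euclidean distances, shape (n_samples, n_clusters).
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Hard assignment of every sample to its closest cluster mean.
    assign = d2.argmin(axis=1)
    penalty = d2[np.arange(len(z)), assign].mean()
    return penalty, assign


def clustering_regularized_loss(x, x_hat, z, centers, weight=0.1):
    """Reconstruction error plus a weighted compactness penalty.

    `weight` is a hypothetical trade-off hyperparameter, not a value
    taken from the paper.
    """
    recon = ((x - x_hat) ** 2).mean()
    penalty, _ = nearest_cluster_penalty(z, centers)
    return recon + weight * penalty


# Toy example: three latent codes and two cluster means.
z = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
penalty, assign = nearest_cluster_penalty(z, centers)

# With a perfect reconstruction, only the compactness term remains.
x = np.ones((3, 4))
loss = clustering_regularized_loss(x, x.copy(), z, centers, weight=0.1)
```

A smaller penalty means the latent codes sit closer to their assigned cluster means, which is the compactness property the abstract credits for better regression in the latent space.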
