Rare Class Prediction Model for Smart Industry in Semiconductor Manufacturing
Abdelrahman Farrag, Mohammed-Khalil Ghali, Yu Jin
TL;DR
This work tackles rare-class prediction in smart semiconductor manufacturing, where data are plagued by class imbalance, missing values, and noisy features. It introduces a voting-based rare-class feature selection framework, combined with careful imputation and resampling, and evaluates it on the SECOM dataset. The results show that XGBoost with SMOTE and the rare-class feature voting scheme achieves high discriminative power, with an AUC of $0.95$, a recall of $0.96$, and a precision of $0.66$. This indicates a strong ability to detect defects in support of maintenance and quality improvement. The approach provides a practical, data-driven path to robust predictive maintenance in complex manufacturing environments and points to future enhancements through data augmentation with generative AI.
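SMOTE, which the approach pairs with XGBoost, balances the training data by synthesizing new rare-class (defect) samples instead of duplicating existing ones. The sketch below illustrates the standard SMOTE interpolation step in plain Python. It assumes that rare-class samples are already imputed numeric vectors. The function name `smote`, its parameters, and the toy data are illustrative assumptions, not the authors' code. In practice, a library implementation such as `imblearn.over_sampling.SMOTE` would typically be used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic rare-class samples by SMOTE-style interpolation.

    minority: list of imputed feature vectors (lists of floats) from the rare class.
    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    (Illustrative sketch; not the implementation used in the paper.)
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority-class neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # pick a rare-class sample
        j = rng.choice(neighbours[i])      # pick one of its near neighbours
        gap = rng.random()                 # interpolation weight in [0, 1)
        x, y = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, y)])
    return synthetic


# Toy usage: four rare-class points at the corners of a unit square.
rare = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(rare, n_new=10, k=2)
print(len(new_points))  # 10
```

Because each synthetic point is an interpolation, it stays inside the region spanned by the existing rare-class samples. Oversampling is normally applied to the training split only, so that synthetic points do not leak into the data used for evaluation.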
Abstract
The evolution of industry has enabled the integration of physical and digital systems, facilitating the collection of extensive data on manufacturing processes. This integration offers a reliable basis for improving process quality and managing equipment health. However, data collected from real manufacturing processes often exhibit challenging properties, such as severe class imbalance, high rates of missing values, and noisy features, which hinder effective machine learning implementation. In this study, a rare-class prediction approach is developed for in situ data collected from a smart semiconductor manufacturing process. The primary objective is to build a model that addresses noise and class imbalance and thereby improves class separation. The developed approach showed promising results compared with the existing literature. It enables predictions on new observations that can inform future maintenance plans and production quality. The model was evaluated using several performance metrics, achieving an area under the ROC curve (AUC) of 0.95, a precision of 0.66, and a recall of 0.96.
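The three reported metrics measure different things. Precision and recall depend on a chosen decision threshold. AUC summarizes ranking quality across all thresholds and equals the probability that a randomly chosen defect receives a higher score than a randomly chosen non-defect. The following minimal sketch computes all three from scratch. The function names and the six toy labels and scores are invented for illustration and are not SECOM results.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (rare/defect) class, labelled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 2 defects (label 1) among 6 wafers, with model scores.
y_true = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

print(precision_recall(y_true, y_pred))  # (0.5, 0.5)
print(roc_auc(y_true, scores))           # 0.875
```

Applied to the reported figures, a precision of 0.66 implies about 0.34 / 0.66 ≈ 0.5 false alarms per correctly flagged defect. A recall of 0.96 means that about 4% of true defects are missed. Under severe class imbalance, even a small false-positive rate on the large majority class is enough to pull precision well below recall.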
