Utilising Explainable Techniques for Quality Prediction in a Complex Textiles Manufacturing Use Case
Briony Forsberg, Dr Henry Williams, Prof Bruce MacDonald, Tracy Chen, Dr Reza Hamzeh, Dr Kirstine Hulse
TL;DR
The paper addresses root cause analysis (RCA) for colour quality in a complex textiles manufacturing setting. It compares Decision Tree, Random Forest, and XGBoost classifiers in combination with three feature-selection methods, and finds that Random Forest with Boruta feature selection yields the best balance of predictive performance and interpretability, with TE2Rules providing human-readable rule lists for post-hoc explanations. Engineered features such as hue, colour depth, and min/max statistics significantly improve predictive power, enabling more actionable RCA insights. The findings demonstrate practical potential for early process intervention in wool textile production and pave the way for extending the approach to additional products and downgrade scenarios, while highlighting the trade-off between recall and precision in industrial decision-making. With $TP$, $FP$, and $FN$ denoting the numbers of true positives, false positives, and false negatives, the metrics are defined as $\text{Recall} = \frac{TP}{TP + FN}$, $\text{Precision} = \frac{TP}{TP + FP}$, and $F_1 = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}}$.
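As a minimal illustration of the three metric definitions above, the following Python sketch computes them from confusion-matrix counts. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper's results; returning 0.0 when a denominator is zero is a convention assumed here, not something the paper specifies.

```python
def classification_metrics(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Return recall, precision and F1 from confusion-matrix counts.

    Returns 0.0 for a metric whose denominator is zero (an assumed convention).
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = recall + precision
    f1 = 2 * recall * precision / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


# Hypothetical counts, for illustration only (not results from the paper).
metrics = classification_metrics(tp=40, fp=10, fn=20)
print(metrics)
```

With these counts, recall (40/60) is lower than precision (40/50): the hypothetical model misses more failures than it raises false alarms, which is the kind of trade-off that has to be weighed in an industrial setting.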
Abstract
This paper develops an approach to classifying instances of product failure in a complex textiles manufacturing dataset using explainable techniques. The dataset used in this study was obtained from a New Zealand manufacturer of woollen carpets and rugs. To investigate the trade-off between accuracy and explainability, three tree-based classification algorithms were evaluated: a Decision Tree and two ensemble methods, Random Forest and XGBoost. Three feature selection methods were also evaluated: the SelectKBest method with chi-squared as the scoring function, the Pearson Correlation Coefficient, and the Boruta algorithm. As expected, the ensemble methods typically produced better results than the Decision Tree model. The Random Forest model yielded the best results overall when combined with the Boruta feature selection technique. Finally, a tree ensemble explanation technique was used to extract rule lists that capture the necessary and sufficient conditions for classification by a trained model and that can be easily interpreted by a human. Notably, several of the features in the extracted rule lists were statistical and calculated features that had been added to the original dataset. This demonstrates the influence that bringing in additional information during the data preprocessing stages can have on final model performance.
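To make one of the feature selection methods named above concrete, the sketch below ranks features by the absolute value of their Pearson correlation coefficient with a binary failure label and keeps the top $k$. It is an illustration under stated assumptions rather than the authors' pipeline: the helper names `pearson` and `rank_features`, the feature names, and the toy values are all hypothetical, and the paper does not state how correlation scores were thresholded.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns: dict[str, list[float]], labels: list[float], k: int) -> list[str]:
    """Return the k feature names with the largest absolute correlation to the label."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data with hypothetical feature names; 1 marks a failed product.
columns = {
    "hue": [0.1, 0.2, 0.8, 0.9, 0.15, 0.85],
    "colour_depth": [0.5, 0.4, 0.6, 0.5, 0.45, 0.55],
    "batch_id": [1, 2, 3, 4, 5, 6],
}
labels = [0, 0, 1, 1, 0, 1]

selected = rank_features(columns, labels, k=2)
print(selected)
```

A correlation filter of this kind scores each feature independently and only captures linear association, which is one reason a wrapper method such as Boruta can select a different feature set.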
