A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection
Alejandro Garnung Menéndez
TL;DR
This work tackles robust defect detection in highly noisy industrial imagery, distinguishing surface defects from noise with a hybrid framework that combines a broad set of handcrafted features with statistical feature selection. It generates ROI scores through a two-stage process: (i) extensive feature extraction across spatial, texture, distributional, and spectral domains, and (ii) statistical feature selection (Fisher criterion, KS test, t-test, Bhattacharyya distance) to retain discriminative features, which may then be aggregated with simple scoring or a random forest. A diverse toolbox (GMMs, patch-based tests, IQR-based outlier detection, CC analysis, Gabor/LBP/HOG/Homomorphic/GLE textures, and median-histogram modeling) supports the separation of true positives (TP) from false positives (FP), with the aim of minimizing false positives in challenging, noisy environments. The framework is designed to function as a black-box module on top of existing classifiers or as a standalone assessment unit, aiming for real-time applicability and flexible integration with industrial inspection pipelines.
Abstract
In industrial imaging, accurately detecting surface defects and distinguishing them from noise is both critical and challenging, particularly in complex environments with noisy data. This paper presents a hybrid framework that integrates statistical feature selection and classification techniques to improve defect detection accuracy while minimizing false positives. The system is built around scalar scores, each representing the likelihood that a region of interest (ROI) is a defect rather than noise. We present around 55 distinct features extracted from industrial images, which are then analyzed using statistical methods such as Fisher separation, the chi-squared test, and variance analysis. These techniques identify the most discriminative features, focusing on maximizing the separation between true defects and noise. The Fisher criterion in particular is intended to support robust, real-time performance in automated systems. This statistical framework opens up multiple avenues for application: it can function as a standalone assessment module or as an a posteriori enhancement to machine learning classifiers. Implemented as a black-box module on top of existing classifiers, it provides an adaptable layer of quality control and refines their predictions. The feature extraction strategies are intuitive by design, and the paper emphasizes both the rationale behind each feature's significance and the statistical rigor of the selection step. By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications, especially in complex, noisy environments.
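As a rough illustration of the Fisher-based selection step mentioned above, the following Python sketch ranks features by a two-class Fisher score, taken here as the squared difference of class means divided by the sum of class variances. This particular form of the score, the function names `fisher_score` and `select_top_features`, and the synthetic data are assumptions made for the example; they are not taken from the paper's implementation.

```python
import numpy as np


def fisher_score(defect, noise):
    """Per-feature Fisher score: (mean_d - mean_n)^2 / (var_d + var_n).

    Both inputs are arrays of shape (n_samples, n_features), one row per ROI.
    """
    defect = np.asarray(defect, dtype=float)
    noise = np.asarray(noise, dtype=float)
    mean_gap = defect.mean(axis=0) - noise.mean(axis=0)
    var_sum = defect.var(axis=0, ddof=1) + noise.var(axis=0, ddof=1)
    # Small constant guards against division by zero for constant features.
    return mean_gap ** 2 / (var_sum + 1e-12)


def select_top_features(defect, noise, k):
    """Return the indices of the k highest-scoring features and all scores."""
    scores = fisher_score(defect, noise)
    order = np.argsort(scores)[::-1][:k]
    return order, scores


# Synthetic data (hypothetical): feature 0 separates the two classes,
# features 1 and 2 are drawn from the same distribution for both classes.
rng = np.random.default_rng(0)
n = 200
defect = np.column_stack([
    rng.normal(3.0, 1.0, n),
    rng.normal(0.0, 1.0, n),
    rng.normal(0.0, 1.0, n),
])
noise = np.column_stack([
    rng.normal(0.0, 1.0, n),
    rng.normal(0.0, 1.0, n),
    rng.normal(0.0, 1.0, n),
])

top, scores = select_top_features(defect, noise, k=1)
print("selected feature indices:", top)
print("Fisher scores:", scores)
```

Because the score needs only per-class means and variances, it can be computed in a single pass over the feature matrix, which fits the real-time use the abstract describes.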
