GBFRS: Robust Fuzzy Rough Sets via Granular-ball Computing
Shuyin Xia, Xiaoyu Lian, Binbin Sang, Guoyin Wang, Xinbo Gao
TL;DR
This work tackles robustness and scalability in fuzzy rough set-based feature selection for noisy, high-dimensional data by introducing the Granular-ball Fuzzy Rough Set (GBFRS). GBFRS replaces point samples with multi-granularity granular-balls (GBs) to define GB-based fuzzy similarities and lower/upper approximations. It then uses a weighted GB dependency to perform attribute reduction with monotonic convergence guarantees. Empirical results on UCI datasets show improved accuracy and resilience to label and attribute noise, highlighting the practical benefit of coarse-grained representations. The approach offers a scalable framework for robust feature selection and suggests extensions to other fuzzy rough set models.
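The TL;DR names three ingredients: a GB-based fuzzy similarity, a fuzzy lower approximation over balls, and a weighted GB dependency that drives forward attribute reduction. This section does not give the exact formulas, so the sketch below fills them in with loudly labeled assumptions. It uses a Gaussian similarity on the gap between ball surfaces (center distance minus both radii, floored at zero). It uses the standard fuzzy-rough lower approximation, inf(1 - R), taken over balls of other classes. It weights each ball by the number of samples it covers. The names gb_dependency and forward_select are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def gb_dependency(centers, radii, labels, sizes, feats, sigma=1.0):
    """Hypothetical weighted GB dependency of the decision on feature subset `feats`."""
    if len(feats) == 0:
        return 0.0
    C = centers[:, feats]  # ball centers projected onto the candidate features
    dist = np.sqrt(((C[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1))
    # Assumption: ball-to-ball distance = center distance minus both radii, floored at 0.
    gap = np.maximum(dist - radii[:, None] - radii[None, :], 0.0)
    sim = np.exp(-gap ** 2 / (2 * sigma ** 2))  # Assumption: Gaussian fuzzy similarity
    lower = np.ones(len(labels))
    for i in range(len(labels)):
        other = labels != labels[i]
        if other.any():
            # Fuzzy lower approximation of ball i in its own class:
            # infimum over other-class balls of (1 - similarity).
            lower[i] = np.min(1.0 - sim[i, other])
    # Assumption: weight each ball by the number of samples it covers.
    return float(np.dot(sizes, lower) / sizes.sum())


def forward_select(centers, radii, labels, sizes, sigma=1.0, eps=1e-4):
    """Greedy forward search: add the feature with the largest dependency gain."""
    selected, remaining = [], list(range(centers.shape[1]))
    best = 0.0
    while remaining:
        scores = [gb_dependency(centers, radii, labels, sizes, selected + [a], sigma)
                  for a in remaining]
        k = int(np.argmax(scores))
        if scores[k] - best <= eps:  # stop once the gain is negligible
            break
        best = scores[k]
        selected.append(remaining.pop(k))
    return selected, best
```

Usage: pass NumPy arrays of ball centers (one row per ball), radii, labels, and sizes, for example `forward_select(centers, radii, labels, sizes)`. With radii held fixed, adding a feature can only widen the gaps between balls. This toy dependency therefore never decreases as features are added, which makes a simple "stop when the gain falls below eps" rule a reasonable termination test.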
Abstract
Fuzzy rough set theory is effective for processing datasets with complex attributes, supported by a solid mathematical foundation and closely linked to kernel methods in machine learning. Attribute reduction algorithms and classifiers based on fuzzy rough set theory exhibit promising performance in the analysis of high-dimensional multivariate complex data. However, most existing models operate at the finest granularity, which makes them inefficient and sensitive to noise, especially for high-dimensional big data. Enhancing the robustness of fuzzy rough set models is therefore crucial for effective feature selection. Multi-granularity granular-ball computing, a recent development, uses granular-balls of different sizes to adaptively represent and cover the sample space and performs learning on these granular-balls. This paper proposes integrating multi-granularity granular-ball computing into fuzzy rough set theory, using granular-balls in place of sample points. The coarse-grained nature of granular-balls makes the model more robust. Additionally, we propose a new method for generating granular-balls that extends to all supervised methods based on granular-ball computing. A forward search algorithm then selects a feature sequence, using a dependency function to measure the correlation between features and categories. Experiments demonstrate the proposed model's effectiveness and its superiority over baseline methods.
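The abstract mentions a new granular-ball generation method but does not describe it, so the sketch below substitutes a simplified, purity-driven splitting scheme in the spirit of earlier granular-ball classifiers. It is only an illustration of what "covering the sample space with balls of different sizes" can look like. A ball whose majority-label purity falls below a threshold is split into two children by a small 2-means step, and splitting repeats until every ball is pure enough. The names generate_granular_balls and _two_means, the farthest-point initialization, and the use of mean distance to the center as the radius are all assumptions, not the paper's procedure.

```python
import numpy as np


def _two_means(X, iters=10):
    """Split points into two clusters with a few Lloyd iterations (hypothetical helper)."""
    # Deterministic init: the point farthest from the mean, then the point farthest from it.
    mu = X.mean(axis=0)
    a = X[np.argmax(((X - mu) ** 2).sum(axis=1))]
    b = X[np.argmax(((X - a) ** 2).sum(axis=1))]
    cents = np.stack([a, b])
    for _ in range(iters):
        d = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(axis=-1)
        assign = d.argmin(axis=1)
        for k in range(2):
            if np.any(assign == k):
                cents[k] = X[assign == k].mean(axis=0)
    return assign


def generate_granular_balls(X, y, purity_threshold=1.0, min_size=2):
    """Cover (X, y) with granular-balls by recursive purity-driven splitting (a sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    balls, stack = [], [np.arange(len(X))]
    while stack:
        idx = stack.pop()
        labels, counts = np.unique(y[idx], return_counts=True)
        purity = counts.max() / len(idx)  # fraction of the majority label
        if purity < purity_threshold and len(idx) >= min_size:
            assign = _two_means(X[idx])
            left, right = idx[assign == 0], idx[assign == 1]
            if len(left) > 0 and len(right) > 0:  # only accept a genuine split
                stack.extend([left, right])
                continue
        center = X[idx].mean(axis=0)
        # Assumption: radius = mean distance of covered samples to the center.
        radius = np.linalg.norm(X[idx] - center, axis=1).mean()
        balls.append({"center": center, "radius": radius,
                      "label": labels[counts.argmax()], "size": len(idx),
                      "purity": purity})
    return balls
```

Lowering purity_threshold stops splitting earlier and yields fewer, coarser balls. This coarse granularity is the lever the abstract credits for robustness to noise. Each accepted split produces two non-empty children, so the loop always terminates.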
