Cascaded two-stage feature clustering and selection via separability and consistency in fuzzy decision systems
Yuepeng Chen, Weiping Ding, Hengrong Ju, Jiashuang Huang, Tao Yin
TL;DR
The paper introduces FCSSC, a cascaded two-stage feature clustering and selection framework for fuzzy decision systems. It combines a clustering-based forward search with a novel fusion metric that balances global separability and local consistency. FCSSC first clusters features with fuzzy C-Means and then selects features within clusters according to their SIG scores, achieving higher classification accuracy with fewer selected features on 18 public datasets and a schizophrenia rs-fMRI dataset. The key ideas are a global separability measure GS_B(D), defined through intra-class cohesion and inter-class separation; a local consistency measure LC_B(D), defined via adaptive fuzzy neighborhood relations; a parameter β that balances the two; and a selection rule that maximizes the discriminative gamma score. The work reports significant performance gains over six benchmark algorithms and highlights potential for biomedical applications. Future directions include kernel-based extensions and multi-view data handling.
Abstract
Feature selection is a vital technique in machine learning, as it can reduce computational complexity, improve model performance, and mitigate the risk of overfitting. However, the increasing complexity and dimensionality of datasets pose significant challenges for feature selection. To address these challenges, this paper proposes a cascaded two-stage feature clustering and selection algorithm for fuzzy decision systems. In the first stage, we reduce the search space by clustering relevant features and addressing inter-feature redundancy. In the second stage, we present a clustering-based sequential forward selection method that explores the global and local structure of the data. We propose a novel metric for assessing the significance of features, which considers both global separability and local consistency. Global separability measures the degree of intra-class cohesion and inter-class separation based on fuzzy membership, providing a comprehensive understanding of data separability. Meanwhile, local consistency leverages the fuzzy neighborhood rough set model to capture uncertainty and fuzziness in the data. The effectiveness of the proposed algorithm is evaluated through experiments conducted on 18 public datasets and a real-world schizophrenia dataset. The experimental results demonstrate our algorithm's superiority over benchmark algorithms in both classification accuracy and the number of selected features.
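The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions. The exact definitions of global separability and local consistency are not given here, so the code substitutes simple stand-ins: a between-class to total scatter ratio for separability, and nearest-neighbor label agreement for consistency. The weighted sum with beta, the rule of taking at most one feature per cluster, the stopping tolerance tol, and the function names fuzzy_c_means, fusion_score, and select_features are likewise hypothetical choices made for illustration.

```python
import numpy as np


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=50, seed=0):
    """Standard fuzzy C-Means; returns an (n_points, n_clusters) membership matrix."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(points), n_clusters))
    u /= u.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        w = u ** m
        centers = (w.T @ points) / w.sum(axis=0)[:, None]
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return u


def fusion_score(XB, y, beta=0.5):
    """Assumed stand-in metric: beta * separability + (1 - beta) * consistency."""
    classes = np.unique(y)
    mu = XB.mean(axis=0)
    # Stand-in for global separability: between-class scatter over total scatter.
    between = sum((y == c).sum() * np.sum((XB[y == c].mean(axis=0) - mu) ** 2) for c in classes)
    within = sum(np.sum((XB[y == c] - XB[y == c].mean(axis=0)) ** 2) for c in classes)
    gs = between / (between + within + 1e-12)
    # Stand-in for local consistency: share of samples whose nearest neighbor has the same label.
    d = np.linalg.norm(XB[:, None, :] - XB[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    lc = np.mean(y[d.argmin(axis=1)] == y)
    return float(beta * gs + (1.0 - beta) * lc)


def select_features(X, y, n_clusters=3, beta=0.5, tol=1e-3):
    """Stage 1: cluster features. Stage 2: greedy forward search across clusters."""
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # Each feature (a column of Z) is one point; keep its highest-membership cluster.
    labels = fuzzy_c_means(Z.T, n_clusters).argmax(axis=1).tolist()
    selected, used, best = [], set(), 0.0
    while len(used) < n_clusters:
        candidates = [j for j in range(X.shape[1]) if labels[j] not in used]
        if not candidates:
            break
        scores = {j: fusion_score(Z[:, selected + [j]], y, beta) for j in candidates}
        j_star = max(scores, key=scores.get)
        if scores[j_star] - best <= tol:  # stop when the score no longer improves
            break
        selected.append(j_star)
        used.add(labels[j_star])
        best = scores[j_star]
    return selected, labels


# Synthetic data (illustrative only): feature 0 is informative, feature 1 is a
# near-duplicate of it, and features 2 to 5 are pure noise.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 30)
informative = 4.0 * y + rng.normal(0.0, 0.3, 60)
X = np.column_stack([informative, informative + rng.normal(0.0, 0.05, 60),
                     rng.normal(size=(60, 4))])
selected, labels = select_features(X, y, n_clusters=3, beta=0.5)
print("selected features:", selected, "feature clusters:", labels)
```

In this sketch, a feature is skipped once its cluster has contributed a feature, which is one simple way to approximate the redundancy handling that the first stage is meant to provide. Swapping in the paper's fuzzy-membership separability and fuzzy neighborhood rough set consistency would only require replacing the body of fusion_score.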
