Variable feature weighted fuzzy k-means algorithm for high dimensional data

Vikas Singh, Nishchal K. Verma

TL;DR

The paper tackles clustering of high-dimensional data whose features differ in relevance by introducing cluster-dependent feature weights within a fuzzy k-means framework. It defines an entropy-based objective $P(U,V,W)$ that combines within-cluster dispersion with a data-point membership entropy term and a feature-weight entropy term, so that a separate set of feature weights is learned for each cluster. An alternating optimization procedure yields closed-form updates for $V$, $U$, and $W$, with adaptive parameters $\lambda_i$ and $\gamma_j$. The method shows improved AR, RI, and NMI on real and synthetic datasets against six baselines, and the paper includes a data availability statement. The work suggests practical impact for robust subspace clustering and proposes future extensions to handle feature correlations and mixed data types.

Abstract

This paper presents a new fuzzy k-means algorithm for clustering high-dimensional data in various subspaces. In high-dimensional data, some features may be irrelevant, and the relevant ones may differ in their significance to the clustering process. For better clustering, it is crucial to incorporate the contribution of these features in the clustering process. To account for these contributions, we propose a novel fuzzy k-means clustering algorithm that modifies the objective function of fuzzy k-means using two different entropy terms. The first entropy term helps to minimize the within-cluster dispersion while maximizing the negative entropy, which determines how clusters contribute to the association of data points. The second entropy term helps control the feature weights, because different features contribute with different weights during clustering, so that a better partition is obtained. The performance of the proposed approach is reported using several clustering measures (AR, RI, and NMI) on multiple datasets and compared with six other state-of-the-art methods.

Impact Statement: In real-world applications, cluster-dependent feature weights help in partitioning the data set into more meaningful clusters. Features may be relevant, irrelevant, or redundant, and each contributes differently during the clustering process. In this paper, a cluster-dependent feature-weighting approach based on fuzzy k-means is presented that assigns higher weights to relevant features and lower weights to irrelevant features during clustering. The method is validated using both supervised and unsupervised performance measures on real-world and synthetic datasets to demonstrate its effectiveness compared to state-of-the-art methods.
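The alternating scheme described above can be illustrated with a minimal sketch. The summary does not give the exact objective or the rule for adapting the parameters, so the code assumes a common entropy-regularized form, $\sum_{i}\sum_{j} u_{ij} \sum_{m} w_{jm}(x_{im}-v_{jm})^2 + \sum_i \lambda_i \sum_j u_{ij}\log u_{ij} + \sum_j \gamma_j \sum_m w_{jm}\log w_{jm}$, with memberships and per-cluster weights each summing to one. Under that assumption the updates for $U$ and $W$ are softmax-like and the update for $V$ is a membership-weighted mean. The function names `softmin` and `entropy_weighted_fkm` are hypothetical, and `lam` and `gamma` are fixed inputs here rather than the adaptive $\lambda_i$ and $\gamma_j$ of the paper.

```python
import numpy as np


def softmin(cost, temperature):
    """Row-wise softmax of -cost / temperature, computed stably.

    `temperature` may be a scalar or one value per row of `cost`.
    """
    t = np.reshape(np.asarray(temperature, dtype=float), (-1, 1))
    z = -cost / t
    z -= z.max(axis=1, keepdims=True)      # avoid overflow in exp
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def entropy_weighted_fkm(X, k, lam=1.0, gamma=1.0, n_iter=50, seed=0):
    """Alternating U, V, W updates for the ASSUMED objective in the lead-in.

    This is an illustrative sketch, not the authors' exact algorithm:
    lam (per point or scalar) and gamma (per cluster or scalar) stay fixed.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumption: centers start at k distinct random data points.
    V = X[rng.choice(n, size=k, replace=False)].copy()    # shape (k, d)
    W = np.full((k, d), 1.0 / d)                           # uniform weights
    for _ in range(n_iter):
        sq = (X[:, None, :] - V[None, :, :]) ** 2          # shape (n, k, d)
        dist = np.einsum("nkd,kd->nk", sq, W)              # weighted distances
        U = softmin(dist, lam)                             # rows sum to 1
        # Membership-weighted means; memberships are strictly positive
        # unless exp underflows for extremely small lam.
        V = (U.T @ X) / U.sum(axis=0)[:, None]
        sq = (X[:, None, :] - V[None, :, :]) ** 2
        disp = np.einsum("nk,nkd->kd", U, sq)              # dispersion per (cluster, feature)
        W = softmin(disp, gamma)                           # rows sum to 1
    return U, V, W
```

In this sketch a feature with low within-cluster dispersion receives a larger weight in that cluster, which mirrors the stated goal of favoring relevant features. Because the dispersion sums over all points, `gamma` typically needs to grow with the dataset size to keep the weights from collapsing onto a single feature.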

Paper Structure

This paper contains 11 sections, 20 equations, 7 figures, 8 tables, 1 algorithm.

Figures (7)

  • Figure 1: Feature weights for the IRIS dataset in all three clusters.
  • Figure 2: PCs for the Lung dataset for a varying number of clusters.
  • Figure 3: The values of $\lambda$ for the IRIS dataset
  • Figure 4: Cluster centers are randomly initialized using max-min.
  • Figure 5: Cluster centers after the convergence of proposed approach.
  • ...and 2 more figures