Fuzzy Clustering to Identify Clusters at Different Levels of Fuzziness: An Evolutionary Multi-Objective Optimization Approach

Avisek Gupta, Shounak Datta, Swagatam Das

TL;DR

This paper presents ECM, a fuzzy clustering method that simultaneously optimizes two contradictory objective functions, producing fuzzy clusters at different levels of fuzziness. This leads to better cluster detection than conventional fuzzy clustering methods and previously used multi-objective methods.

Abstract

Fuzzy clustering methods identify naturally occurring clusters in a dataset, where the extent to which different clusters are overlapped can differ. Most methods have a parameter to fix the level of fuzziness. However, the appropriate level of fuzziness depends on the application at hand. This paper presents Entropy $c$-Means (ECM), a method of fuzzy clustering that simultaneously optimizes two contradictory objective functions, resulting in the creation of fuzzy clusters with different levels of fuzziness. This allows ECM to identify clusters with different degrees of overlap. ECM optimizes the two objective functions using two multi-objective optimization methods, Non-dominated Sorting Genetic Algorithm II (NSGA-II), and Multiobjective Evolutionary Algorithm based on Decomposition (MOEA/D). We also propose a method to select a suitable trade-off clustering from the Pareto front. Experiments on challenging synthetic datasets as well as real-world datasets show that ECM leads to better cluster detection compared to the conventional fuzzy clustering methods as well as previously used multi-objective methods for fuzzy clustering.
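
To make the role of a fixed fuzziness parameter concrete, the sketch below computes memberships with the standard fuzzy $c$-means (FCM) update, where the fuzzifier $m > 1$ controls how soft the assignments are. This is the conventional FCM rule, not ECM's objectives, which this summary does not state. The function name, the example point, and the cluster centers are hypothetical and chosen only for illustration.

```python
import math


def fcm_memberships(point, centers, m):
    """Standard FCM membership of one point to each cluster center.

    m > 1 is the fuzzifier: values near 1 give almost crisp assignments,
    large values push every membership toward 1 / (number of centers).
    """
    if m <= 1:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = [math.dist(point, c) for c in centers]

    # A point lying exactly on one or more centers belongs fully to them.
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        n_zero = sum(zeros)
        return [1.0 / n_zero if z else 0.0 for z in zeros]

    # u_j is proportional to d_j ** (-2 / (m - 1)). Scaling by the smallest
    # distance keeps every weight in (0, 1] and avoids overflow for m near 1.
    exponent = 2.0 / (m - 1.0)
    d_min = min(dists)
    weights = [(d_min / d) ** exponent for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: one point and three illustrative cluster centers.
example_centers = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
example_point = (1.0, 0.0)
for m in (2, 30, 50, 100):
    memberships = fcm_memberships(example_point, example_centers, m)
    print(m, [round(u, 3) for u in memberships])
```

With a single fixed $m$, every point is softened to the same degree whether or not its clusters overlap. The abstract identifies this as the limitation that motivates searching over clusterings at different levels of fuzziness.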

Paper Structure

This paper contains 15 sections, 6 equations, 19 figures, 9 tables.

Figures (19)

  • Figure 1: Clustering a synthetic dataset (a) using ECM-NSGA-II. The dataset contains two overlapped clusters A and B, and a third well-separated cluster C. At the top left corner of the Pareto front in (b), $f_1$ is minimized creating compact clusters with low overlap (c). At the bottom right corner, $f_2$ is maximized by minimizing $-f_2$, leading to more overlapped clusters as shown in (f). Across the Pareto front clusters formed have different levels of fuzziness, see (d) and (e).
  • Figure 2: Clustering of a dataset with $3$ clusters for different levels of fuzziness $m = 2, 30, 50,$ and $100$. The points in different clusters are drawn in red, blue and green and the points in the regions of overlap between the clusters are drawn in yellow. For higher values of $m$, a large number of points in the regions of overlap do not contribute to the location of cluster centers.
  • Figure 3: The synthetic proximity datasets.
  • Figure 4: The synthetic spread datasets.
  • Figure 5: Comparison of the maximum ARI achieved on the proximity datasets.
  • ...and 14 more figures