NK Hybrid Genetic Algorithm for Clustering

Renato Tinós, Liang Zhao, Francisco Chicano, Darrell Whitley

TL;DR

The paper addresses clustering when the number of clusters is unknown and the clusters may have arbitrary shapes. It introduces NKCV2, an internal validation criterion that captures local density through small groups of objects linked by an interaction graph. NKCV2 is paired with the NK hybrid GA, a gray-box genetic algorithm whose mutation operators and local search exploit the known variable interactions, and whose partition crossover decomposes the objective for efficient recombination. In the experiments, NKCV2 identifies density-based regions, and the NK hybrid GA outperforms a prior GA approach and is competitive with state-of-the-art clustering methods on Gaussian, shape, and UCI datasets, with some exceptions where spherical clustering dominates. The approach estimates the number of clusters automatically and performs robustly, which illustrates the practical value of integrating problem structure into evolutionary clustering strategies.

Abstract

The NK hybrid genetic algorithm for clustering is proposed in this paper. To evaluate solutions, the hybrid algorithm uses the NK clustering validation criterion 2 (NKCV2). NKCV2 uses information about the disposition of $N$ small groups of objects. Each group is composed of $K+1$ objects of the dataset. Experimental results show that density-based regions can be identified by using NKCV2 with a fixed, small $K$. In NKCV2, the relationship between decision variables is known, which in turn allows us to apply gray-box optimization. Mutation operators, a partition crossover, and a local search strategy are proposed, all using information about the relationship between decision variables. In partition crossover, the evaluation function is decomposed into $q$ independent components; partition crossover then deterministically returns the best among $2^q$ possible offspring with computational complexity $O(N)$. The NK hybrid genetic algorithm allows the detection of clusters with arbitrary shapes and the automatic estimation of the number of clusters. In the experiments, the NK hybrid genetic algorithm produced very good results when compared to another genetic algorithm approach and to state-of-the-art clustering algorithms.
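
The decomposition behind partition crossover can be illustrated with a minimal sketch. This is a generic partition crossover for a sum of subfunctions, not necessarily the paper's exact operator: the names `partition_crossover`, `groups`, and `subfn` are hypothetical, the objective is assumed to be minimized, and any clustering-specific handling such as label renaming is ignored. Variables on which the parents differ are grouped into connected components whenever they share a subfunction; each component is then inherited from whichever parent scores better on the subfunctions touching it.

```python
def partition_crossover(p1, p2, groups, subfn):
    """Generic partition crossover sketch (hypothetical names, minimization assumed).

    groups[i] lists the variables read by subfunction i;
    subfn(i, x) returns the value of subfunction i for solution x.
    Returns (child, q), where q is the number of recombining components.
    """
    n = len(p1)
    differ = [a != b for a, b in zip(p1, p2)]
    root = list(range(n))  # union-find forest over the variables

    def find(v):
        while root[v] != v:
            root[v] = root[root[v]]  # path halving
            v = root[v]
        return v

    # Differing variables that share a subfunction belong to one component.
    for g in groups:
        d = [v for v in g if differ[v]]
        for v in d[1:]:
            root[find(v)] = find(d[0])

    # Partial evaluation of each parent, accumulated per component.
    # A subfunction with no differing variable is equal in both parents.
    score1, score2 = {}, {}
    for i, g in enumerate(groups):
        d = [v for v in g if differ[v]]
        if d:
            c = find(d[0])
            score1[c] = score1.get(c, 0.0) + subfn(i, p1)
            score2[c] = score2.get(c, 0.0) + subfn(i, p2)

    # Take each component from the parent with the lower partial score.
    child = list(p1)
    for v in range(n):
        if differ[v]:
            c = find(v)
            if score2.get(c, 0.0) < score1.get(c, 0.0):
                child[v] = p2[v]
    q = len({find(v) for v in range(n) if differ[v]})
    return child, q


# Toy usage: two independent pairs of variables, target labels [0, 0, 1, 1].
target = [0, 0, 1, 1]
groups = [[0, 1], [1, 0], [2, 3], [3, 2]]
subfn = lambda i, x: sum(x[v] != target[v] for v in groups[i])
child, q = partition_crossover([0, 0, 0, 0], [1, 1, 1, 1], groups, subfn)
```

In the toy usage the parents differ on two independent components, so the child is implicitly the best of $2^2$ possible offspring, yet each subfunction is evaluated only once per parent. With a fixed, small $K$, the work is proportional to the number of subfunctions, up to the near-constant overhead of the union-find structure.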

Paper Structure (19 sections, 20 equations, 2 figures, 8 tables, 6 algorithms)

This paper contains 19 sections, 20 equations, 2 figures, 8 tables, 6 algorithms.

Figures (2)

  • Figure 1: Building the interaction graph. a) Dataset with 7 objects ($N=7$). b) Each object of the dataset is associated with a vertex with a self-loop. For each vertex, an incoming edge is created from the vertex representing the nearest object with higher density (red edges). By definition, the incoming edge for the vertex with the highest density comes from the vertex representing its nearest object. c) The next step is to add edges from nearby objects until each vertex has indegree equal to $K+1$ (in this example, $K=2$). The interaction graph has $N=7$ vertices and $N(K+1)=21$ edges. d) Each subfunction $f_i$ is influenced by a group of $K+1$ decision variables. The group of labels influencing subfunction $f_i$ is defined by the $K+1$ edges incident to $v_i$.
  • Figure 2: Subfunctions $f_i$.
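
The construction described in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `build_interaction_graph` is hypothetical, the density values are supplied by the caller because the caption does not say how density is estimated, distances are Euclidean, and ties are broken by object index.

```python
import math


def build_interaction_graph(points, K, density):
    """Sketch of the Figure 1 construction (hypothetical helper).

    Returns in_neighbors, where in_neighbors[i] holds the K+1 vertices
    with an edge into v_i, i.e. the variables influencing subfunction f_i.
    density[j] is a caller-supplied density estimate for object j (assumption).
    """
    n = len(points)
    if K + 1 > n:
        raise ValueError("K + 1 cannot exceed the number of objects")
    in_neighbors = []
    for i in range(n):
        # Other objects ordered by distance to object i (ties by index).
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(points[i], points[j]), j),
        )
        group = [i]  # step b: self-loop
        if K >= 1:
            # Step b: edge from the nearest object with higher density;
            # the densest object falls back to its nearest object.
            denser = [j for j in others if density[j] > density[i]]
            group.append(denser[0] if denser else others[0])
        # Step c: add nearby objects until the indegree is K + 1.
        for j in others:
            if len(group) == K + 1:
                break
            if j not in group:
                group.append(j)
        in_neighbors.append(group)
    return in_neighbors


# Toy usage: four objects on a line with hand-picked densities.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
graph = build_interaction_graph(points, K=2, density=[1, 3, 2, 0])
# graph == [[0, 1, 2], [1, 0, 2], [2, 1, 0], [3, 2, 1]]
```

Every vertex ends up with exactly $K+1$ incoming edges, so the total edge count is $N(K+1)$, as the caption states. The sorting step makes this sketch $O(N^2 \log N)$; a spatial index would be the natural replacement for larger datasets.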