A New Index for Clustering Evaluation Based on Density Estimation

Gangli Liu

TL;DR

This work tackles the challenge of internal clustering validation by introducing a density-estimation based index that blends two sub-indices: an Ambiguous Index $I_a$ and a Similarity Index $I_s$, combined as $I = \delta I_a + (1 - \delta) I_s$ with $\delta \in [0,1]$. Each sub-index relies on per-cluster kernel density estimates, defining cluster territories and likelihood-based similarity to capture both ambiguity and cohesion within clusters. The approach is evaluated on 145 datasets against six established internal indices, showing that the new index significantly improves ranking accuracy, especially after bandwidth optimization and algorithmic refinements. The results suggest practical gains for internal clustering validation, with potential extensions to higher dimensions and alternative density estimators.
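The summary gives the combination rule $I = \delta I_a + (1 - \delta) I_s$ and says both sub-indices come from per-cluster kernel density estimates, cluster territories, and likelihood-based similarity. It does not give the formulas for $I_a$ or $I_s$. The sketch below is therefore only an illustration under loudly stated assumptions, not the paper's method. It assumes an isotropic Gaussian KDE per cluster with a fixed, hypothetical bandwidth (no bandwidth optimization). It takes a point's "territory" to be the cluster whose KDE is highest at that point, counts a point as ambiguous when its territory differs from its assigned label, and sets $I_a = 1 -$ (fraction of ambiguous points). As a stand-in for $I_s$, it uses the mean share of the total density contributed by each point's own cluster. The helper names `gaussian_density` and `density_index` are hypothetical.

```python
import math
from collections import defaultdict


def gaussian_density(x, sample, h):
    """Isotropic Gaussian KDE with bandwidth h, fitted to `sample` and evaluated at x."""
    if not sample:
        return 0.0
    d = len(x)
    norm = len(sample) * (math.sqrt(2.0 * math.pi) * h) ** d
    total = sum(
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2.0 * h * h))
        for p in sample
    )
    return total / norm


def density_index(points, labels, bandwidth=0.5, delta=0.5):
    """Hypothetical I = delta * I_a + (1 - delta) * I_s built from per-cluster KDEs.

    ASSUMPTION: I_a and I_s below are illustrative stand-ins, not the paper's
    definitions. The bandwidth is a fixed placeholder. Returns (I, I_a, I_s).
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    members = defaultdict(list)  # cluster label -> indices of its points
    for i, lab in enumerate(labels):
        members[lab].append(i)

    ambiguous = 0
    share_sum = 0.0
    for i, (x, lab) in enumerate(zip(points, labels)):
        # Leave-one-out: drop point i so it does not vote for its own cluster.
        dens = {
            k: gaussian_density(x, [points[j] for j in idx if j != i], bandwidth)
            for k, idx in members.items()
        }
        total = sum(dens.values())
        if total == 0.0:
            ambiguous += 1  # no cluster supports x at this bandwidth
            continue
        territory = max(dens, key=dens.get)  # cluster whose density is highest at x
        if territory != lab:
            ambiguous += 1
        share_sum += dens[lab] / total  # density share of the assigned cluster

    n = len(points)
    i_a = 1.0 - ambiguous / n
    i_s = share_sum / n
    return delta * i_a + (1.0 - delta) * i_s, i_a, i_s


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
good = [0, 0, 0, 0, 1, 1, 1, 1]
bad = [0, 0, 0, 1, 1, 1, 1, 1]  # (1, 1) assigned to the far cluster
print(density_index(points, good))
print(density_index(points, bad))
```

With two well-separated blobs, the correct labelling leaves no ambiguous points. Moving (1, 1) into the far cluster makes that point ambiguous, which lowers both stand-in sub-indices and the combined score. In this sketch the leave-one-out step matters. If a point's own kernel is included, it dominates its own cluster's density, and the mislabelled point in this example would not be flagged.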

Abstract

A new index for internal evaluation of clustering is introduced. The index is defined as a mixture of two sub-indices. The first sub-index $I_a$ is called the Ambiguous Index; the second sub-index $I_s$ is called the Similarity Index. Both sub-indices are calculated from a density estimate of each cluster in a partition of the data. An experiment on a set of 145 datasets tests the performance of the new index against six other internal clustering evaluation indices: the Calinski-Harabasz index, the Silhouette coefficient, the Davies-Bouldin index, CDbw, DBCV, and VIASCKDE. The results show that the new index significantly outperforms the other internal clustering evaluation indices.
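For reference, here is what one of the six baselines computes: the Silhouette coefficient, in its standard form, using Euclidean distance. For each point, $a$ is the mean distance to the other members of its cluster and $b$ is the smallest mean distance to the members of any other cluster. The point's silhouette width is $(b - a)/\max(a, b)$, taken as 0 for a singleton cluster, and the coefficient is the mean over all points. This pure-Python sketch is not the experiment's code. In practice, scikit-learn's `sklearn.metrics.silhouette_score`, `calinski_harabasz_score`, and `davies_bouldin_score` provide three of the baselines.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    n = len(points)
    if not 2 <= len(set(labels)) <= n - 1:
        raise ValueError("silhouette needs between 2 and n - 1 clusters")
    scores = []
    for i in range(n):
        # Sum and count of distances from point i to every other point, per cluster.
        sums, counts = {}, {}
        for j in range(n):
            if j == i:
                continue
            lab = labels[j]
            sums[lab] = sums.get(lab, 0.0) + math.dist(points[i], points[j])
            counts[lab] = counts.get(lab, 0) + 1
        own = labels[i]
        if own not in counts:  # singleton cluster: s(i) is defined as 0
            scores.append(0.0)
            continue
        a = sums[own] / counts[own]  # mean intra-cluster distance
        b = min(sums[k] / counts[k] for k in counts if k != own)  # nearest other cluster
        scores.append(0.0 if max(a, b) == 0.0 else (b - a) / max(a, b))
    return sum(scores) / n


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(silhouette_coefficient(points, [0, 0, 0, 0, 1, 1, 1, 1]))
print(silhouette_coefficient(points, [0, 0, 0, 1, 1, 1, 1, 1]))
```

Values close to 1 indicate compact, well-separated clusters. A mislabelled point receives a negative width, which pulls the average down.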

Paper Structure

This paper contains 31 sections, 16 equations, 30 figures, and 5 tables.

Introduction
Related work
Calinski-Harabasz index (CH)
Silhouette coefficient (SC)
Davies-Bouldin index (DB)
Other internal evaluation indices
Kernel density estimation
Calculation of the new index
The Ambiguous Index
The Similarity Index
Experiment
A set of datasets
Settings of the experiment
Results of the experiment
Counting accuracy of the indices
...and 16 more sections

Figures (30)

Figure 1: A dataset
Figure 2: A partition of the dataset
Figure 3: Ambiguous Points of the partition
Figure 4: Mixture of the two sub-indices works
Figure 5: Result of one dataset
...and 25 more figures

Theorems & Definitions (7)

Definition 3.1
Definition 3.2
Definition 3.3
Definition 3.4
Definition 3.5
Definition 6.1
Definition 6.2
