Benchmarking of Clustering Validity Measures Revisited

Connor Simpson; Ricardo J. G. B. Campello; Elizabeth Stojanovski

TL;DR

This paper addresses the lack of a universally reliable internal clustering validity index through a large-scale benchmark of 26 internal indexes on a new collection of 16177 synthetic datasets, clustered with eight widely used algorithms. Methodologically, it introduces three complementary evaluation schemes and uses rank-based correlations with aggregated external rankings to mitigate biases caused by non-linear relationships. The key finding is that no single index dominates across all problems: performance depends strongly on the clustering algorithm and on data properties, and non-linear relationships between internal and external assessments are common. In practice, the work offers guidance on selecting ensembles of indexes suited to a specific clustering setup, and it highlights the importance of dataset representativeness when benchmarking internal validity measures.

Abstract

Validation plays a crucial role in the clustering process. Many internal validity indexes exist for determining the best clustering solution(s) from a given collection of candidates, e.g., as produced by different algorithms or by different hyper-parameter settings of the same algorithm. We present a comprehensive benchmark of 26 internal validity indexes, including highly popular classic indexes as well as more recently developed ones. We adopt an enhanced revision of the methodology presented in Vendramin et al. (2010), developed here to address several shortcomings of that previous work. The new approach consists of three complementary, custom-tailored evaluation sub-methodologies, each designed to assess specific aspects of an index's behaviour while avoiding potential biases of the other sub-methodologies. Each sub-methodology features two complementary measures of performance, alongside mechanisms that allow an in-depth investigation of more complex behaviours of the internal validity indexes under study. Additionally, a new collection of 16177 datasets has been produced and paired with eight widely used clustering algorithms, giving a wider scope of applicability and representing more diverse clustering scenarios.
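
The TL;DR mentions comparing internal and external assessments through rank-based correlations. The sketch below is only an illustration of why ranks help under a non-linear relationship, not the paper's actual evaluation procedure: the scores are made-up values for six hypothetical candidate partitions, and the helper names (`average_ranks`, `pearson`, `spearman`) are introduced here for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain (linear) Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Assumed, made-up scores for six candidate partitions of one dataset.
external = [0.10, 0.25, 0.40, 0.60, 0.85, 1.00]
# A hypothetical internal index that orders the partitions identically
# but responds in a strongly non-linear way.
internal = [e ** 4 for e in external]

linear_agreement = pearson(internal, external)
rank_agreement = spearman(internal, external)
print(f"Pearson:  {linear_agreement:.3f}")
print(f"Spearman: {rank_agreement:.3f}")
```

In this toy setting the internal index ranks the candidates exactly as the external one does, so the rank-based coefficient reports perfect agreement, while the linear coefficient is pulled down by the curvature alone.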
