On the Use of Relative Validity Indices for Comparing Clustering Approaches

Luke W. Yerbury; Ricardo J. G. B. Campello; G. C. Livingston; Mark Goldsworthy; Lachlan O'Neil

TL;DR

This paper questions the use of Relative Validity Indices (RVIs) for selecting a Similarity Paradigm (SP) in clustering, arguing that RVIs were designed for comparing partitions produced under the same SP, or with varying $k$, not for cross-SP evaluation. It conducts a large-scale empirical study across three dataset batteries, testing fixed-SP, matching-SP, and mean-SP evaluation schemes on seven RVIs and several clustering approaches, and relates RVI performance to external validity indices such as ARI and AMI. The findings reveal a systematic bias under fixed-SP schemes toward the SP used to generate the partitions, limited reliability of RVIs for SP selection (often far below their performance for $k$ selection), and no clear advantage for matching-SP or mean-SP schemes. The authors recommend discarding RVIs for SP selection in favor of external validation on high-quality labeled data, visualization, and outcome-oriented criteria, while promoting standardized SP terminology and dataset-informed SP choice for robust clustering evaluations.

Abstract

Relative Validity Indices (RVIs) such as the Silhouette Width Criterion and Davies-Bouldin indices are the most widely used tools for evaluating and optimising clustering outcomes. Traditionally, their ability to rank collections of candidate dataset partitions has been used to guide the selection of the number of clusters, and to compare partitions from different clustering algorithms. However, there is a growing trend in the literature to use RVIs when selecting a Similarity Paradigm (SP) for clustering: the combination of normalisation procedure, representation method, and distance measure that determines how the object dissimilarities used in clustering are computed. Despite the growing prevalence of this practice, there has been no empirical or theoretical investigation into the suitability of RVIs for this purpose. Moreover, since RVIs are computed using object dissimilarities, it remains unclear how they would need to be implemented for fair comparisons of different SPs. This study presents the first comprehensive investigation into the reliability of RVIs for SP selection. We conducted extensive experiments with seven popular RVIs on over 2.7 million clustering partitions of synthetic and real-world datasets, encompassing feature-vector and time-series data. We identified fundamental conceptual limitations undermining the use of RVIs for SP selection, and our empirical findings confirmed this predicted unsuitability. Among our recommendations, we suggest instead that practitioners select SPs by using external validation on high-quality labelled datasets or carefully designed outcome-oriented objective criteria, both of which should be informed by careful consideration of dataset characteristics and domain requirements. Our findings have important implications for clustering methodology and evaluation, suggesting the need for more rigorous approaches to SP selection.
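To make the abstract's point about RVIs depending on object dissimilarities concrete, the following is a minimal sketch, not the authors' experimental code. It assumes a four-point toy dataset, two hypothetical SPs (SP A: no normalisation with Euclidean distance; SP B: min-max normalisation with Manhattan distance), and two hand-picked partitions standing in for the output of clustering under each SP. The helper names (`min_max`, `distance_matrix`, `silhouette`) are illustrative and not taken from the paper.

```python
import math
from statistics import mean


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def min_max(data):
    """Rescale every feature to [0, 1] (constant features map to 0)."""
    cols = list(zip(*data))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi))
        for row in data
    ]


def distance_matrix(data, dist):
    return [[dist(a, b) for b in data] for a in data]


def silhouette(D, labels):
    """Mean silhouette width from a precomputed dissimilarity matrix D.

    Assumes at least two clusters; singletons get width 0.
    """
    clusters = set(labels)
    widths = []
    for i, li in enumerate(labels):
        own = [D[i][j] for j, lj in enumerate(labels) if lj == li and j != i]
        if not own:
            widths.append(0.0)
            continue
        a = mean(own)  # mean dissimilarity to the object's own cluster
        b = min(       # mean dissimilarity to the nearest other cluster
            mean(D[i][j] for j, lj in enumerate(labels) if lj == c)
            for c in clusters if c != li
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(widths)


# Toy data (assumption): feature 2 has a much larger raw scale than feature 1.
data = [(0.0, 0.0), (0.1, 30.0), (0.9, 10.0), (1.0, 40.0)]

D_a = distance_matrix(data, euclidean)           # hypothetical SP A
D_b = distance_matrix(min_max(data), manhattan)  # hypothetical SP B

# Hand-picked partitions (assumption) standing in for clustering output.
partition_a = [0, 1, 0, 1]  # groups objects that are close under SP A
partition_b = [0, 0, 1, 1]  # groups objects that are close under SP B

# Fixed scheme: one evaluation SP scores every partition.
fixed_a = (silhouette(D_a, partition_a), silhouette(D_a, partition_b))
fixed_b = (silhouette(D_b, partition_a), silhouette(D_b, partition_b))
# Matching scheme: each partition is scored under the SP that produced it.
matching = (fixed_a[0], fixed_b[1])

print("fixed SP A:", fixed_a)
print("fixed SP B:", fixed_b)
print("matching  :", matching)
```

In this toy setup, each fixed scheme ranks highest the partition produced under its own SP, while the two matching-scheme values are computed from different dissimilarity matrices and so are not directly comparable. It is only an illustration of the kind of issue the paper investigates, not evidence for its findings.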

Paper Structure (32 sections, 12 equations, 20 figures, 15 tables)

Introduction
Background and Literature Review
The Components of a Clustering Approach
Literature Review: Comparison of Similarity Paradigms with RVIs
Popular Software Implementations
Selected Relative Validity Indices
Recent Novel Relative Validity Indices
What are the issues?
Experimental Design
Methodology
Indices
Clustering Components
Datasets
The Vendramin Battery
The Gagolewski Battery
...and 17 more sections

Figures (20)

Figure 1: Diagram of a clustering approach with its five constituent components. A prototype definition is applied within some clustering algorithms, and is also required for producing cluster prototypes for some RVIs or various downstream purposes. Evaluation with an RVI requires the user to select whether the evaluation SP is independent of, or matches the SP used to produce the partition. These are referred to as fixed and matching evaluation schemes respectively.
Figure 2: Samples of time series from each of the four classes of the Trace dataset from the UCR archive Dau2019.
Figure 3: Pairwise distance matrices and corresponding MDS embeddings for the Trace dataset using various normalisations, raw representation and time warping edit distance.
Figure 4: Pairwise distance matrices and corresponding MDS embeddings for the Trace dataset using min-max normalisation, various representations and Euclidean distance.
Figure 5: Pairwise distance matrices and corresponding MDS embeddings for the Trace dataset using min-max normalisation, raw representation and various distances.
...and 15 more figures
