Interpretable Clustering: A Survey

Lianyu Hu, Mudi Jiang, Junjie Dong, Xinying Liu, Zengyou He

TL;DR

This survey addresses the problem of opaque clustering by formalizing interpretability within clustering and organizing methods across pre-, in-, and post-clustering stages. It introduces a four-criterion taxonomy (process stage, interpretable model, interpretability level, data modality) and links interpretable clustering to supervised XAI concepts, outlining intrinsic and post-hoc approaches. The paper comprehensively reviews pre-clustering feature extraction/selection, in-clustering models (decision trees, rules, prototypes, convex polyhedra), and post-clustering surrogates, emphasizing optimization-based formulations and trade-offs between interpretability and clustering quality. It highlights open challenges, such as scalability and evaluation of interpretability across diverse data types, and provides a repository for accessible methods to facilitate adoption in high-stakes domains.

Abstract

In recent years, much of the research on clustering algorithms has primarily focused on enhancing their accuracy and efficiency, frequently at the expense of interpretability. However, as these methods are increasingly being applied in high-stakes domains such as healthcare, finance, and autonomous systems, the need for transparent and interpretable clustering outcomes has become a critical concern. This is not only necessary for gaining user trust but also for satisfying the growing ethical and regulatory demands in these fields. Ensuring that decisions derived from clustering algorithms can be clearly understood and justified is now a fundamental requirement. To address this need, this paper provides a comprehensive and structured review of the current state of explainable clustering algorithms, identifying key criteria to distinguish between various methods. These insights can effectively assist researchers in making informed decisions about the most suitable explainable clustering methods for specific application contexts, while also promoting the development and adoption of clustering algorithms that are both efficient and transparent. For convenient access and reference, an open repository organizes representative and emerging interpretable clustering methods under the taxonomy proposed in this survey, available at https://github.com/hulianyu/Awesome-Interpretable-Clustering

Paper Structure (19 sections, 2 figures, 3 tables)

Figures (2)

  • Figure 1: Interpretable clustering taxonomy, categorized by distinct criteria. Most existing methods align with a single category per criterion.
  • Figure 2: Illustration of four interpretable clustering models applied to the same two-dimensional dataset with three Gaussian clusters. The upper panels display how each model partitions the feature space, while the bottom panels show the feature values used for interpretability.
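The post-clustering surrogate idea mentioned in the TL;DR can be illustrated on a toy dataset similar to the one described for Figure 2. The sketch below is not taken from the paper. It assumes three hypothetical Gaussian blob centers, uses the blob index as a stand-in for the labels any clustering algorithm might return, and fits a small greedy axis-aligned tree (Gini-based, depth 2) to those labels. All function names (`make_blobs`, `build_tree`, `rules`) are illustrative, and the fidelity score is simply the fraction of points where the tree agrees with the cluster labels.

```python
import random
from collections import Counter

def make_blobs(centers, n_per_cluster=60, std=0.4, seed=0):
    """Sample 2-D Gaussian blobs; the blob index stands in for a cluster label."""
    rng = random.Random(seed)
    points, labels = [], []
    for k, (cx, cy) in enumerate(centers):
        for _ in range(n_per_cluster):
            points.append((rng.gauss(cx, std), rng.gauss(cy, std)))
            labels.append(k)
    return points, labels

def weighted_gini(labels):
    """Gini impurity of a non-empty label list, scaled by its size."""
    n = len(labels)
    return n * (1.0 - sum((c / n) ** 2 for c in Counter(labels).values()))

def build_tree(points, labels, depth):
    """Greedy axis-aligned tree. A leaf is a cluster id; an internal node is
    (feature, threshold, left_subtree, right_subtree)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    best = None
    for f in range(2):
        values = sorted({p[f] for p in points})
        for t in values[:-1]:  # the split "feature <= t" keeps both sides non-empty
            left = [l for p, l in zip(points, labels) if p[f] <= t]
            right = [l for p, l in zip(points, labels) if p[f] > t]
            cost = weighted_gini(left) + weighted_gini(right)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    if best is None:  # every point is identical, so no split is possible
        return majority
    _, f, t = best
    left = [(p, l) for p, l in zip(points, labels) if p[f] <= t]
    right = [(p, l) for p, l in zip(points, labels) if p[f] > t]
    return (f, t,
            build_tree([p for p, _ in left], [l for _, l in left], depth - 1),
            build_tree([p for p, _ in right], [l for _, l in right], depth - 1))

def predict(tree, point):
    """Follow the threshold tests down to a leaf and return its cluster id."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if point[f] <= t else right
    return tree

def rules(tree, path=()):
    """Flatten the tree into one (condition, cluster id) pair per leaf."""
    if not isinstance(tree, tuple):
        return [(" and ".join(path) or "always", tree)]
    f, t, left, right = tree
    name = "xy"[f]
    return (rules(left, path + (f"{name} <= {t:.2f}",)) +
            rules(right, path + (f"{name} > {t:.2f}",)))

# Hypothetical centers, chosen so the blobs are well separated.
points, cluster_labels = make_blobs([(0.0, 0.0), (4.0, 0.0), (2.0, 4.0)])
tree = build_tree(points, cluster_labels, depth=2)
fidelity = sum(predict(tree, p) == l
               for p, l in zip(points, cluster_labels)) / len(points)

for condition, cluster in rules(tree):
    print(f"if {condition}: cluster {cluster}")
print(f"fidelity to the cluster labels: {fidelity:.3f}")
```

The depth limit is the interpretability knob in this sketch: a shallower tree gives shorter rules but may agree less with the original cluster labels, which mirrors the interpretability versus quality trade-off the survey emphasizes.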