How to Achieve the Intended Aim of Deep Clustering Now, without Deep Learning

Kai Ming Ting; Wei-Jie Xu; Hang Zhang

TL;DR

This paper interrogates the claim that deep clustering (DC) overcomes the fundamental limitations of $k$-means and argues that current DC methods (e.g., DEC/IDEC) share $k$-means-like constraints because they rely on pointwise similarity. It introduces Cluster-as-Distribution (CaD) Clustering, which treats each cluster as a distribution and uses a distributional kernel to cluster without learning representations, achieving the intended aim of DC in a simple, linear-time greedy framework. Empirical results on high-dimensional image and biological data show that CaD methods (notably KBC) often outperform DC baselines, including on challenging datasets, suggesting that distributional information is the key missing ingredient. The work advocates replacing the traditional definition of clustering with a distribution-based formulation that better captures cluster structure, offering a practical and scalable path to robust clustering in high-dimensional domains and potentially reducing reliance on deep learning for certain tasks.

Abstract

Deep clustering (DC) is often claimed to have a key advantage over $k$-means clustering. Yet this advantage is often demonstrated using image datasets only, and it is unclear whether it addresses the fundamental limitations of $k$-means clustering. Deep Embedded Clustering (DEC) learns a latent representation via an autoencoder and performs clustering with a $k$-means-like procedure, with the optimization conducted end-to-end. This paper investigates whether the deep-learned representation has enabled DEC to overcome the known fundamental limitations of $k$-means clustering, i.e., its inability to discover clusters of arbitrary shapes, varied sizes and densities. Our investigations on DEC have wider implications for deep clustering methods in general. Notably, none of these methods exploit the underlying data distribution. We uncover that a non-deep-learning approach achieves the intended aim of deep clustering by making use of the distributional information of clusters in a dataset to effectively address these fundamental limitations.
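The contrast between pointwise similarity and distributional information can be illustrated with a small sketch. It is not the paper's KBC algorithm and does not use the paper's distributional kernel. As an assumption for illustration only, it measures how well a point matches a cluster by the mean Gaussian kernel value between the point and all cluster members (an empirical kernel mean embedding), and compares this with a $k$-means-style nearest-centroid rule. All function names, the `gamma` parameter and the toy data are hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b, gamma=1.0):
    """Pairwise Gaussian kernel between rows of a (n, d) and rows of b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)


def assign_by_centroid(x, clusters):
    """k-means-style rule: assign each row of x to the nearest cluster centroid."""
    centroids = np.stack([c.mean(axis=0) for c in clusters])
    sq_dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)


def assign_by_distribution(x, clusters, gamma=1.0):
    """Distribution-style rule (assumed stand-in, not the paper's kernel):
    assign each row of x to the cluster whose members give the highest
    mean kernel similarity, so the whole cluster is used, not one point."""
    sims = np.stack(
        [gaussian_kernel(x, c, gamma).mean(axis=1) for c in clusters], axis=1
    )
    return sims.argmax(axis=1)


# Toy data (hypothetical): one elongated cluster and one compact cluster.
elongated = np.column_stack([np.linspace(-10.0, 10.0, 41), np.zeros(41)])
compact = np.array(
    [[12.0, 3.0], [12.2, 3.0], [11.8, 3.0], [12.0, 3.2], [12.0, 2.8]]
)
clusters = [elongated, compact]

# A point lying on the far end of the elongated cluster.
query = np.array([[9.0, 0.2]])

centroid_label = assign_by_centroid(query, clusters)[0]
distribution_label = assign_by_distribution(query, clusters)[0]
print("nearest-centroid label:", centroid_label)
print("distribution-based label:", distribution_label)
```

In this toy setting the query point sits on the elongated cluster but is closer to the compact cluster's centroid, so the nearest-centroid rule assigns it to the compact cluster (label 1) while the distribution-based rule assigns it to the elongated cluster (label 0). This only illustrates the kind of shape and size limitation the abstract refers to; it makes no claim about how the paper's methods behave.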

Paper Structure (25 sections, 9 equations, 2 figures, 10 tables)

Introduction
Preliminaries
Limitations of Current Definitions
Current Definitions of Clustering
Fundamental Limitations of DEC and IDEC
Limitations of Representation-Based Clustering
Cluster as Distribution: A New Definition
Rethinking Clustering Paradigms
The differences between DC and CaD Clustering
High Dimensional Datasets
Recommendations
Conclusions
Impact Statements
Existing Definitions of Clustering
KBC Algorithm
...and 10 more sections

Figures (2)

Figure 1: An illustration of the evolution from Definition \ref{def-typical} to Definition \ref{def-distribution-clustering}.
Figure 2: An illustration of how Deep Clustering (DC) and CaD Clustering use different means (deep learning versus a simple greedy search with a distributional kernel, respectively) to achieve the same intended aim of clustering: mapping clusters in the input space to centroids in the mapped space.

Theorems & Definitions (7)

Definition 1
Definition 2
Definition 3
Definition 4
Definition 5
Definition 6
Definition 7
