Scaling Up Deep Clustering Methods Beyond ImageNet-1K
Nikolas Adaloglou, Felix Michels, Kaspar Senft, Diana Petrusheva, Markus Kollmann
TL;DR
This work systematically expands clustering evaluation to large-scale, realistic data by constructing ImageNet21K-based benchmarks that isolate class imbalance, granularity, easy-to-classify subsets, and multi-label signals. It shows that the feature-based clustering methods TEMI and SCANv2 generally outperform $k$-means on these large-scale benchmarks, though the gains narrow as the dataset grows and becomes imbalanced or coarser-grained. The study also demonstrates that easy-to-classify and multi-label scenarios reveal substantial gaps in $k$-means performance and that non-primary cluster predictions can reflect meaningful, higher-level semantics. Collectively, the benchmarks and findings advocate for broader large-scale evaluation beyond ImageNet-1K to better assess clustering methods in real-world, hierarchical, and multi-label contexts.
Abstract
Deep image clustering methods are typically evaluated on small-scale, balanced classification datasets, while feature-based $k$-means has been applied to proprietary billion-scale datasets. In this work, we explore the performance of feature-based deep clustering approaches on large-scale benchmarks whilst disentangling the impact of the following data-related factors: i) class imbalance, ii) class granularity, iii) easy-to-recognize classes, and iv) the ability to capture multiple classes. To this end, we develop multiple new benchmarks based on ImageNet21K. Our experimental analysis reveals that feature-based $k$-means is often unfairly evaluated on balanced datasets. However, deep clustering methods outperform $k$-means across most large-scale benchmarks. Interestingly, $k$-means underperforms on easy-to-classify benchmarks by large margins. The performance gap, however, diminishes in the largest data regimes, such as ImageNet21K. Finally, we find that non-primary cluster predictions capture meaningful classes (i.e., coarser classes).
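
For readers unfamiliar with the feature-based $k$-means baseline, the following is a minimal sketch of the general idea, not the configuration used in this work. It assumes that each image has already been mapped to a fixed-length feature vector by some pretrained encoder; the toy 2-D points, the function name `kmeans`, and its parameters are hypothetical stand-ins chosen for illustration.

```python
import random


def kmeans(features, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length feature vectors.

    Illustrative only: real pipelines would cluster high-dimensional
    features from a pretrained encoder, not toy 2-D points.
    """
    rng = random.Random(seed)
    # Initialise centroids from k distinct data points.
    centroids = [list(f) for f in rng.sample(features, k)]
    assignments = None
    for _step in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assignments = [
            min(
                range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(f, centroids[c])),
            )
            for f in features
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, a in zip(features, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Toy "features": two well-separated groups standing in for two classes.
toy_features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(toy_features, k=2)
```

Each sample receives exactly one cluster index here, which is one reason the multi-label and non-primary-prediction analyses described above require methods that output a distribution over clusters rather than a single hard assignment.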
