From Logits to Hierarchies: Hierarchical Clustering made Simple
Emanuele Palumbo, Moritz Vandenhirtz, Alain Ryser, Imant Daunhawer, Julia E. Vogt
TL;DR
The paper tackles scalable, high-quality hierarchical clustering. It first exposes the limitations of recent deep hierarchical models, then proposes L2H, a logits-based method that builds a hierarchy on top of a pre-trained flat clustering model without any fine-tuning. L2H constructs a tree by iteratively merging clusters according to group scores derived from the predicted probabilities. Because it needs only the logits, it also works with black-box models. Empirically, L2H outperforms specialized hierarchical methods on CIFAR-10/100 and Food-101, retaining leaf-level accuracy while delivering strong hierarchical quality, and it runs efficiently on CPU at ImageNet scale. The approach also applies to supervised settings: from a pre-trained ImageNet classifier it recovers WordNet-like hierarchies and reveals biases. Overall, L2H offers a general, efficient alternative for practical hierarchical clustering across large datasets and model types.
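To make the merging idea concrete, the following is a minimal sketch of a logits-only agglomeration loop. It is not the paper's exact procedure: the summary above does not define the group score, so the score used here (total probability mass a group receives from the samples assigned to it) and the partner rule (highest soft co-occurrence of probability mass) are assumptions chosen for illustration, and the names `softmax` and `build_hierarchy` are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def build_hierarchy(logits):
    """Greedily merge the K output clusters into a binary tree.

    Returns the merged groups in merge order, each as a sorted tuple of
    original cluster indices. The scoring and partner rules below are
    illustrative assumptions, not the rules from the paper.
    """
    probs = softmax(np.asarray(logits, dtype=float))   # shape (N, K)
    groups = [(k,) for k in range(probs.shape[1])]      # start from the leaves
    merges = []
    while len(groups) > 1:
        # Probability mass each sample places on each current group: (N, G).
        mass = np.stack([probs[:, list(g)].sum(axis=1) for g in groups], axis=1)
        assigned = mass.argmax(axis=1)
        # Assumed group score: mass received from the samples assigned to it.
        scores = np.array(
            [mass[assigned == i, i].sum() for i in range(len(groups))]
        )
        weakest = int(scores.argmin())
        # Assumed partner rule: group whose mass co-occurs most with the weakest.
        affinity = (mass[:, [weakest]] * mass).sum(axis=0)
        affinity[weakest] = -np.inf
        partner = int(affinity.argmax())
        merged = tuple(sorted(groups[weakest] + groups[partner]))
        merges.append(merged)
        groups = [g for i, g in enumerate(groups) if i not in (weakest, partner)]
        groups.append(merged)
    return merges


if __name__ == "__main__":
    # Toy logits: clusters 0/1 are mutually confused, as are clusters 2/3.
    toy = np.array([[4, 3, 0, 0], [3, 4, 0, 0], [0, 0, 4, 3], [0, 0, 3, 4]])
    for group in build_hierarchy(toy):
        print(group)
```

The loop touches only the matrix of predicted probabilities, which is why an approach of this kind needs no access to model weights and no retraining. Each iteration costs time linear in the number of samples times the number of current groups.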
Abstract
The hierarchical structure inherent in many real-world datasets makes the modeling of such hierarchies a crucial objective in both unsupervised and supervised machine learning. While recent advancements have introduced deep architectures specifically designed for hierarchical clustering, we adopt a critical perspective on this line of research. Our findings reveal that these methods face significant limitations in scalability and performance when applied to realistic datasets. Given these findings, we present an alternative approach and introduce a lightweight method that builds on pre-trained non-hierarchical clustering models. Remarkably, our approach outperforms specialized deep models for hierarchical clustering, and it is broadly applicable to any pre-trained clustering model that outputs logits, without requiring any fine-tuning. To highlight the generality of our approach, we extend its application to a supervised setting, demonstrating its ability to recover meaningful hierarchies from a pre-trained ImageNet classifier. Our results establish a practical and effective alternative to existing deep hierarchical clustering methods, with significant advantages in efficiency, scalability and performance.
