From Logits to Hierarchies: Hierarchical Clustering made Simple

Emanuele Palumbo, Moritz Vandenhirtz, Alain Ryser, Imant Daunhawer, Julia E. Vogt

TL;DR

The paper addresses the challenge of scalable, high-quality hierarchical clustering by showing limitations of recent deep hierarchical models and proposing a logits-based, no-fine-tuning method (L2H) that builds hierarchies atop pre-trained flat clustering models. L2H constructs a tree by iteratively merging clusters based on group scores derived from predicted probabilities, requiring only logits and working with black-box models. Empirically, L2H outperforms specialized hierarchical methods on CIFAR-10/100 and Food-101 and remains leaf-accurate while delivering strong hierarchical quality, with CPU-friendly performance on ImageNet-scale data. The approach also applies to supervised settings, demonstrated by recovering WordNet-like hierarchies from a pre-trained ImageNet classifier and revealing biases. Overall, L2H offers a general, efficient alternative for practical hierarchical clustering across large datasets and model types.

Abstract

The hierarchical structure inherent in many real-world datasets makes the modeling of such hierarchies a crucial objective in both unsupervised and supervised machine learning. While recent advancements have introduced deep architectures specifically designed for hierarchical clustering, we adopt a critical perspective on this line of research. Our findings reveal that these methods face significant limitations in scalability and performance when applied to realistic datasets. Given these findings, we present an alternative approach and introduce a lightweight method that builds on pre-trained non-hierarchical clustering models. Remarkably, our approach outperforms specialized deep models for hierarchical clustering, and it is broadly applicable to any pre-trained clustering model that outputs logits, without requiring any fine-tuning. To highlight the generality of our approach, we extend its application to a supervised setting, demonstrating its ability to recover meaningful hierarchies from a pre-trained ImageNet classifier. Our results establish a practical and effective alternative to existing deep hierarchical clustering methods, with significant advantages in efficiency, scalability and performance.

Paper Structure

This paper contains 13 sections, 14 equations, 9 figures, 9 tables, 1 algorithm.

Figures (9)

  • Figure 1: Illustration of the L2H algorithm. The four depicted clusters represent dogs (blue), cats (yellow), horses (red), and birds (green). In the first iteration (bottom), where each group corresponds to a single cluster, the dog cluster is selected for merging (shaded in grey). When the predicted probabilities for samples in the dog cluster are recomputed over the remaining clusters only, the cat cluster receives the highest predicted probability of reassignment. After merging, these two clusters are treated as a single group in the next iteration (top).
  • Figure 2: Visualization of the hierarchical clustering produced by L2H-TURTLE on the CIFAR-100 dataset. The inferred hierarchy is represented as a circular tree. On the lowest level, the leaves are annotated by reporting the most frequent label for the samples in each leaf. Leaves are color-coded according to the 20 superclasses in the dataset.
  • Figure 3: Visualization of inferred hierarchy for the ImageNet-1K dataset. The hierarchy is represented as a circular tree, where the leaf nodes are organized in a circle. The global view shows the complete tree colored by the corresponding WordNet hypernyms "artifact" and "organism", which are the largest two superclasses in the ImageNet dataset. The local view shows the subtree of birds colored by different bird species if they comprise more than one class. The results show that our method recovers a significant portion of the global and local hierarchical structure of the ImageNet dataset.
  • Figure 4: Python code implementation for the L2H algorithm presented in the method section. Note that we choose the aggregation function when computing the score per group as described in the implementation appendix.
  • Figure 5: Sensitivity analysis for L2H-TURTLE on the CIFAR-100 dataset with respect to the hyperparameter $K$, which sets both the number of leaves in the hierarchy and the number of clusters used to train the flat model (TURTLE in this case). Note that the true number of clusters is $100$. Results for both flat (NMI, ARI, ACC, LP) and hierarchical (DP, LHD) metrics are included; lines indicate mean values and shaded areas indicate standard deviations across five independent runs. We also include the log-normalized TURTLE model loss (rightmost plot in the bottom row), which proves indicative for model selection with respect to $K$.
  • ...and 4 more figures
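
The merging procedure described in the TL;DR and in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation from Figure 4: the function name `l2h_sketch`, the use of a plain sum as the per-group aggregation, and the rule of selecting the lowest-scoring group for merging are all assumptions made here for concreteness. Only the overall shape (score groups from predicted probabilities, pick one group, recompute probabilities restricted to the remaining clusters, merge with the group that receives the most reassigned mass) follows the description above.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax; columns set to -inf get probability 0.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def l2h_sketch(logits):
    """Build a merge sequence over K clusters from an (N, K) logits array.

    Returns a list of (source_group, target_group) pairs of frozensets,
    in the order the merges happen (bottom of the tree first).
    """
    n, k = logits.shape
    probs = softmax(logits, axis=1)
    assign = probs.argmax(axis=1)  # flat cluster assignment per sample
    groups = [frozenset([c]) for c in range(k)]  # start: one group per cluster
    merges = []

    while len(groups) > 1:
        # Score each group: probability mass that its own samples place on it.
        # ASSUMPTION: sum aggregation; the paper selects its aggregation
        # function separately.
        scores = []
        for g in groups:
            cols = sorted(g)
            members = np.isin(assign, cols)
            scores.append(probs[members][:, cols].sum())

        # ASSUMPTION: the lowest-scoring group is the one selected for merging.
        src = groups[int(np.argmin(scores))]
        src_cols = sorted(src)
        members = np.isin(assign, src_cols)

        # Recompute probabilities for the selected group's samples,
        # restricted to clusters outside that group.
        masked = logits[members].astype(float)
        masked[:, src_cols] = -np.inf
        reassigned = softmax(masked, axis=1)

        # Merge with the group that receives the most reassigned mass.
        others = [g for g in groups if g != src]
        mass = [reassigned[:, sorted(g)].sum() for g in others]
        dst = others[int(np.argmax(mass))]

        merges.append((src, dst))
        groups = [g for g in groups if g not in (src, dst)] + [src | dst]

    return merges
```

Because the sketch reads only the logits array, it can be applied to the outputs of any flat clustering model or classifier without access to the model itself, which mirrors the black-box property claimed in the TL;DR. Note that a sum aggregation grows with group size, so a different aggregation (for example a mean) may produce a different merge order.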