Deep Taxonomic Networks for Unsupervised Hierarchical Prototype Discovery

Zekun Wang, Ethan Haarer, Tianyi Zhu, Zhiyi Dai, Christopher J. MacLellan

TL;DR

This work introduces Deep Taxonomic Networks (DTN), a deep latent-variable framework that learns a multilevel taxonomy from unlabeled data by optimizing a VAE whose prior is a mixture of Gaussians structured as a complete binary tree. Because the ELBO is formulated with this hierarchical prior and a probabilistic, prototypicality-driven objective, DTN discovers interpretable prototypes at every level of the tree, not just at the leaves, and supports flexible downstream classification without retraining. It integrates transformation-invariant learning via contrastive losses and demonstrates strong hierarchical clustering performance on MNIST, Fashion-MNIST, CIFAR-10/20/100, and Omniglot, while its qualitative hierarchies capture coarse-to-fine visual semantics. The results show that the method produces rich, human-interpretable taxonomies and point to dynamic priors and broader generative capabilities as future directions; limitations include the fixed binary-tree structure and potential scalability concerns.

Abstract

Inspired by the human ability to learn and organize knowledge into hierarchical taxonomies with prototypes, this paper addresses key limitations in current deep hierarchical clustering methods. Existing methods often tie the structure to the number of classes and underutilize the rich prototype information available at intermediate hierarchical levels. We introduce deep taxonomic networks, a novel deep latent variable approach designed to bridge these gaps. Our method optimizes a large latent taxonomic hierarchy, specifically a complete binary tree structured mixture-of-Gaussian prior within a variational inference framework, to automatically discover taxonomic structures and associated prototype clusters directly from unlabeled data without assuming true label sizes. We analytically show that optimizing the ELBO of our method encourages the discovery of hierarchical relationships among prototypes. Empirically, our learned models demonstrate strong hierarchical clustering performance, outperforming baselines across diverse image classification datasets using our novel evaluation mechanism that leverages prototype clusters discovered at all hierarchical levels. Qualitative results further reveal that deep taxonomic networks discover rich and interpretable hierarchical taxonomies, capturing both coarse-grained semantic categories and fine-grained visual distinctions.
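
To make the idea of a tree-structured mixture-of-Gaussians prior more concrete, the following minimal sketch computes a posterior $p(c\mid\mathbf{z})$ over every node of a small complete binary tree using Bayes' rule. It is an illustration under stated assumptions, not the authors' implementation: the diagonal covariances, the uniform prior $p(c)$, the root-first node ordering, and the names `log_gaussian_diag` and `prototypicality` are all hypothetical choices made here for clarity.

```python
import math

def log_gaussian_diag(z, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at z."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(z, mean, var)
    )

def prototypicality(z, means, variances, log_prior):
    """Posterior p(c | z) over all tree nodes (assumed form: Bayes' rule)."""
    log_joint = [
        lp + log_gaussian_diag(z, m, v)
        for lp, m, v in zip(log_prior, means, variances)
    ]
    # Log-sum-exp normalization for numerical stability.
    top = max(log_joint)
    log_norm = top + math.log(sum(math.exp(l - top) for l in log_joint))
    return [math.exp(l - log_norm) for l in log_joint]

# Toy tree of depth 1: a broad root (index 0) and two tighter leaves (1, 2).
# All parameter values below are made up for illustration.
means = [[0.0, 0.0], [-2.0, 0.0], [2.0, 0.0]]
variances = [[4.0, 4.0], [1.0, 1.0], [1.0, 1.0]]
log_prior = [math.log(1.0 / 3.0)] * 3  # assumed uniform p(c)

posterior = prototypicality([2.0, 0.0], means, variances, log_prior)
best_node = max(range(len(posterior)), key=posterior.__getitem__)
```

In this toy setting a latent point sitting on the right leaf's mean is assigned mostly to that leaf, with the broad root cluster receiving more mass than the distant left leaf. This mirrors, in a very simplified way, how intermediate nodes can act as coarser prototypes.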

Paper Structure

This paper contains 52 sections, 32 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Examples of sub-hierarchies discovered by fitting a deep taxonomic network to CIFAR-10 data. For each cluster, we sampled nine images from the test set based on likelihood.
  • Figure 2: The graphical model for deep taxonomic networks. (a): solid arrows represent the generative sampling process. The grayed cluster $c_3$ is selected via the prior distribution $p(c)$. (b): dashed arrows represent the variational inference process. Red: learnable parameters.
  • Figure 3: Hierarchical clustering performance on all evaluated datasets at varying depths of $\mathcal{T}$. X-axis: depth, Y-axis: performance.
  • Figure 4: Prototypicality $p(c\mid\mathbf{z})$ on test data over $\mathcal{T}$. X-axis: Cluster indices of a flattened complete binary tree, ordered left-to-right starting with its $2^{10}$ leaf clusters. Y-axis: $p(c\mid\mathbf{z})$.
  • Figure 5: Examples of sub-hierarchies discovered by deep taxonomic networks on MNIST, Fashion-MNIST, and Omniglot (two sub-hierarchies per dataset). Images are sampled from the test set per cluster by likelihood.
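
The Figure 4 caption refers to a flattened complete binary tree with $2^{10}$ leaf clusters. As a rough aid for reading such a flattened axis, the sketch below works out the node count and a leaf's root path using standard heap-style indexing (root at index 0, children of node $i$ at $2i+1$ and $2i+2$). This indexing is an assumption made here for illustration: the figure itself orders clusters starting from the leaves, and the paper's internal layout may differ. The helper names are hypothetical.

```python
def num_nodes(depth):
    """Total nodes in a complete binary tree whose leaves sit at `depth`."""
    return 2 ** (depth + 1) - 1

def level_of(index):
    """Level of a heap-indexed node (root is level 0)."""
    return (index + 1).bit_length() - 1

def ancestors(index):
    """Path from a node up to the root, finest cluster first."""
    path = [index]
    while index > 0:
        index = (index - 1) // 2  # parent under heap-style indexing
        path.append(index)
    return path

DEPTH = 10                       # 2**10 leaf clusters, as in Figure 4
total = num_nodes(DEPTH)         # leaves plus all internal nodes
first_leaf = 2 ** DEPTH - 1      # heap index of the leftmost leaf
leaf_path = ancestors(first_leaf)
```

With $2^{10}$ leaves the tree has $2^{11}-1 = 2047$ clusters in total, and every leaf has a root path of 11 nodes, one per level, which is the coarse-to-fine chain a sample can be associated with.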