Hypernym Bias: Unraveling Deep Classifier Training Dynamics through the Lens of Class Hierarchy

Roman Malashin, Valeria Yachnaya, Alexander Mullin

TL;DR

This work investigates training dynamics of deep classifiers through the lens of class hierarchies, introducing the concept of hypernym bias and a framework to track the evolving feature manifold. It combines a greedy hypernym classification setup, manifold-evolution metrics with WordNet, and an extended neural collapse analysis to compare hypernym and hyponym spaces. Empirical results on ImageNet reveal that hypernym distinctions are learned earlier than hyponyms, with neural-collapse-like patterns emerging in hypernym spaces ahead of the hyponym space, and these dynamics generalize to CIFAR-100 and DBPedia. The study offers mechanistic insights into hierarchical learning in deep networks and motivates data-driven hierarchies and hierarchical training strategies for improved understanding and efficiency.

Abstract

We investigate the training dynamics of deep classifiers by examining how hierarchical relationships between classes evolve during training. Through extensive experiments, we argue that the learning process in classification problems can be understood through the lens of label clustering. Specifically, we observe that networks tend to distinguish higher-level (hypernym) categories in the early stages of training, and learn more specific (hyponym) categories later. We introduce a novel framework to track the evolution of the feature manifold during training, revealing how the hierarchy of class relations emerges and refines across the network layers. Our analysis demonstrates that the learned representations closely align with the semantic structure of the dataset, providing a quantitative description of the clustering process. Notably, we show that in the hypernym label space, certain properties of neural collapse appear earlier than in the hyponym label space, helping to bridge the gap between the initial and terminal phases of learning. We believe our findings offer new insights into the mechanisms driving hierarchical learning in deep networks, paving the way for future advancements in understanding deep learning dynamics.
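
The claim that hypernyms are learned before hyponyms can be made concrete with a small sketch. It scores the same predictions in two label spaces by mapping each fine-grained class to its parent. The toy hierarchy, the checkpoint predictions, and the helper names (`accuracy`, `hypernym_accuracy`, `relative`) are all hypothetical, and normalizing a curve by its maximum is only an assumed reading of "relative accuracy", not necessarily the authors' exact definition or their greedy hypernym classification setup.

```python
from typing import Dict, List, Sequence


def accuracy(preds: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of positions where the prediction equals the label."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if not labels:
        raise ValueError("inputs must be non-empty")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def hypernym_accuracy(
    preds: Sequence[str], labels: Sequence[str], parent: Dict[str, str]
) -> float:
    """Accuracy after mapping predictions and labels to their hypernyms."""
    return accuracy([parent[p] for p in preds], [parent[y] for y in labels])


def relative(curve: List[float]) -> List[float]:
    """Normalize an accuracy curve by its maximum (assumed definition)."""
    peak = max(curve)
    return [a / peak for a in curve]


# Hypothetical two-level hierarchy: hyponym -> hypernym.
PARENT = {"tabby": "cat", "siamese": "cat", "beagle": "dog", "poodle": "dog"}
LABELS = ["tabby", "siamese", "beagle", "poodle"]

# Hypothetical predictions at an early and a late checkpoint.
EARLY = ["siamese", "tabby", "poodle", "poodle"]
LATE = ["tabby", "siamese", "beagle", "poodle"]

hypo_curve = [accuracy(EARLY, LABELS), accuracy(LATE, LABELS)]
hyper_curve = [
    hypernym_accuracy(EARLY, LABELS, PARENT),
    hypernym_accuracy(LATE, LABELS, PARENT),
]

print("hyponym relative accuracy: ", relative(hypo_curve))
print("hypernym relative accuracy:", relative(hyper_curve))
```

In this toy case the early checkpoint confuses breeds but never confuses cats with dogs, so the hypernym curve is already at its maximum while the hyponym curve is not. That is the pattern the abstract describes, reproduced here only on made-up data.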

Paper Structure

This paper contains 34 sections, 33 equations, 18 figures, and 7 tables.

Figures (18)

  • Figure 1: Hypernym bias. (a) Relative accuracy during training for ResNet-18, ResNet-50, ViT-B/16, and MobileNet-V3. Hypernyms reach near-maximum recognition accuracy in early epochs. (b) UMAP embeddings of ResNet-152 penultimate-layer features in neural collapse settings. Color shows hyponymy distance relative to the anchor class (<tabby cat>). The dynamics can be interpreted as top-to-bottom hierarchical label clustering.
  • Figure 2: Mean of mutual- and self-covers of the features of the penultimate ResNet-50 layer over the course of training: original feature space (a) and UMAP embedding space (b).
  • Figure 3: Relative accuracy gain during training of ResNet-50.
  • Figure 4: 95% accuracy convergence period (in epochs) for different numbers of hypernyms.
  • Figure 5: Estimation of the NC1, NC3, and NC4 properties of neural collapse in different label spaces.
  • ...and 13 more figures